3. Improve capture, behavior, and appearance
Adjust crawl scope
Patch crawl missing content
After you have identified scope adjustments that will improve future crawls of the applicable seed(s), you may design and initiate a patch crawl of select missing content. (As a general rule, NYARC QA technicians do not patch crawl missing content page by page, as enabled by the View Missing URLs link on each captured webpage’s Wayback view, but rather on a full-crawl basis, as enabled by the View Missing URLs from Wayback QA link on each collection’s management page in the Archive-It interface.)
View missing URLs from Wayback QA
To begin, navigate to the relevant collection’s management page in Archive-It and click on the Wayback QA link:
The list may be sorted by clicking on a column header, which displays items in ascending alphabetical order by the precise text of the missing URL, the source page from which it is missing, the possible reason that it is missing from the current capture, the general type of file that the URL represents, or the expected size of that file. Sorting is essential for efficiently managing long or complex lists of missing URLs.
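The sorting described above can be sketched offline in Python for technicians who keep their own notes on a long list. This is only an illustration: the rows, the column names, and the "Other (placeholder)" reason are hypothetical, and nothing here is part of the Archive-It interface or any export format it provides.

```python
from collections import Counter
from operator import itemgetter

# Hypothetical rows standing in for the Wayback QA list.
# Every URL, reason, type, and size below is invented for illustration.
missing = [
    {"url": "http://example.org/js/app.js", "source": "http://example.org/about",
     "reason": "Blocked by Robots.txt", "type": "application/javascript", "size": 5120},
    {"url": "http://example.org/css/site.css", "source": "http://example.org/",
     "reason": "Blocked by Robots.txt", "type": "text/css", "size": 2048},
    {"url": "http://example.org/img/logo.png", "source": "http://example.org/",
     "reason": "Other (placeholder)", "type": "image/png", "size": 10240},
]


def sort_missing(rows, column):
    """Return the rows in ascending order by one column, much as a header click does."""
    return sorted(rows, key=itemgetter(column))


def count_by_reason(rows):
    """Count how many missing URLs share each possible reason."""
    return Counter(row["reason"] for row in rows)


for row in sort_missing(missing, "url"):
    print(row["url"], "|", row["reason"])
print(count_by_reason(missing))
```

Grouping by reason first makes it easier to see which URLs will need the robots.txt checkbox before a patch crawl is run.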
Run Patch Crawl
To pursue missing content, select all relevant missing URLs by clicking on the checkbox next to each URL, then click on the Patch Crawl Selected button in the left corner above the list:
Before your patch crawl runs, Archive-It will prompt you to confirm your selection based on the reason that each URL may be missing from the current capture. For documents marked “Blocked by Robots.txt,” you must first click on the capture documents blocked by robots.txt checkbox before clicking on the Run Patch Crawl button.
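To illustrate what a "Blocked by Robots.txt" reason generally means, the following minimal sketch uses Python's standard urllib.robotparser module to test URLs against a sample robots.txt file. The sample rules and URLs are hypothetical, and the user agent string "archive.org_bot" is an assumption made for illustration; this is not how Archive-It itself determines the reason that it displays.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt content for an imaginary site.
ROBOTS_TXT = """\
User-agent: *
Disallow: /css/
Disallow: /js/
"""


def blocked_by_robots(robots_txt, url, user_agent="archive.org_bot"):
    """Return True if the given robots.txt rules disallow this URL.

    The default user agent is an assumption; the wildcard rule above
    applies to any agent regardless.
    """
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return not parser.can_fetch(user_agent, url)


print(blocked_by_robots(ROBOTS_TXT, "http://example.org/css/site.css"))
print(blocked_by_robots(ROBOTS_TXT, "http://example.org/index.html"))
```

Stylesheets and scripts excluded in this way are a common reason that a captured page renders poorly, which is why the confirmation checkbox matters for this category of missing URL.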
Clicking on the Run Patch Crawl button will queue your patch crawl to run in the order that it was initiated among NYARC’s other one-time, scheduled, and test crawls. You may abort or check on the status of your patch crawl at any time thereafter by hovering over the Crawls link in the Archive-It interface’s top navigational menu, then clicking on the Current Crawls link:
Running a patch crawl in this way removes the selected missing URLs from your collection-wide list, so you may now return to the list by way of the Wayback QA link to patch crawl missing content from the list’s successive pages.
Evaluate and document patch crawl results
The need for and initiation of any patch crawl(s) must ultimately be reported on the QA report form, with a brief description of the missing content pursued through this strategy.
Submit issues to Archive-It
Issues of quality beyond those that can be mitigated through scope adjustment and/or patch crawling may require intervention by Archive-It crawl engineers and/or Wayback developers. To report these issues, click on the Help Center link in the Archive-It Interface Header, then select Submit a Request on the upper-right side of the Help Center page.
These help requests are coordinated among partners, engineers, and developers by Archive-It partner specialists. To make these frequently technical and occasionally prolonged interactions as effective and efficient as possible, follow these communication principles:
Cite precedent: Check existing NYARC help tickets (under the NYARC tab in the help interface) for similar issues before submitting any brand new request. Even when these prior interactions do not provide an immediate mitigation strategy, they may provide the partner specialist with vital information about parallel efforts and progress made by their colleagues. Indicating which other seeds may have manifested a similar problem, or which specialist may have recently solved a seemingly identical one, for instance, will greatly improve their efficiency.
Inquire broadly: When requesting help, always be sure to ask your partner specialist (as politely and briefly as possible) to indicate the likeliest causes of your problem; offer informed theories of your own when/if you have them, and always let them know that you are engaged in the problem mitigation yourself. More than any quick fix, information regarding the source of your given issue can help NYARC to anticipate future QA issues and processing needs, and will subsequently reduce the load on Archive-It's partner specialists and engineers.
Check Proxy mode: When experiencing any problem with playback (i.e., issues other than accessing and/or crawling a desired file path), remember to first compare your view of the archived resource in Wayback mode to a view in Proxy mode. Always be certain to report that you have performed this step in your initial help request, include any further questions that this comparison raises, and, insofar as is feasible, provide screenshots of each mode.
Ignore robots: When experiencing an access-related issue--any problem related to crawling a desired file path, rather than to playing it back--remember first to ignore the robots.txt protocol. To avoid delays, be certain to report explicitly in your initial help request that you have already done so.
Share screenshots: For myriad reasons, Archive-It employees and contractors frequently do not see the same issues that we see in New York manifest at their stations in San Francisco or elsewhere. For this reason, it is critically important to provide them with screenshots whenever feasible. To do so on a Windows PC, simply navigate to the view most representative of the issue that you want to resolve, tap the "Print Scrn" button on your keyboard, open a graphics editor (all NYARC PCs should at least have MS Paint), and paste the view from your clipboard onto the canvas, where you may crop/resize and save it as a PNG file on your desktop. [When using a Mac, instead of Print Scrn, press Command-Shift-4 and tap the spacebar to select a window of which to take a screenshot.] There is no standard naming convention for these files, but good practice is to at least include an indication of the seed name and viewing mode--Live, Proxy, or Wayback. Include screenshots of any/all modes that manifest differences.
Specify software: It is important for the specialists and engineers to evaluate your issue(s) in the same context in which you encounter them, or else they may miss vital information. When encountering any rendering or otherwise browsing-related problem, be sure to specify in your initial help request precisely which browser(s) (e.g., Chrome, Firefox) you have used and which operating system (e.g., Windows 11, macOS Big Sur) your computer runs.
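The screenshot-naming and software-reporting advice above can be sketched as a small Python helper. This is one possible convention only, since no standard exists: the function names, the file name pattern, and the example seed are hypothetical, and the browser name must be supplied by hand because the script cannot detect it.

```python
import platform
import re

# The three viewing modes named in the guidance above.
VIEWING_MODES = ("Live", "Proxy", "Wayback")


def screenshot_name(seed, mode):
    """Build a PNG file name that records the seed and the viewing mode."""
    if mode not in VIEWING_MODES:
        raise ValueError(f"mode must be one of {VIEWING_MODES}")
    # Collapse every run of non-alphanumeric characters into a single hyphen.
    slug = re.sub(r"[^A-Za-z0-9]+", "-", seed).strip("-").lower()
    return f"{slug}_{mode}.png"


def environment_note(browser):
    """Describe the browser (typed in by the technician) and the local operating system."""
    return f"Browser: {browser}; OS: {platform.system()} {platform.release()}"


print(screenshot_name("http://www.example.org/", "Wayback"))
print(environment_note("Firefox"))
```

Pasting the environment note into the initial help request alongside one screenshot per viewing mode gives the partner specialist the context described above in a single message.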
When you have completed the necessary steps above, proceed to: