2. Assess capture completeness and render quality through web browsing

Follow the directions below to evaluate how completely a crawl has captured significant web content, to measure how accurately that content behaves and appears in an archival environment, and to recommend any improvements needed to the capture, its rendering, or both:

Access the web instance in both Wayback and live versions


An archived web instance will be available to browse approximately 24 hours after its crawl completes. Once this time has passed, you may access the archived instance:

  1. Select the relevant collection from the “Home” or “Collections” page in the Archive-It interface

  2. Click on the appropriate seed URL to access a page of indexed captures

  3. Click on the index link that matches the completion date of the crawl that you wish to assess

For comparison, also open the web instance in its live form (if it still exists) so that you can compare the live page with the archived instance.

To access the live version of the instance under review, copy the part of the archived URL that follows its identifying information (such as the collection number and capture timestamp), beginning with https://, and paste it into your browser's address bar.
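The copy-and-paste step above can be sketched in Python. This is a minimal sketch that assumes an Archive-It Wayback URL of the form https://wayback.archive-it.org/<collection-id>/<timestamp>/<live-url>; the function name live_url_from_wayback, the collection number 1234, and the example.org address are hypothetical and used only for illustration.

```python
import re

# Assumed (not confirmed) Wayback URL layout:
#   https://<wayback-host>/<collection-id>/<timestamp>[modifier]/<live-url>
# The optional [a-z_]* tolerates replay modifiers such as "id_" after the timestamp.
_WAYBACK_PATTERN = re.compile(
    r"^https?://[^/]+/(?P<collection>\d+)/(?P<timestamp>\d+)[a-z_]*/(?P<live>https?://.*)$"
)


def live_url_from_wayback(archived_url: str) -> str:
    """Return the live URL embedded in a (hypothetical) Archive-It Wayback URL."""
    match = _WAYBACK_PATTERN.match(archived_url)
    if match is None:
        raise ValueError(f"Unrecognized Wayback URL: {archived_url}")
    # Everything from the embedded http(s):// onward is the live address.
    return match.group("live")


if __name__ == "__main__":
    print(live_url_from_wayback(
        "https://wayback.archive-it.org/1234/20200101120000/https://www.example.org/about/"
    ))
```

Run as a script, the example prints https://www.example.org/about/. A URL that does not follow the assumed layout raises ValueError rather than returning a guess.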


Navigate and inventory quality issues


Once you have reached the home or landing page of the seed most central to the web instance under evaluation, you may begin to assess the quality of its capture, behavior, and appearance.



Enable Wayback QA

Begin by clicking the Enable QA link in the Archive-It banner at the top of the archived page.

Clicking this link adds a View Missing URLs link to the same banner. Through it you can view the specific URLs missing from each archived page, which must ultimately be patch crawled to complete the instance’s capture.

The link text shows a count of missing URLs for each page. Note these counts, but do not initiate any patch crawls until you have systematically reviewed the entire archived instance through the Wayback interface. (Each missing URL is added to a running list of missing URLs for your collection, which will be patch crawled later.)

Activate Missing URLs


The View Missing URLs link described above frequently lists less missing content than actually remains to be patched into a fuller capture of a given page. For instance, content from the live view may be visibly missing from your archived view, yet none of the Missing URLs detected by the Wayback interface corresponds to that content. In such cases, follow these guidelines to ensure that these and all other possible Missing URLs are activated, so that the Wayback interface can detect them as either present or missing. More generally, follow every link in view, especially links that lead to unarchived content (pages titled “Not in Archive”), to ensure that this content is queued for patch crawling.



Compare live instance to Wayback rendition


To review the captured web instance systematically, locate the seed site’s main navigation menu and/or site map. Start with standard pages that have few or no “child” pages, such as About, Contact, and other general informational pages. Compare the content, functionality, and general “look and feel” of each page to its current version on the live web, and note any differences between the two.

Once you have reviewed these pages, move on to thematically significant pages that lead to further or content-rich “child” pages, such as Catalogs, Collections, Exhibitions, and Publications. Again compare each archived view to its live counterpart, keeping track of any differences in content, functionality, or rendered style. You may check the missing URLs summarized in the View Missing URLs banner link as you go, but be sure to follow every link in view so that each one is activated and can be detected by the Wayback interface as either archived or missing. This step is essential to ensuring that missing URLs are queued for later patch crawling.
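As a rough aid to the page-by-page comparison above, the sketch below lists links that appear on a live page but not on its archived counterpart. It assumes you have saved the HTML of both views and that archived links are rewritten in the form .../<collection-id>/<timestamp>/<original-url>; all function names and example URLs are hypothetical. It only sees plain <a href> links in the saved HTML, so content loaded by scripts still has to be checked by clicking through in the Wayback interface.

```python
import re
from html.parser import HTMLParser
from urllib.parse import urljoin

# Assumed Wayback rewriting pattern: <host>/<collection-id>/<timestamp>[modifier]/<original-url>
_REWRITE = re.compile(r"^https?://[^/]+/\d+/\d+[a-z_]*/(https?://.*)$")


class LinkCollector(HTMLParser):
    """Collects the href value of every <a> tag fed to the parser."""

    def __init__(self) -> None:
        super().__init__()
        self.links: set[str] = set()

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.links.add(value)


def collect_links(html: str) -> set[str]:
    parser = LinkCollector()
    parser.feed(html)
    parser.close()
    return parser.links


def unrewrite(url: str) -> str:
    """Strip the assumed Wayback prefix, leaving non-rewritten URLs unchanged."""
    match = _REWRITE.match(url)
    return match.group(1) if match else url


def links_missing_from_archive(live_html: str, live_base: str,
                               archived_html: str, archived_base: str) -> set[str]:
    """Absolute live links with no matching link in the archived page."""
    # Resolve relative hrefs against each page's own URL before comparing.
    live = {urljoin(live_base, href) for href in collect_links(live_html)}
    archived = {unrewrite(urljoin(archived_base, href))
                for href in collect_links(archived_html)}
    return live - archived
```

For example, if a live page links to /about/ and /contact/ but the archived page links only to the rewritten About page, the function returns the absolute live URL of the Contact page. Treat any result as a list of candidates to click through, not as a definitive inventory of missing URLs.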

Figure (panels: Live; Archived): This comparison of live and archived views of the Mitchell-Innes & Nash Gallery website suggests that scope adjustments or patch crawls would enhance the current capture with missing content.


When you have completed the above, proceed to:

3. Improve capture, behavior, and appearance