Web Archiving

Purpose: This space documents web archiving best practices, workflows, and procedures used to build the NYARC web archive collection. It is a reference for internal staff and other institutions interested in web archiving. Please note that this documentation is a work in progress. Questions about the web archiving program or these guidelines may be directed to Sumitra Duncan, Head, Web Archiving Program.

Background: In 2013, NYARC received a grant from The Andrew W. Mellon Foundation, titled "Making the Black Hole Gray: Implementing the Web Archiving of Specialist Art Resources," to initiate a program of web archiving. The two-year program followed an earlier pilot study, which demonstrated that the types of materials the NYARC libraries had been collecting in print were increasingly migrating to versions available exclusively on the web. The study concluded that there was an urgent need to document the dynamic web-based versions of auction catalogs, catalogues raisonnés, and scholarly research projects, as well as artist, gallery, and art dealer websites; without such documentation, the art historical record faced a real and imminent danger of a “digital black hole.”

NYARC’s objectives for implementing a web archiving program were to capture, make accessible, and preserve two terabytes of significant art-rich websites as WARC files. The term “web archiving” refers to capturing born-digital materials as they appear on the live web and storing each capture in the standard WARC file format. A WARC (Web ARChive) file combines multiple digital resources into a single aggregate archived file, together with related information or metadata. NYARC continues to build web archive collections, provides researchers with access to both the content and the functionality of these materials, and is committed to preserving web archived materials over time.
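
For readers unfamiliar with the format, the following is a minimal sketch of how a WARC file aggregates several resources with their metadata. It is an illustration only, not the tooling NYARC or its vendors use: the function names `build_warc_record` and `read_warc_records` are hypothetical, the example.org URLs and payloads are placeholders, and the sketch assumes uncompressed WARC 1.0 "resource" records with a small set of header fields. Production WARC files are often gzip-compressed and contain additional record types and fields.

```python
import uuid
from datetime import datetime, timezone

CRLF = b"\r\n"


def build_warc_record(target_uri: str, payload: bytes, content_type: str) -> bytes:
    """Serialize one simplified WARC 'resource' record (hypothetical helper)."""
    headers = [
        ("WARC-Type", "resource"),
        ("WARC-Record-ID", f"<urn:uuid:{uuid.uuid4()}>"),
        ("WARC-Date", datetime.now(timezone.utc).strftime("%Y-%m-%dT%H:%M:%SZ")),
        ("WARC-Target-URI", target_uri),
        ("Content-Type", content_type),
        ("Content-Length", str(len(payload))),
    ]
    # Version line, then one "Name: value" line per header field.
    head = CRLF.join(
        [b"WARC/1.0"] + [f"{name}: {value}".encode("utf-8") for name, value in headers]
    )
    # A blank line separates headers from the payload; two CRLFs end the record.
    return head + CRLF + CRLF + payload + CRLF + CRLF


def read_warc_records(data: bytes):
    """Yield (headers, payload) for each record in uncompressed WARC bytes."""
    pos = 0
    while pos < len(data):
        head_end = data.index(CRLF + CRLF, pos)
        lines = data[pos:head_end].decode("utf-8").split("\r\n")
        # lines[0] is the version line; the rest are header fields.
        headers = dict(line.split(": ", 1) for line in lines[1:])
        start = head_end + 4
        length = int(headers["Content-Length"])
        yield headers, data[start:start + length]
        pos = start + length + 4  # skip the two CRLFs that close the record


# Two placeholder resources combined into one aggregate archive.
archive = (
    build_warc_record("https://example.org/", b"<html>home</html>", "text/html")
    + build_warc_record("https://example.org/logo.png", b"\x89PNG placeholder", "image/png")
)
records = list(read_warc_records(archive))
for headers, payload in records:
    print(headers["WARC-Target-URI"], headers["Content-Type"], len(payload))
```

Each record carries its own metadata (capture date, original URL, content type) alongside the captured bytes, which is what allows a single file to hold many resources and still be replayed later.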

Systems: NYARC works with several vendor partners; the primary partner is the Internet Archive's Archive-It service. We also use Webrecorder for on-demand archiving of content that is particularly technically challenging to harvest. We have integrated the cloud-based preservation service DuraCloud into our Archive-It account, and our collections are backed up automatically at regular intervals.

Quick Links