Web Archives: Home

Purpose of this guide

This guide is intended for students and researchers seeking support in using web archives.

Use this guide to find out about UK and international web archive resources including the Bodleian Libraries Web Archive and the UK Web Archive.

 

Bodleian Libraries' Web Archive

Maintained by: Special Collections, Bodleian Libraries 
Content: The Bodleian Libraries’ Web Archive was created in 2011 to collect and preserve web material for posterity. As of 2018, the archive collects web material relating to the following eight public collections:  

  • Arts and Humanities 
  • International
  • Science, medicine and technology
  • Social Sciences
  • University of Oxford
  • University of Oxford Colleges
  • University of Oxford Student Societies
  • University of Oxford museums, libraries and archives

Scope: The Bodleian Libraries’ Web Archive works within the remit of the Bodleian Libraries’ Special Collections Collection Development Policy to collect web materials which relate to the University of Oxford or the collections of the Bodleian Libraries. As the Bodleian Libraries’ Web Archive works on a permission basis, permission is required from the owner before the web materials are added to the archive. Members of the public can nominate websites for inclusion in the BLWA, using our nomination form.   

Access: Search and access is freely available online. Browse the Bodleian Libraries' Web Archive. Nominate websites for inclusion in the BLWA.

UK Web Archive

Maintained by: UK Legal Deposit Libraries (The British Library, The National Libraries of Scotland and Wales, the Bodleian Libraries, Cambridge University Libraries, Trinity College, Dublin).

Content: The UK Web Archive collects millions of websites each year. As of 2017, approximately 500TB of data had been collected. An annual UK domain crawl aims to capture as much of the UK's (freely available) web presence as possible. Selected websites are crawled more frequently; some news sites, for example, are collected daily. The UK Web Archive also curates collections of websites relating to particular topics and themes.

Scope: UK websites only.

Access: Most content in the UK Web Archive is collected as per the regulations of the Legal Deposit Libraries Act (2003) which was extended in 2013 to permit the Legal Deposit Libraries to collect UK based websites. Websites collected under the regulations are only available to view on Library premises unless permission has been received from the website publisher to make the content more widely available. All content in the UK Web Archive, however, can be searched online from anywhere.

Other web archives

A list of web archiving initiatives is maintained on Wikipedia. A sample of these is listed below.

UK

UK Government Web Archive

The UK Government Web Archive began its web archiving initiative in 1996. It aims to capture, preserve, and make accessible UK central government information published on the web.

UK Parliament Web Archive

The UK Parliament Web Archive began its web archiving programme in 2009. It aims to capture, preserve and make accessible UK Parliament information published on the web.

European

Internet Memory Foundation's Web Archive

The Internet Memory Foundation (formerly known as the European Archive) is a non-profit organisation, based in Paris and Amsterdam. Since 2004, it has worked with cultural and government organisations to capture and preserve web archives. The collections of some of the organisations they work with are hosted on their website, including those of the UK Government and the UK Parliament.

Icelandic Web Archive

The Icelandic Web Archive has collected snapshots of Icelandic websites since 2004 in accordance with the Icelandic law on legal deposit from 2002. The collection is limited to .is-domains and a hand-picked selection of Icelandic websites within other top-level domains.

Bibliothèque Nationale de France

The Bibliothèque nationale de France (BnF) has collected websites under French legal deposit legislation since 2006. The collection is only available on BnF premises.

netarchive.dk

Under Danish Legal Deposit Law, the two legal deposit libraries in Denmark, State and University Library and The Royal Library, have been archiving Danish websites since 2005. The archive is only accessible to researchers who have requested and been granted special permission to use the collection for specific research purposes.

Asia

The Japanese Web Archiving Research Project

The Japanese Web Archiving Research Project has been archiving websites since 2002. The National Diet Library Law, revised in 2009, allows the National Diet Library to archive Japanese official institutions’ websites. Websites of cultural and international events held in Japan, and those related to electronic magazines, are also archived based on the permission of their webmasters. Parts of the collection are available online.

North American

Library of Congress Web Archives

Collections of archived websites selected by subject specialists. Subjects include September 11, 2001; United States Election 2008; Iraq War 2003.

Government of Canada Web Archive

Since 2005 Library and Archives Canada (LAC) has collected a representative sample of Canadian websites. The collection contains over 170 million digital objects and more than 7 terabytes of data.

Cross-search web archives

The Mementos service from the British Library offers a useful mechanism for searching several web archives together.

Internet Archive

Maintained by: Internet Archive

Content: Harvested websites collected since the establishment of the Internet Archive in 1996. Over 364 billion webpages included so far.

Scope: International. Aims to collect as much as possible.

Access: Search and access are freely available online via the Wayback Machine.

Web Archive Use Cases

Web archives give researchers a new and largely untapped resource for research. They provide unique cultural insights into online and offline societies and can show how websites have changed over time. Publications such as The Web as History (2017) show that the value of web archives is increasingly recognised.

BUDDAH Project

Big UK Domain Data for the Arts and Humanities (BUDDAH) was a collaborative project led by the British Library, the Institute of Historical Research (University of London), and the Oxford Internet Institute (University of Oxford).

  • It aimed to ‘facilitate the use of a 65 terabyte dataset containing crawls of the .uk domain from 1996 to 2013’.¹
  • ‘As part of the project, ten arts and humanities researchers were invited to use this web archive dataset to conduct cutting-edge research’.¹  

A few examples of the research projects conducted using the BUDDAH project’s dataset are: 

1. Cowls, Josh. “Cultures of the UK.” The Web as History: Using Web Archives to Understand the Past and the Present, edited by Niels Brügger and Ralph Schroeder, UCL Press, 2017, p. 220.

Citations

How to cite a Web Archive

Note: The recommended citations are based on the Bodleian’s preferred form for citing special collections material/archives. If desired, the date accessed and date archived can be added, but this is not standard across citation guides. The following format leaves users free to make any amendments required by the citation/reference style they are using.

When you use the web archive in your work, cite it as you would any other resource, using the preferred forms of citation below.

Citing a collection (BLWA)

  • Format: Oxford, Bodleian Libraries Web Archive, [Collection Title], [URL], <accessed [date]>
  • Example: Oxford, Bodleian Libraries Web Archive, University of Oxford Colleges, https://archive-it.org/collections/4406, <accessed 08/01/2018>
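For anyone assembling many collection citations at once, the format above can be sketched as a small Python helper. This is only an illustration, not an official Bodleian tool: the function name `cite_blwa_collection` is hypothetical, and writing the access date as DD/MM/YYYY is an assumption based on the example above.

```python
from datetime import date


def cite_blwa_collection(title: str, url: str, accessed: date) -> str:
    """Build a citation in the 'Citing a collection (BLWA)' format."""
    # Assumption: the access date is written as DD/MM/YYYY, as in the guide's example.
    accessed_text = accessed.strftime("%d/%m/%Y")
    return (
        "Oxford, Bodleian Libraries Web Archive, "
        f"{title}, {url}, <accessed {accessed_text}>"
    )


# Reproduces the example citation given above.
print(
    cite_blwa_collection(
        "University of Oxford Colleges",
        "https://archive-it.org/collections/4406",
        date(2018, 1, 8),
    )
)
```

The helper only joins the pieces in the recommended order, so any amendments required by a particular reference style would still need to be made by hand.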

Individual seeds or web pages (BLWA)

Other citation / reference style examples

1. MLA Format

Cite the webpage as normal, then add the name of the archive (italicized) and the archive URL (in < and >).

  • McDonald, R. C. "Basic Canary Care." Robirda Online. 12 Sept. 2004. Web. 25 Nov. 2014. <http://www.robirda.com/cancare.html>. Internet Archive. <http://web.archive.org/web/20041009202820/http://www.robirda.com/cancare.html>

2. APA Format

  • McDonald, R. C. (2004, September 12). Basic Canary Care. Retrieved from http://web.archive.org/web/20041009202820/http://www.robirda.com/cancare.html

3. Indiana University Web Archives on Archive-It

3.1 Citing a collection

  • [Collection Title]. Archived by the Indiana University Libraries Web Archive at http://www.archive-it.org/collections/219 <accessed [date]>

3.2 Individual seeds or web pages

  • Please cite individual seeds or web pages as follows: “School of Education.” [Collection Title]. Archived by the Indiana University Libraries Web Archive at http://www.archive-it.org/collections/219 <accessed [date]>

4. Harvard (Sheffield Hallam University)

  • Format: Author of page, year page updated, title of page, [online], last updated/posted day and month (if available), Archived at: URL
  • Example: BECTA (2010). Schools. [online]. Archived at: http://webarchive.nationalarchives.gov.uk/20110130111510/http://schools.becta.org.uk/
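
If you want to add the date archived to a citation, it can usually be read from the archive URL itself. Wayback Machine addresses such as the one in the MLA and APA examples above contain a 14-digit capture timestamp (YYYYMMDDhhmmss) followed by the original address. The Python sketch below separates the two. The function name `split_wayback_url` is hypothetical, and the pattern assumes a plain web.archive.org address with a full 14-digit timestamp; other archives, or addresses with extra modifiers, would need a different pattern.

```python
import re
from datetime import datetime
from typing import Tuple

# Assumption: a plain Wayback Machine address with a full 14-digit timestamp.
WAYBACK_PATTERN = re.compile(r"^https?://web\.archive\.org/web/(\d{14})/(.+)$")


def split_wayback_url(archive_url: str) -> Tuple[datetime, str]:
    """Return (capture date and time, original URL) for a Wayback Machine address."""
    match = WAYBACK_PATTERN.match(archive_url)
    if match is None:
        raise ValueError(f"Not a recognised Wayback Machine URL: {archive_url}")
    timestamp, original_url = match.groups()
    # The timestamp is laid out as year, month, day, hour, minute, second.
    return datetime.strptime(timestamp, "%Y%m%d%H%M%S"), original_url


captured, original = split_wayback_url(
    "http://web.archive.org/web/20041009202820/http://www.robirda.com/cancare.html"
)
print(captured.strftime("%d/%m/%Y"), original)
```

For the example address, the capture date is 9 October 2004, which is the date to give as "date archived"; it differs from the page's own publication date (12 September 2004) shown in the citations above.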