Library of Congress Web Archive Minerva
The Online Library of Liberty website has been selected by the Library
of Congress to be archived as part of their Minerva
project to collect and preserve material "of historical importance
to the Congress and to the American people." The OLL website will
join a select group of websites which will be cataloged and preserved
for future generations of researchers. It will be part
of the "Single Sites" collection of thematic or event-based
web sites - presumably our theme of "liberty" is what
caught their attention.
This is the email we received outlining some of the aims of the LOC Web Archiving
To Whom It May Concern:
The United States Library of Congress has selected your Web site for inclusion
in its historic collections of Internet materials. The Library's traditional
functions, acquiring, cataloging, preserving and serving collection materials
of historical importance to the Congress and to the American people to foster
education and scholarship, extend to digital materials, including Web sites.
The following URL has been selected for archiving:
We request your permission to collect your web site and add it to the Library's
research collections. In order to properly archive this URL, and potentially
other URLs of interest on your site, we would appreciate your permission to
archive both this URL and other portions of your site. With your permission,
the Library of Congress or its agent will engage in the collection of content
from your Web site at regular intervals over time and make this collection
available to researchers both onsite at Library facilities and through the Library's
public Web site http://www.loc.gov/webarchiving/.
By special arrangement, the Library may also make this collection available
to scholarly research institutions for web archive research. The Library
hopes that you share its vision of preserving Internet materials and permitting
researchers from across the world to access them...
Our Web Archives are important because they contribute to the historical
record, capturing information that could otherwise be lost. With the growing
role of the Web as an influential medium, records of historic events could
be considered incomplete without materials that were "born digital" and
never printed on paper. For more information about these Web Archive collections,
please visit our Web site (http://www.loc.gov/webarchiving/).
Additional information about the Minerva Project can be found here:
The Library of Congress Web Archives (LCWA) is composed of collections of
archived web sites selected by subject specialists to represent web-based information
on a designated topic. It is part of a continuing effort by the Library to
evaluate, select, collect, catalog, provide access to, and preserve digital
materials for future generations of researchers. The early development project
for Web archives was called MINERVA.
Web Archives Available:
- Crisis in Darfur, Sudan, Web Archive, 2006
- Iraq War 2003 Web Archive
- Law Library Legal Blawgs Web Archive
- Library of Congress Manuscript Division Archive of Organizational Web
- Papal Transition 2005 Web Archive
- September 11, 2001 Web Archive
- Single Sites Web Archive
- United States 107th Congress Web Archive
- United States 108th Congress Web Archive
- United States Election 2000 Web Archive
- United States Election 2002 Web Archive
- United States Election 2004 Web Archive
- United States Election 2006 Web Archive
- Visual Image Web Sites Archive
On the LOC's policy on web archiving see this
The Library of Congress preserves the nation's cultural artifacts and provides
enduring access to them. The Library's traditional functions of acquiring,
cataloging, preserving and serving collection materials of historical importance
to the Congress and the American people to foster education and scholarship
extend to digital materials, including Web sites.
In 2000, the Library of
Congress established a pilot project to collect and preserve these primary
source materials. A multidisciplinary team of Library staff representing
cataloging, legal, public services, and technology services studied methods
to evaluate, select, collect, catalog, provide access to, and preserve these
materials for future generations of researchers. The Library developed thematic
Web archives on such topics as the United States National Elections of 2000,
2002, and 2004, the Iraq War, and the events of September 11. More about
these collections plus many other available collections can be found at the
Library of Congress Web Archives Web site.
In July 2003, the Library and the national libraries of Australia, Canada,
Denmark, Finland, France, Iceland, Italy, Norway, Sweden, the British Library
(UK), and the Internet Archive (USA) acknowledged the importance of international
collaboration for preserving Internet content for future generations and
formed the International Internet Preservation Consortium. The goals of the
Consortium include collecting a rich body of Internet content from around
the world and fostering the development and use of common tools, techniques
and standards that enable the creation of international archives.
In 2004, the Library’s Office of Strategic Initiatives created a Web Archiving
team to support the goal of managing and sustaining at-risk digital content.
The team is charged with building a Library-wide understanding and technical
infrastructure for capturing Web content. The team, in collaboration with
a variety of Library staff, and national and international partners, is identifying
policy issues, establishing best practices and building tools to collect
and preserve Web content.
The team has completed several Web archive collections and continues to work
on new projects for building Web archives.
The Minerva project is currently working on 8 projects, one of which is called
the "Single Sites Web Archive". More information can be found here:
Scope: The Single Sites Web Archive contains sites covering a diverse array
of topics selected by recommending librarians from the Library of Congress.
This growing archive currently focuses on military history (Civil War, World
War II, etc.) and African-American history and culture. Other topics currently
include numismatics, Hungary, immigration, charitable organizations, and nanotechnology.
Included in the web archive are blogs, individual web pages, educational sites
(including virtual exhibitions), and organizational sites.
This collection is part of a continuing effort by the Library of Congress
to evaluate, select, collect, catalog, provide access to, and preserve digital
materials for future generations of researchers.
There is a FAQ section which explains a few more details about the project:
About Web Archiving Activities at the Library of Congress
1. Why is the Library of Congress collecting and creating an archive of
The Library of Congress and libraries and archives around the world are interested
in collecting and preserving the Web because an ever-increasing amount of the
world’s cultural and intellectual output is created in digital formats and
does not exist in any physical form. Creating an archive of Web sites supports
the goals of the Library’s Digital Strategic Plan, announced in March 2003,
which focuses on the collection and management of digital content...
3. How large is the Library’s archive?
As of February 2010, the Library has collected almost 160 terabytes of data.
4. What kinds of Web sites does the Library archive?
Library of Congress recommending officers, or curators, select a variety
of Web sites to archive, depending on the theme of the collection activity.
The Library’s MINERVA project was the initial pilot project to archive web
sites. Event-based or thematic collections publicly available through the
Library of Congress Web Archives Web site include Election 2002, September
11, Election 2004, and the 107th Congress Web archive.
Categories of sites archived include, but are not limited to: United States
government (federal, state, district, local), foreign government, candidates
for political office, political commentary, political party, media, religious
organizations, support groups, tributes and memorials, advocacy groups, educational
and research institutions, creative expressions (cartoons, poetry, etc.), and
How Web Archiving Works
3. How much of a Web site is collected?
The Library’s goal is to create an archival copy, essentially a snapshot,
of how the site appeared at a particular point in time. Depending on the
collection, the Library archives as much of the site as possible, including
html pages, images, flash, PDFs, audio, and video files, to provide context
for future researchers. The Heritrix crawler is currently unable to archive
streaming media, "deep web" or database content requiring user
input, and content requiring payment or a subscription for access. In addition,
there will always be some Web sites that take advantage of emerging or unusual
technologies that the crawler cannot anticipate.
4. Do you archive all identifying site documentation, including URL, trademark,
copyright statement, ownership, publication date, etc.?
We attempt to completely reproduce a site for archival purposes.