
Custom Crawl Services

Internet Archive

Large-scale web harvests and national domain crawls performed for National Libraries, National Archives, preservation partners, research initiatives, and as part of special projects and custom crawling and research services.




176,534 results
National Library of Australia Crawls
Collection · 52,586 items · 578.1M views
Crawls performed by Internet Archive on behalf of the National Library of Australia. This data is currently not publicly accessible.
Internet Archive Research Publication Crawls
Collection by Internet Archive Web Group · 21,257 items · 134M views
A series of open web crawls targeting journal articles, technical memos, essays, datasets, and other research publications. This collection contains the WARC and CDX files that end up in the Wayback Machine (https://web.archive.org). See also the bibliographic metadata corpora at https://archive.org/details/ia_biblio_metadata.
web_domain_tests
Collection · 33,929 items · 121.5M views
WARCs from internal crawl testing.
Topics: web, cctld
National Archives and Records Administration
Collection · 13,951 items · 138.5M views
National Archives and Records Administration crawl performed by Internet Archive. This data is currently not publicly accessible.
National Library of Spain Crawls
Collection · 6,742 items · 296.1M views
Data collected by Internet Archive on behalf of the National Library of Spain. This data is currently not publicly accessible.
Elections Web
Collection · 1,614 items · 207.5M views
This collection contains collaborative election crawls performed by Internet Archive (IA).
Topics: elections, web
Election Crawl 2012
Collection · 1,613 items · 207.5M views
This crawl was performed in summer and fall 2012 to archive web content related to the 2012 US federal elections.
Topics: US, federal, elections, web, 2012
NARA 117th Congressional Crawl
Collection · 1,861 items · 2.3M views
This crawl of online resources of the 117th US Congress was performed by Internet Archive on behalf of the United States National Archives and Records Administration (NARA).
Topic: crawldata
Bibliotheque Nationale de France Domain Crawls
Collection · 1,653 items · 203.1M views
Crawls of the French domain space performed by Internet Archive on behalf of the Bibliotheque Nationale de France. This data is currently not publicly accessible.
NLA 2022 Autumn Domain Crawl
Collection · 2,089 items · 2.3M views
Domain harvest of the Australian web domain (.au) performed by Internet Archive on behalf of the National Library of Australia in October-November, 2022.
Topic: crawldata
National Library of Luxembourg
Collection · 16,213 items · 74.3M views
Topic: Luxembourg
National Library of Israel
Collection · 7,880 items · 73.2M views
Data collected by Internet Archive on behalf of the National Library of Israel. This data is currently not publicly accessible.
Topic: nlil
National Library of Australia Crawl
Collection · 4,658 items · 125.6M views
National Library of Australia crawl. This data is currently not publicly accessible.
bnf_2008
Collection · 715 items · 101M views
This data is currently not publicly accessible.
nls_2009
Collection · 874 items · 72M views
This data is currently not publicly accessible.
Olympics Web
Collection · 2,066 items · 76M views
This collection includes all collaborative Olympics crawls performed by Internet Archive (IA) for the International Internet Preservation Consortium (IIPC).
Topics: olympics, IIPC, web
NLA 2017 Domain Crawl
Collection · 4,877 items · 52.8M views
Crawls performed by the Internet Archive in 2017 on behalf of the National Library of Australia.
Topic: nla web 2017
UNPAYWALL-PDF-CRAWL-2018-07
Collection by Internet Archive Web Group · 1,241 items · 18.3M views
Web archive data from a crawl of open access PDF URLs provided by Unpaywall.
nls_2010
Collection · 972 items · 65.8M views
This data is currently not publicly accessible.
NLA 2022 Domain Crawl
Collection · 3,501 items · 12M views
Domain crawl of the Australian web domain (.au) performed by Internet Archive on behalf of the National Library of Australia in March-April, 2022.
Topic: crawldata
Olympics Crawl 2012
Collection · 703 items · 58.3M views
These crawls were performed by IA on behalf of the IIPC in summer 2012, before and during the 2012 Summer Olympics held in London, UK.
Topics: London, olympics, web, 2012, IIPC
nlaweb2016
Collection · 3,591 items · 51.5M views
This crawl of the .au domain was performed on behalf of the National Library of Australia in 2016.
Topics: nla, australia, web
BNL 2022 Autumn Domain Crawl
Collection · 533 items · 2.4M views
018-2022-autumn domain crawl of the Luxembourg web domain (.lu) performed by Internet Archive on behalf of the National Library of Luxembourg / Bibliothèque nationale de Luxembourg.
Topic: crawldata
NLS_2011
Collection · 1,518 items · 55M views
These crawls of the .es domain were performed in 2011 on behalf of the National Library of Spain (BNE).
Topics: bne, spain, web, 2011
Worldwide Government Web
Collection · 2,821 items · 3.3M views
WARCs from Worldwide Government Web (WGW) domain crawls.
Topic: crawl data
NLA_2015
Collection · 3,088 items · 48.6M views
This crawl of the .au domain was performed on behalf of the National Library of Australia in 2015.
Topics: nla, web, 2015
MSAG-PDF-CRAWL-2017
Collection by Internet Archive Web Group · 1,855 items · 15.1M views
PDF URLs from the Microsoft Academic Graph public corpus (February 2016), filtered to remove large sites (PubMed, CiteSeerX, arXiv) and URLs that had already been crawled.
Topics: papers, journals
NLIL 2013 Domain Crawl
Collection · 1,187 items · 28.2M views
This crawl of the .il domain was performed in 2013 on behalf of the National Library of Israel (NLIL).
Topics: nlil, israel, web, 2013
National Library of Ireland Crawls
Collection · 2,623 items · 37.6M views
Crawls performed by Internet Archive on behalf of the National Library of Ireland. This data is currently not publicly accessible.
Open Access Journal Test Crawl (2018)
Collection by Internet Archive Web Group · 794 items · 13.4M views
NARA 112th Congressional Crawl
Collection · 708 items · 41.1M views
This crawl of online resources of the 112th US Congress was performed in fall 2012 and early winter 2013 on behalf of the National Archives and Records Administration (NARA).
Topics: nara, 112th, web
NLA 2018 Domain Crawl
Collection · 5,641 items · 34.1M views
Crawls performed by the Internet Archive in 2018 on behalf of the National Library of Australia.
Topics: nla, web, 2018
NARA 114th Congressional Crawl
Collection · 3,619 items · 37M views
This crawl of online resources of the 114th US Congress was performed on behalf of the United States National Archives and Records Administration (NARA).
IA-BR-2018
Collection · 3,696 items · 28.2M views
ccTLD crawl of the Brazilian (.br) domain.
Topics: br, web, 2018, cctld
OA-JOURNAL-CRAWL-2020-07
Collection by Internet Archive Web Group · 1,923 items · 12.5M views
NLA 2013 Domain crawl
Collection · 2,826 items · 44.2M views
This crawl of the .au domain was performed on behalf of the National Library of Australia in Spring of 2013.
Topics: nla, web, 2013
NLA 2021 Domain Crawl
Collection · 6,952 items · 17M views
Domain crawl of the Australian web domain (.au) performed by Internet Archive on behalf of the National Library of Australia in March-April, 2021.
Topic: crawldata
bnf_2007
Collection · 321 items · 41.9M views
This data is currently not publicly accessible.
NLA_2014
Collection · 2,189 items · 45.3M views
This crawl of the .au domain was performed on behalf of the National Library of Australia in 2014.
Topics: nla, web, 2014
nla_2008
Collection · 631 items · 41.2M views
This data is currently not publicly accessible.
NLA 2019 Domain Crawl
Collection · 5,711 items · 23.5M views
Crawls performed by the Internet Archive in 2019 on behalf of the National Library of Australia.
Topics: nla, web, 2019
NLNZ Domain Crawl 2018
Collection · 3,541 items · 28.3M views
Domain crawl of the New Zealand web domain (.nz) performed by Internet Archive on behalf of the National Library of New Zealand in January-February, 2018.
Topics: web, nlnz, 2018
nla_2009
Collection · 568 items · 35.2M views
This data is currently not publicly accessible.
Collection · 39.7M views
Topics: bne, spain, web, 2013
National Library of Sweden
Collection · 310 items · 36.1M views
Data collected by Internet Archive on behalf of the National Library of Sweden. This data is currently not publicly accessible.
nl_sweden_2010
Collection · 309 items · 36.1M views
This data is currently not publicly accessible.
UNPAYWALL-PDF-CRAWL-2019-04
Collection by Internet Archive Web Group · 641 items · 7.1M views
NLNZ Domain Crawl 2022
Collection · 3,190 items · 7.8M views
Domain crawl of the New Zealand web domain (.nz) performed by Internet Archive on behalf of the National Library of New Zealand in January-March, 2022.
Topic: crawldata
Fed Site Closure Crawls
Collection · 1,858 items · 16.6M views
Crawls of US federal government websites performed before the sites were removed or merged with other resources.
Topics: federal, web, closures
Fed Site Closures 2011
Collection · 1,855 items · 16.6M views
This crawl was performed in fall 2011 to archive federal government websites that were slated either for removal or for merger with other online resources.
Topics: federal, web, 2011
nla_2007
Collection · 371 items · 31.1M views
This data is currently not publicly accessible.
NLA 2020 Domain Crawl
Collection · 6,153 items · 16.2M views
Crawls performed by the Internet Archive in 2020 on behalf of the National Library of Australia.
Topics: nla, web, 2020
NLS_2012
Collection · 776 items · 35.4M views
This crawl of the .es domain was performed in 2012 on behalf of the National Library of Spain (BNE).
Topics: bne, spain, web, 2012
National Library of Ireland 2017 Web Archive
Collection · 2,510 items · 22.6M views
2017 domain crawl for the National Library of Ireland.
Topics: ireland, web
IMLS Museum Universe Data File Crawl
Collection · 2,885 items · 41.7M views
2015 crawl of museum websites listed in the IMLS Museum Universe Data File. More about the IMLS MUDF can be found at https://www.imls.gov/research-evaluation/data-collection/museum-universe-data-file
Topic: AIT
nla_2006
Collection · 384 items · 30.3M views
This data is currently not publicly accessible.
bnf_2005
Collection · 265 items · 31.2M views
This data is currently not publicly accessible.
OAI-PMH-CRAWL-2020-06
Collection by Internet Archive Web Group · 2,946 items · 7.2M views
NLIL 2021 Domain Crawl
Collection · 1,614 items · 5.8M views
Domain crawl of the Israel web domain (.il) performed by Internet Archive in October-November 2021 on behalf of the National Library of Israel.
Topic: crawldata
bnf_2006
Collection · 323 items · 27.3M views
This data is currently not publicly accessible.
NLIL 2014 Domain Crawl
Collection · 971 items · 14.9M views
This crawl of the .il domain was performed in 2014 on behalf of the National Library of Israel (NLIL).
Topics: nlil, israel, web, 2014
IMLS Museum Universe 00001
Collection · 2,273 items · 33.1M views
Crawl 00001 of the IMLS Museum Universe Data File.
NLNZ Domain Crawl 2021
Collection · 3,481 items · 13.5M views
Domain crawl of the New Zealand web domain (.nz) performed by Internet Archive on behalf of the National Library of New Zealand in January-March, 2021.
Topic: crawldata
NARA 115th Congressional Crawl
Collection · 2,886 items · 15.7M views
This crawl of online resources of the 115th US Congress was performed on behalf of the United States National Archives and Records Administration (NARA).
Topic: crawldata
nla_2005
Collection · 175 items · 25.5M views
This data is currently not publicly accessible.
NLIL 2015 Domain Crawl
Collection · 1,033 items · 16.3M views
This crawl of the .il domain was performed in 2015 on behalf of the National Library of Israel (NLIL).
Topics: nlil, israel, web, 2015
WEWA domain crawls
Collection · 6,902 items · 10.1M views
WARCs from Whole Earth Web Archive (WEWA) domain crawls.
Topic: web
NLNZ Spring 2017 Domain Crawl
Collection · 2,390 items · 18.8M views
This crawl of the .nz domain was performed on behalf of the National Library of New Zealand in Spring of 2017.
Topics: nlnz, web, 2017
MAG-PDF-CRAWL-2020-03
Collection by Internet Archive Web Group · 489 items · 5.2M views
NLIL 2020 Domain Crawl
Collection · 1,355 items · 8M views
Crawls performed by the Internet Archive in 2020 on behalf of the National Library of Israel.
Topic: web
OA-DOI-CRAWL-2020-02
Collection by Internet Archive Web Group · 278 items · 4.2M views
nlnzweb2016
Collection · 1,513 items · 22.7M views
This collection includes content harvested from the Web on behalf of the National Library & Archives New Zealand in January 2016.
Topics: new zealand, web, domain
Ukrainian Domain Harvest 2022
Collection · 723 items · 2.3M views
Web archive data collected by Internet Archive from a domain harvest of the Ukrainian ccTLD.
Topic: crawldata
NLNZ Domain Crawl 2019
Collection · 1,703 items · 12.7M views
Domain crawl of the New Zealand web domain (.nz) performed by Internet Archive on behalf of the National Library of New Zealand in January-February, 2019.
Topics: web, nlnz, 2019