
Center for Digital Humanities



Digital Humanities Projects

Dirty History Crawler

Dirty History Crawler is a research and development project co-created by faculty from the University Libraries, the Computer Science and Computer Engineering Department, and the School of Library and Information Science. Its broad goal is to tackle an outstanding problem in historical humanities research: “dirty data.” Dirty History Crawler is a software tool that helps researchers discover data related to, but not directly within, a given search query.

The project produced several outcomes. First, during the original 2013-2014 period, research was undertaken into Linked Open Data, a technology widely used for storing and accessing resource metadata across different platforms. This phase supported the master’s thesis of Srikar Nadipally, supervised by co-PIs Manton Matthews and Colin Wilder; Nadipally also gave a widely attended presentation on campus. The following year, we were able to build practical user software on this foundation. The Dirty History Crawler website ran throughout 2015 and achieved notable success. Unfortunately, its data requirements came to exceed what its data source – the database of the Online Computer Library Center (parent of WorldCat.org) – could handle, and the site had to be taken offline for reconfiguration.
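To make the Linked Open Data idea concrete, here is a minimal, purely illustrative sketch (not the project’s actual code) of a one-hop crawl over linked resource metadata: starting from a hypothetical seed URI, it fetches the resource’s RDF description with the rdflib library and collects the URIs of the related resources it links to.

from rdflib import Graph, URIRef

# Hypothetical seed: any dereferenceable Linked Open Data URI would do.
SEED = "http://example.org/resource/some-historical-work"

g = Graph()
g.parse(SEED)  # fetch and parse the RDF description of the seed resource

# Collect object URIs that point beyond the seed itself; these are
# candidate resources "related to, but not directly within" the query.
related = {str(o) for _, _, o in g
           if isinstance(o, URIRef) and not str(o).startswith(SEED)}

for uri in sorted(related):
    print(uri)

A full crawler would repeat this step for each discovered URI, which is the basic pattern behind discovering resources related to, but not directly within, a search query.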

Dirty History Crawler has in turn become the parent of a current undergraduate capstone project, undertaken by four senior Computer Science students in AY 2017-2018.


 

Note:

The project was originally titled “Dirty History Metacrawler,” but for technical and other reasons the name was later shortened to “Dirty History Crawler,” or DHC.