Dirty History Crawler

The Dirty History Crawler (DHC) project was a research and development project co-created by faculty from the University Libraries, the Computer Science and Computer Engineering Department, and the School of Library and Information Science. Its broad goal was to tackle an outstanding problem in historical humanities research, namely that of “dirty data.” DHC was to result in a software tool that would help researchers discover materials related to, but not directly returned by, a given search query.

The project produced several outcomes. First, during the initial period, 2013-2014, research was undertaken into the technical medium of Linked Open Data, which is widely used for storing and accessing resource metadata across different platforms. This phase supported the master’s thesis of Srikar Nadipally, supervised by co-PIs Manton Matthews and Colin Wilder. Nadipally also gave a widely attended presentation on campus. The following year, we were able to build practical user software on this basis. The Dirty History Crawler website ran throughout 2015, achieving notable success. Unfortunately, its data requirements came to exceed what the data source, namely the database of the Online Computer Library Center (OCLC, parent of WorldCat.org), could handle, and the site had to be taken offline for reconfiguration.

The DHC has itself become the parent of a current undergraduate capstone project, undertaken by four senior Computer Science students in AY 2017-2018.

Note:

The project was originally entitled “Dirty History Metacrawler,” but for technical and other reasons the name was later shortened to “Dirty History Crawler,” or DHC.