HKDCWS

HathiTrust: Large-Scale Data Repository in the Humanities

This lecture provides an update on recent developments and activities at the HathiTrust Research Center (HTRC). The HTRC is the research arm of the HathiTrust, an online repository dedicated to providing access to a comprehensive body of published works for scholarship and education. The HathiTrust is a partnership of over 100 major research institutions and libraries working to ensure that the cultural record is preserved and accessible long into the future. Membership is open to institutions worldwide.

Over 13.1 million volumes (4.6 billion pages) have been ingested into the HathiTrust digital archive from sources including Google Books, member university libraries, the Internet Archive, and numerous private collections. The HTRC supports scholarship on this enormous corpus by enabling access to it, developing research tools, fostering research projects and communities, and providing additional resources, such as enhanced metadata and indices, that help scholars exploit the HathiTrust corpus more easily.

This lecture will outline the mission, goals and structure of the HTRC. It will also provide an overview of recent work on a range of projects, partnerships and initiatives. These include the Workset Creation for Scholarly Analysis project (WCSA, funded by the Andrew W. Mellon Foundation) and the HathiTrust + Bookworm project (HT+BW, funded by the National Endowment for the Humanities). The lecture will also introduce HTRC's involvement in the NOVELTM text mining project and the Single Interface for Music Score Searching and Analysis (SIMSSA) project, both funded by the SSHRC Partnership Grant programme. It will then discuss the HTRC's new feature extraction and Data Capsule initiatives, which are part of its ongoing efforts to enable non-consumptive analysis of copyrighted materials. The talk will conclude with a brief discussion of the ways in which scholars can work with and through the HTRC.
