The Data Curation Institute

Each afternoon of the institute will have two concurrent sessions: empirical case studies and laboratory work.

Case Study

The case study approach to social science research includes the logic of design, data collection techniques, and specific approaches to data analysis. It can be a very formal or a wholly informal process. As Robert Yin explains, "it is not so much a method as it is a comprehensive research strategy" (Yin, 2003, p. 14). The case studies that we will explore are a mix of previous and ongoing work. For instance, we will outline a framework for developing curation policies through our ongoing work with the Site-Based Data Curation (SBDC) project at Yellowstone National Park (Thomer et al., 2014); and we will also offer four comparative case studies of data curation in research libraries throughout the USA (Akers et al., 2014).

The marked features of a systematic approach to case study research are:

  • Investigates a contemporary phenomenon within its real-life context

This is to say that we will ground many of the morning topical lectures in examples of data curation from organizations and institutions in real-world settings.

  • Is appropriate when the boundaries between phenomenon and context are not clearly evident

How and when library services are compatible with data services is an open question; many successful models have been developed to meet researchers' needs in data management and digital preservation. By looking at a range of possibilities, you will be better prepared to select an appropriate model for your own institution.

  • Copes with the technically distinctive situation in which there will be many more variables of interest than data points.

It will be impossible to fully review the decisions, finances, and technical details of each case study; by selecting key variables that relate to our topical lectures, we can use a case study framework to quickly understand the costs and benefits of choosing one model over another.

  • Relies on multiple sources of evidence, with data needing to converge in a triangulating fashion.

The case studies that we will use are comparative: they are meant to highlight differences in approaches to data curation, emphasizing that there is no single right or wrong way to effectively serve end users.

  • Benefits from the prior development of theoretical propositions to guide data collection and analysis

Many of the case studies we will offer come from previous empirical studies in data curation. We can therefore look across private, public, and mixed cases of data curation, drawing on the expertise developed in each scenario.

Each Case Study section in this book will also provide background literature about the institutions we describe, and a number of resources for further developing expertise in this kind of work.

Laboratory Work

Each afternoon we will also conduct a laboratory session meant to give you hands-on experience with the tools and techniques used in data curation. These include data management planning toolkits, data normalization software (for cleaning and standardizing tabular data), and a repository / digital preservation audit framework for evaluating curation infrastructures. These sessions are meant to introduce you to working with open-source, freely available tools. Each Laboratory section in this book will also provide background literature on using the tools, and a number of resources for further developing expertise with them.

The tools that we will be using during the laboratory sessions include:

Data Curation Profiles

"A Data Curation Profile is essentially an outline of the “story” of a data set or collection, describing its origin and lifecycle within a research project. The Profile and its associated Toolkit grew out of an inquiry into the changing environment of scholarly communication, especially the possibility of researchers providing access to data much further upstream than previously imagined." A copy of the toolkit can be downloaded here

The DMPTool

"The DMPTool is a free, open-source, online application that helps researchers create data management plans. These plans, or DMPs, are now required by many funding agencies as part of the proposal submission process. The DMPTool provides a click-through wizard for creating a DMP that complies with funder requirements. It also has direct links to funder websites, help text for answering questions, and resources for best practices surrounding data management." Get started with the DMP tool here

OpenRefine

"OpenRefine (formerly Google Refine) is a powerful tool for working with messy data: cleaning it; transforming it from one format into another; extending it with web services; and linking it to databases like Freebase." OpenRefine can be downloaded, and documentation can be found here