HKDCWS

Data Curation Lifecycles and Tools

From the OAIS to the Data Lifecycle

While many curatorial steps can take place long after a dataset's original point of creation or collection, data curation is generally more effective when conducted as an ongoing process. As the Digital Curation Centre (DCC) writes,

It is a common misconception that data is created or captured and then passed on to someone else to curate. In fact, much of the most crucial information required for effective long-term curation and reuse must be captured at the conceptualisation and collection stages.

(from the DCC FAQ)

Lifecycle models are a way of atomizing data collection, use, analysis and preservation into component stages, and then identifying individual curatorial tasks that can or should be completed at each point.

There are numerous lifecycle models; for this institute we'll focus on three of the more widely referenced general models: the OAIS Reference Model, the DCC lifecycle model, and the DataONE lifecycle model.

OAIS Reference Model

image

(CCSDS, 2012)

Created by the Consultative Committee for Space Data Systems (CCSDS) for use with space data, the OAIS Reference Model has been widely adopted by a broad range of non-space agencies as a general model of a preservation system for both analog and digital data objects.

The OAIS Reference Model provides information professionals with a conceptual framework of a preservation system, as well as a vocabulary of "terms that are not already overloaded with meaning so as to reduce conveying unintended meanings". In particular, it contributes the idea of the different information packages that are processed at different points during preservation work:

- the SIP, or submission information package
- the AIP, or archival information package
- and the DIP, or dissemination information package

Preservation activities are prioritized according to the needs of a "designated community" of users.
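To make the flow between the three packages a little more concrete, here is a minimal Python sketch. It is only an illustration under our own assumptions: the class fields, the ingest and disseminate functions, and the use of SHA-256 checksums as fixity information are hypothetical choices for this example, not definitions taken from the OAIS Reference Model itself.

```python
import hashlib
from dataclasses import dataclass, field


@dataclass
class SIP:
    """Submission package: what a producer hands to the archive (hypothetical fields)."""
    producer: str
    files: dict[str, bytes]


@dataclass
class AIP:
    """Archival package: the content plus information the archive adds to preserve it."""
    identifier: str
    files: dict[str, bytes]
    checksums: dict[str, str]
    provenance: list[str] = field(default_factory=list)


@dataclass
class DIP:
    """Dissemination package: what a user in the designated community receives."""
    identifier: str
    files: dict[str, bytes]


def ingest(sip: SIP, identifier: str) -> AIP:
    # Assumption: fixity is recorded as one SHA-256 checksum per file.
    checksums = {name: hashlib.sha256(data).hexdigest()
                 for name, data in sip.files.items()}
    return AIP(identifier, dict(sip.files), checksums,
               [f"received from {sip.producer}"])


def disseminate(aip: AIP, wanted: list[str]) -> DIP:
    # Only the requested files that actually exist in the AIP are delivered.
    return DIP(aip.identifier,
               {name: aip.files[name] for name in wanted if name in aip.files})
```

A producer's submission is ingested once into an AIP, and any number of DIPs can later be derived from that AIP depending on what a particular user requests.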

The DCC Lifecycle Model

The DCC model divides curatorial tasks into three interlinked categories, all of which are quite literally centered around data:

image

(Higgins, 2008)

Full lifecycle actions

These are actions that continue throughout the lifespan of a dataset and require ongoing attention from the curator.

We'll talk more about these activities in coming sessions.

Occasional actions

These actions may need to be done repeatedly, but only every once in a while.

Sequential actions

These actions mirror the general workflow of research, and each stage entails specific curatorial tasks.
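The three categories can also be written down as a simple lookup table. The sketch below is one possible Python representation. The action names are the ones given in Higgins (2008), since the diagram itself is not reproduced in this text, and the ActionKind enum and kind_of helper are hypothetical names invented for this example.

```python
from enum import Enum


class ActionKind(Enum):
    FULL_LIFECYCLE = "full lifecycle"
    SEQUENTIAL = "sequential"
    OCCASIONAL = "occasional"


# Action names as listed in Higgins (2008); the grouping structure is our own.
DCC_ACTIONS = {
    ActionKind.FULL_LIFECYCLE: [
        "Description and Representation Information",
        "Preservation Planning",
        "Community Watch and Participation",
        "Curate and Preserve",
    ],
    ActionKind.SEQUENTIAL: [
        "Conceptualise",
        "Create or Receive",
        "Appraise and Select",
        "Ingest",
        "Preservation Action",
        "Store",
        "Access, Use and Reuse",
        "Transform",
    ],
    ActionKind.OCCASIONAL: [
        "Dispose",
        "Reappraise",
        "Migrate",
    ],
}


def kind_of(action: str) -> ActionKind:
    """Return the category a named DCC action belongs to."""
    for kind, actions in DCC_ACTIONS.items():
        if action in actions:
            return kind
    raise KeyError(f"not a DCC lifecycle action: {action}")
```

For example, kind_of("Ingest") returns ActionKind.SEQUENTIAL, while kind_of("Migrate") returns ActionKind.OCCASIONAL.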

Note that in the DCC model, most of the work done on data prior to its deposit in an archive or repository takes place within the "Conceptualise" and "Create or Receive" stages. While this makes the DCC model particularly relevant to academic librarians and data curators working at repositories or university libraries, it also makes it less helpful to those seeking to push curatorial activities "upstream" in the overall research process (see Wallis et al., 2008 for further explanation of why this is necessary). The DataONE model presented below attempts to integrate curatorial and research activities a bit more holistically, as does the SBDC workflow that we'll discuss during Day 3's Case Study.

The DataONE Model

The DataONE model is considerably simpler than the DCC model, and consists of eight sequential, but continuous, stages:

image

Many of these stages are similar to the sequential steps described by the DCC model. Note, however, that the DataONE model is intended to be viewed from the perspective of an independent researcher or research team undertaking curatorial tasks on their own or in occasional collaboration with a data center or archiving service. Consequently, many of DataONE's recommendations and tools are aimed at researchers rather than LIS practitioners (Strasser et al., n.d.).
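One way to read "sequential, but continuous" is as a cycle in which the last stage feeds back into the first. The short Python sketch below illustrates that reading. The stage names follow the DataONE lifecycle diagram as it is commonly published, since they are not spelled out in the text here, and next_stage is a hypothetical helper written for this example.

```python
# Stage names as shown in the published DataONE data lifecycle diagram.
DATAONE_STAGES = [
    "Plan",
    "Collect",
    "Assure",
    "Describe",
    "Preserve",
    "Discover",
    "Integrate",
    "Analyze",
]


def next_stage(stage: str) -> str:
    """Return the stage that follows `stage`, wrapping from the last back to the first."""
    position = DATAONE_STAGES.index(stage)
    return DATAONE_STAGES[(position + 1) % len(DATAONE_STAGES)]
```

Here next_stage("Plan") gives "Collect", and next_stage("Analyze") wraps back around to "Plan". In practice a project may skip stages or revisit them out of order; the strict cycle is a simplification.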

Works Cited

  • CCSDS (2012). Reference Model for an OAIS. PDF

  • Higgins, S. (2008). The DCC curation lifecycle model. International Journal of Digital Curation, 3(1), 134-140. PDF

  • Strasser, C., Cook, R., Michener, W., & Budden, A. (n.d.). Primer on Data Management: What you always wanted to know (but were afraid to ask). DataONE. PDF

  • Wallis, J. C., Borgman, C. L., Mayernik, M. S., & Pepe, A. (2008). Moving archival practices upstream: An exploration of the lifecycle of ecological sensing data in collaborative field research. International Journal of Digital Curation, 3(1), 114-126. PDF

Bibliography

OAIS

  • ISO 14721:2003 link

  • CCSDS "Magenta Books" link

  • Knight, G., & Hedges, M. (2008). Modelling OAIS Compliance for Disaggregated Preservation Services. International Journal of Digital Curation, 2(1), 62–72. Retrieved from http://www.ijdc.net/index.php/ijdc/article/viewArticle/25

  • McDonough, J. P. (2011). Packaging videogames for long-term preservation: Integrating FRBR and the OAIS reference model. Journal of the American Society for Information Science and Technology, 62(1), 171–184. doi:10.1002/asi.21412

  • Vardigan, M., & Whiteman, C. (2007). ICPSR meets OAIS: applying the OAIS reference model to the social science archive context. Archival Science, 7(1), 73–87. doi:10.1007/s10502-006-9037-z

DCC

  • DCC Lifecycle Model and Web Resources link

DataONE

  • DataONE general resources and best practices link

Other Resources and References