While many curatorial steps can take place long after a dataset's original point of creation or collection, data curation is generally more effective when conducted as an on-going process. As the Digital Curation Centre (DCC) writes,
"It is a common misconception that data is created or captured and then passed on to someone else to curate. In fact, much of the most crucial information required for effective long-term curation and reuse must be captured at the conceptualisation and collection stages."
Lifecycle models are a way of atomizing data collection, use, analysis and preservation into component stages, and then identifying individual curatorial tasks that can or should be completed at each point.
There are numerous lifecycle models; for this institute we'll focus on three of the more widely referenced general models: the OAIS Reference Model, the DCC lifecycle model and the DataONE lifecycle model.
(CCSDS, 2012)
Created by the Consultative Committee for Space Data Systems (CCSDS) for use with space data, the OAIS Reference Model has been widely adopted by a broad range of non-space agencies as a general model of a preservation system for both analog and digital data objects.
The OAIS Reference Model provides information professionals with a conceptual framework of a preservation system, as well as a vocabulary of "terms that are not already overloaded with meaning so as to reduce conveying unintended meanings". In particular, it contributes the idea of the different information packages that are processed at different points during preservation work:
- the SIP, or submission information package
- the AIP, or archival information package
- and the DIP, or dissemination information package
Preservation activities are prioritized according to the needs of a "designated community" of users.
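The flow between the three package types can be sketched in a few lines of Python. This is a minimal illustration, not part of the OAIS specification: the `InformationPackage` class, the `ingest` and `disseminate` functions, and the `archive_id` metadata field are all hypothetical names chosen for the example. It assumes a producer submits a SIP, the archive converts it into an AIP by adding its own metadata, and a DIP is then derived from the AIP in response to a request from the designated community.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass(frozen=True)
class InformationPackage:
    """Hypothetical container: content files plus descriptive metadata."""
    kind: str                                   # "SIP", "AIP", or "DIP"
    files: Dict[str, bytes]
    metadata: Dict[str, str] = field(default_factory=dict)


def ingest(sip: InformationPackage, archive_id: str) -> InformationPackage:
    """Build an AIP from a producer's SIP, adding archive-side metadata."""
    if sip.kind != "SIP":
        raise ValueError("ingest expects a SIP")
    metadata = dict(sip.metadata)
    metadata["archive_id"] = archive_id         # assumed field, for illustration
    return InformationPackage("AIP", dict(sip.files), metadata)


def disseminate(aip: InformationPackage, wanted: List[str]) -> InformationPackage:
    """Build a DIP containing only the files a user asked for."""
    if aip.kind != "AIP":
        raise ValueError("disseminate expects an AIP")
    files = {name: aip.files[name] for name in wanted if name in aip.files}
    return InformationPackage("DIP", files, dict(aip.metadata))


# Example: one small file moves through submission, archiving, and access.
sip = InformationPackage(
    "SIP",
    {"readings.csv": b"t,temp\n0,21.5\n"},
    {"creator": "Field team"},
)
aip = ingest(sip, archive_id="aip-0001")
dip = disseminate(aip, ["readings.csv"])
print(sip.kind, aip.kind, dip.kind)  # prints "SIP AIP DIP"
```

The point of the sketch is that each package is a distinct object with its own purpose: the SIP is never altered, the AIP carries whatever the archive needs for long-term preservation, and the DIP is shaped by what a particular user requests.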
The DCC model divides curatorial tasks into three interlinked categories, all of which are quite literally centered around data:
(Higgins, 2008)
Full lifecycle actions
These are actions that continue throughout the lifespan of a dataset, and require on-going attention from the curator.
We'll talk more about these activities in coming sessions.
Occasional actions
These actions may need to be done repeatedly, but only every once in a while.
Sequential actions
These actions mirror the general workflow of research, and each stage entails specific curatorial tasks.
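One way to keep the three categories straight is to treat them as a simple lookup table. The sketch below is illustrative only: the `DCC_ACTIONS` dictionary and the `category_of` function are hypothetical names, and the action names are taken from the model as described by Higgins (2008), so they should be checked against the published figure before being relied on.

```python
from typing import Dict, List, Optional

# Action names as given in Higgins (2008); verify against the published model.
DCC_ACTIONS: Dict[str, List[str]] = {
    "full lifecycle": [
        "Description and Representation Information",
        "Preservation Planning",
        "Community Watch and Participation",
        "Curate and Preserve",
    ],
    "sequential": [
        "Conceptualise",
        "Create or Receive",
        "Appraise and Select",
        "Ingest",
        "Preservation Action",
        "Store",
        "Access, Use and Reuse",
        "Transform",
    ],
    "occasional": ["Dispose", "Reappraise", "Migrate"],
}


def category_of(action: str) -> Optional[str]:
    """Return the DCC category an action belongs to, or None if unknown."""
    for category, actions in DCC_ACTIONS.items():
        if action in actions:
            return category
    return None


print(category_of("Create or Receive"))  # prints "sequential"
print(category_of("Migrate"))            # prints "occasional"
```

Keeping the sequential actions in an ordered list reflects the fact that they follow the research workflow, while the other two categories have no inherent order.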
Note that in the DCC model, the majority of data work prior to its deposit in an archive or repository takes place within the "Conceptualise" and "Create or Receive" stages. While this makes the DCC model particularly relevant to academic librarians and data curators working at repositories or university libraries, it also makes it less helpful to those seeking to push curatorial activities "upstream" in the overall research process (see Wallis et al., 2008 for further explanation of why this is necessary). The DataONE model presented below attempts to integrate curatorial and research activities a bit more holistically, as does the SBDC workflow that we'll discuss during Day 3's Case Study.
The DataONE model is considerably simpler than the DCC model, and consists of eight sequential but continuous stages: plan, collect, assure, describe, preserve, discover, integrate, and analyze.
Many of these stages are similar to the sequential steps described by the DCC model. However, the DataONE model is intended to be viewed from the perspective of an independent researcher or research team undertaking curatorial tasks on their own or in occasional collaboration with a data center or archiving service. Consequently, many of its recommendations and tools are aimed at researchers rather than LIS practitioners (Strasser et al., n.d.).
CCSDS. (2012). Reference Model for an Open Archival Information System (OAIS). PDF
Higgins, S. (2008). The DCC curation lifecycle model. International Journal of Digital Curation, 3(1), 134-140. PDF
Strasser, C., Cook, R., Michener, R., & Budden, A. (n.d.). Primer on Data Management: What you always wanted to know (but were afraid to ask). DataONE. PDF
Wallis, J. C., Borgman, C. L., Mayernik, M. S., & Pepe, A. (2008). Moving archival practices upstream: An exploration of the lifecycle of ecological sensing data in collaborative field research. International Journal of Digital Curation, 3(1), 114-126. PDF
OAIS
ISO 14721:2003 link
CCSDS "Magenta Books" link
Knight, G., & Hedges, M. (2008). Modelling OAIS Compliance for Disaggregated Preservation Services. International Journal of Digital Curation, 2(1), 62–72. Retrieved from http://www.ijdc.net/index.php/ijdc/article/viewArticle/25
McDonough, J. P. (2011). Packaging videogames for long-term preservation: Integrating FRBR and the OAIS reference model. Journal of the American Society for Information Science and Technology, 62(1), 171–184. doi:10.1002/asi.21412
Vardigan, M., & Whiteman, C. (2007). ICPSR meets OAIS: applying the OAIS reference model to the social science archive context. Archival Science, 7(1), 73–87. doi:10.1007/s10502-006-9037-z
DCC
DataONE
Other Resources and References
Ball, A. (2012). Review of Data Management Lifecycle Models. Bath, UK: University of Bath. PDF
Pennock, M. (2007). Digital curation: A life-cycle approach to managing and preserving usable digital information. Library and Archives Journal, Issue 1. Retrieved (preprint) June 18, 2008 from http://www.ukoln.ac.uk/ukoln/staff/m.pennock/publications/docs/lib-arch_curation.pdf