Is there Paradata? A CRMdig-Supported Mapping of Provenance and Process Information in Archaeological Datasets

Submitted by Isto Huvila on Wed, 10/07/2020 - 13:52

Presentation together with Olle Sköld and Lisa Börjesson at the CAA Nordic 2020 conference organised online by CAA Norway and the University of Oslo.


It has in parallel become increasingly clear that repositories carrying archaeological data are not as used or useful as they could be (Kim and Yoon, 2017; Voss, 2012), and that these shortcomings cannot be addressed by infrastructural expansion and refinement (more content, features, interoperabilities) alone (Baker and Yarmey, 2009; Birnholtz and Bietz, 2003). Previous research shows that a ubiquitous barrier to efficient and purposeful (re)use of archaeological repository data stems from the episodic structure of the scholarly-data lifecycle: data is originally created under certain disciplinary, epistemic, and methodological auspices and re-used in a setting that is almost inevitably different to some degree (Van House, 2004). The intellectual horizons of primary and secondary data usage can to some extent be merged if repository users are supplied with auxiliary provenance information about the processes and tools involved in creating the data (e.g., Faniel and Yakel, 2017; Yakel et al., 2013), so-called 'paradata' (Couper, 2000). Much, however, remains to be investigated about how to efficiently and purposefully capture and disseminate paradata with the aim of supporting the reusability of archaeological data. The present paper contributes to this vein of work by addressing a question that has hitherto seen little research attention: what paradata is presently included in archaeological data repositories, and what are the principal characteristics of such paradata in terms of content and descriptive structure? The paper is empirically based on a pilot study of metadata pertaining to data papers published in the Journal of Open Archaeology Data (JOAD).
The analysis of the metadata is conducted using a qualitative coding schema (Saldaña, 2015) based on CRMdig, a provenance-metadata ontology in the CIDOC CRM family. CRMdig sensitizes the analysis to the wide range of categories and relationships described by the ontology and allows for analyses of paradata expressions in the JOAD data that remain outside of the ontology’s scope. The results of the paper include a rendering of the paradata that was encountered in the pilot study, and reflections on how already-present paradata in archaeological data repositories can feasibly be used to create resources that support (re)use of archaeological data.
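
As a rough illustration of the coding approach described above (not the authors' actual procedure), a minimal Python sketch could tally coded metadata segments and separate codes drawn from CRMdig from codes that fall outside the ontology's scope. The class labels are a small illustrative subset and should be checked against the published CRMdig specification; the segments, the out-of-scope code name, and the function tally_codes are hypothetical and invented for this example only.

```python
from collections import Counter

# Illustrative subset of CRMdig class labels (an assumption; the study's
# full coding schema is not given in this abstract).
CRMDIG_CODES = {
    "D1 Digital Object",
    "D2 Digitization Process",
    "D8 Digital Device",
    "D14 Software",
}


def tally_codes(coded_segments):
    """Count codes, separating CRMdig-based codes from out-of-scope ones."""
    in_scope = Counter()
    out_of_scope = Counter()
    for _segment, code in coded_segments:
        if code in CRMDIG_CODES:
            in_scope[code] += 1
        else:
            out_of_scope[code] += 1
    return in_scope, out_of_scope


# Hypothetical coded metadata snippets, invented for illustration only.
segments = [
    ("scanned with a structured-light scanner", "D8 Digital Device"),
    ("processed in a GIS package", "D14 Software"),
    ("meshes exported as OBJ files", "D1 Digital Object"),
    ("recorded by trained volunteers", "field practice (outside CRMdig)"),
    ("aligned in photogrammetry software", "D14 Software"),
]

in_scope, out_of_scope = tally_codes(segments)
print("CRMdig-based codes:", dict(in_scope))
print("Codes outside CRMdig:", dict(out_of_scope))
```

Keeping the out-of-scope codes in a separate counter mirrors the point made above: paradata expressions that the ontology does not cover remain visible to the analysis rather than being discarded.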


  • Baker, K. S. and Yarmey, L. (2009). Data Stewardship: Environmental Data Curation and a Web-of-Repositories. International Journal of Digital Curation, 4(2).
  • Birnholtz, J. P. and Bietz, M. J. (2003). Data at Work: Supporting Sharing in Science and Engineering. In: Proceedings of the 2003 International ACM SIGGROUP Conference on Supporting Group Work, pp. 339–348. ACM: New York.
  • Carlson, S. and Anderson, B. (2007). What Are Data? The Many Kinds of Data and Their Implications for Data Re-Use. Journal of Computer-Mediated Communication, 12(2):635–651.
  • Couper, M. P. (2000). Usability Evaluation of Computer-Assisted Survey Instruments. Social Science Computer Review, 18(4):384–396.
  • Faniel, I. and Yakel, E. (2017). Practices Do Not Make Perfect: Disciplinary Data Sharing and Reuse Practices and Their Implications for Repository Data Curation. In: Curating Research Data, Volume One: Practical Strategies for Your Data Repository, pp. 103–126. ACRL: Chicago.
  • Faniel, I. M. and Jacobsen, T. E. (2010). Reusing Scientific Data: How Earthquake Engineering Researchers Assess the Reusability of Colleagues’ Data. Computer Supported Cooperative Work (CSCW), 19(3):355–375.
  • Faniel, I. M., Kriesberg, A., and Yakel, E. (2016). Social Scientists’ Satisfaction with Data Reuse. Journal of the Association for Information Science and Technology, 67(6):1404–1416.
  • Kim, Y. and Yoon, A. (2017). Scientists’ Data Reuse Behaviors: A Multilevel Analysis. Journal of the Association for Information Science and Technology, 68(12):2709–2719.
  • Saldaña, J. (2015). The Coding Manual for Qualitative Researchers. Sage.
  • Van House, N. A. (2004). Science and Technology Studies and Information Studies. Annual Review of Information Science and Technology, 38(1):1–86.
  • Voss, B. L. (2012). Curation as Research: A Case Study in Orphaned and Under-reported Archaeological Collections. Archaeological Dialogues, 19(2):145–169.
  • Yakel, E., Faniel, I., Kriesberg, A., and Yoon, A. (2013). Trust in Digital Repositories. The International Journal of Digital Curation, 8(1):1–14.