This is a short blog post about preserving digital data, from the perspective of an environmental archaeologist. There is a follow-up post about open data in environmental archaeology. All of this comes from my personal experience of creating, sharing and trying to preserve digital data.
This post was originally published on 30 September 2015. It was updated on 5 October 2015 to include links to the follow-up post.
For many years I worked on archaeobotanical material from Irish excavations. I identified and counted seeds, and presented my results in a table at the end of a technical report. The results were usually prepared as appendices to excavation reports, and they were meant to be printed and read as hard copy. Even when the reports were digital, they usually mimicked the paper report, e.g. a PDF that preserved the look and format of the printed page.
(N.B. This is not the best way to preserve data! That’s because it makes it difficult for others to re-use or manipulate the results. For details about how to make environmental archaeology data open, see Digital Data in Environmental Archaeology 2.)
Over the years I have moved house and changed jobs and, in the meantime, the methods used to store digital data have changed (all my backups for my work in 2002 were on floppy disc). I lost the digital versions of a few reports, and some files became corrupted. Losses like these are one reason why paper is still often the preferred preservation medium for many types of data.
“Born-digital data are in most danger of being lost to future generations” (O’Carroll and Webb, 2012, 8). I started to worry about preserving my digital data, and I began to adopt a preservation policy based on the principle of LOCKSS (Lots Of Copies Keeps Stuff Safe).
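Keeping lots of copies only helps if you can tell when one of them has gone bad. The following is a minimal Python sketch of one way to check this, assuming you keep a trusted original and several copies on different disks or services. Comparing checksums is my illustration here, not something the LOCKSS principle itself prescribes, and the function names and file paths are hypothetical.

```python
import hashlib
from pathlib import Path


def sha256_of(path):
    """Return the SHA-256 checksum of a file as a hex string."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        # Read in chunks so that large files do not have to fit in memory.
        for chunk in iter(lambda: handle.read(65536), b""):
            digest.update(chunk)
    return digest.hexdigest()


def find_damaged_copies(original, copies):
    """Return the copies that are missing or differ from the original.

    Assumption: `original` is a copy you trust. The check cannot tell
    you which file is correct, only which ones disagree with it.
    """
    expected = sha256_of(original)
    damaged = []
    for copy in copies:
        if not Path(copy).is_file() or sha256_of(copy) != expected:
            damaged.append(copy)
    return damaged
```

Calling find_damaged_copies("seeds_2002.csv", ["/backup_a/seeds_2002.csv", "/backup_b/seeds_2002.csv"]) (hypothetical paths) returns a list of the backups that need to be replaced from a good copy. An empty list means every copy still matches the original.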
One way to make multiple copies of your data is to disseminate it online. But even when you upload a report or a dataset, you can’t be sure that the platform you upload to will continue hosting your data forever. This is a problem across research institutions, and it has led to a call for the development of reliable repositories (with the resources to sustain data in the long term) and a system of Persistent Identifiers (PIDs) or handles.
The easiest way to get a PID for your dataset is to upload it to a trusted repository. Trusted repositories keep multiple copies of your data on their servers. There are a handful of trusted repositories for archaeological data, and a review of these is available on the website of the meta journal, Journal of Open Archaeology Data.
I have used both Figshare and Zenodo to upload my data (these are both trusted repositories that offer free services). The repositories assign a PID to the files. This also makes it easy for someone else to reference your work and acknowledge your contribution, as the repository generates a citation for the data. For example, one of my datasets that has been uploaded to Figshare is cited as: Johnston, Penny (2014). Plant remains data from Derrybane 2, Tipperary Ireland. Figshare http://dx.doi.org/10.6084/m9.figshare.1080723.
For more information on PIDs, see http://www.dcc.ac.uk/resources/briefing-papers/introduction-curation/persistent-identifiers
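The PID in the Figshare citation above is a DOI (Digital Object Identifier). The identifier itself is only the part that starts with "10."; the web address in front of it is a resolver that redirects to wherever the data currently lives. As a rough sketch, assuming simple DOIs without unusual characters such as "?" or "#", the identifier can be separated from the resolver in Python like this. The function names are my own and are not part of any repository's service.

```python
from urllib.parse import urlparse


def extract_doi(reference):
    """Return the bare DOI from a DOI string or a resolver URL.

    Assumption: the DOI contains no '?' or '#' characters, which
    urlparse would treat as the start of a query or fragment.
    """
    text = reference.strip()
    if text.lower().startswith("doi:"):
        text = text[4:].strip()
    parsed = urlparse(text)
    if parsed.scheme in ("http", "https"):
        # Drop the resolver host and keep only the identifier.
        text = parsed.path.lstrip("/")
    if not text.startswith("10."):
        raise ValueError("not a DOI: " + reference)
    return text


def resolver_url(reference):
    """Build a link to the DOI through the doi.org resolver."""
    return "https://doi.org/" + extract_doi(reference)


doi = extract_doi("http://dx.doi.org/10.6084/m9.figshare.1080723")
print(doi)                # 10.6084/m9.figshare.1080723
print(resolver_url(doi))  # https://doi.org/10.6084/m9.figshare.1080723
```

Recording the bare identifier alongside the link is a reasonable habit, because the identifier is the part that is meant to stay the same even if the hosting address changes.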
Using these services not only provides me with a step towards digital preservation, but it also means that it is much easier for me to share my data with other researchers. Making data open and accessible so that others can re-use it is the topic of my next blog post.