Oct 15 2012

Call for interest in a DOIs for Datasets workshop.

Overshadowed by the subsequent trading of blows over the colour of Open Access, RCUK’s policy toward open data became more explicit in their announcement on July 16.

“and a statement on how the underlying research materials such as data, samples or models can be accessed”

Not if. How. At the University of Hertfordshire we had already decided, in the context of our EPSRC roadmap, to extend our institutional repository to support datasets. A major aspect of making this work is the provision of Digital Object Identifiers for our data.

As a newcomer to the JISCMRD programme a year ago in 2011, I hope I would have been forgiven for thinking that the DOI piece of the MRD jigsaw was firmly in place; a given. I had grounds for this casual assumption. Witness: DOI was well established and seemingly uncontentious in the JISCMRD lexicon; UHRA, our own institutional repository, was littered with DOIs; they have been around since before the millennium; and well – it is easy – Digital Object Identifier, a widely used citation mechanism, a persistent, unique ID for a digital thing. This complacency was compounded by later experience: exposure to Data Dryad’s use of DOIs for datasets (tick) and then Louise Corti’s excellent presentations about data citation and versioning. Job done.

Right up until the point at which you begin to need one, DOIs look straightforward. However, as we approach the moment at which we will be asking our researchers to begin publishing datasets in our repository, the hard questioning begins. At the first of the British Library DataCite workshop series (reported by Kaptur and data.bris), I began to see less clearly. Or at least, it felt as if hyperopia had set in. The goal was still in sight but the details in the foreground were not clear.

The questions began to pile up. How do we get DOIs for our datasets? Is there an API to a DataCite/BL service? Could/should University of Hertfordshire mint DOIs? Would local minting consortia be more appropriate? What about the B-word – where is the benefit over an equivalent handle system already built into our repository and shared by umpteen thousand other DSpace installations? Panic in the detail.

Before this blog turns into a bleat, it is time to calm down and visualise the problem; this always helps:

Well, it helped me anyway.

In all seriousness, I think that most JISCMRD projects will have to answer the questions and flesh out most of the lines on this mind map eventually, particularly in the detail over on the right hand side. In all probability, these issues are tractable, and it is just a matter of enough effort. But it seems sensible to share the problem if many of us are to be occupied by it. We have had some early discussions with the British Library and with their encouragement I would like to propose a DOIs for Datasets workshop, over and above the continuing BL/DataCite series, specially focused on how to acquire or mint DOIs for our datasets. The University of Hertfordshire would be pleased to arrange such an event if there is interest from enough programme members. The agenda would be dictated by demand, but we foresee some sessions already: role of consortia vs national minting services; service level agreement/obligations of a minting body; overview of existing services, APIs, scripts and other magic. The workshop would be held in London or Hatfield in early winter 2012/2013.

To register an interest in DOIs for Datasets please use the comment form below. If you feel moved to discuss the proposed workshop or any of the issues arising on Twitter, please use the #dois4datasets tag.

  7 Responses to “DOIs for datasets.”

  1. Yes, I (or a colleague from York) would be interested in this. I think we are all grappling with the same problems!

  2. I would be interested in coming along to discuss how figshare.com is helping Universities and publishers make all research outputs citable and discoverable, including datasets.

  3. Hi Bill,
    I’m unconvinced about the need for institutional data repositories to adopt DOIs. So far the DataCite consortium is made up of bigger players, such as national data archives (and Dryad is tied to publishers). Like yourself, our repository uses handles, native in DSpace. This offers persistent identifiers, which appears to be sufficient for meeting the criteria of the proposed Data Citation Index from ISI (at least they don’t mention DOIs in their document).

    Personally I’m not looking forward to the added overhead (like you, not exactly sure what it is) of turning our handles into DOIs. None of our users have asked for it. I’m sitting on the fence waiting to see if this is a bandwagon one must join or if handles will continue to do the trick.

    Since you’re a DSpace user, you might be interested in the “suggested citation” that we offer on the splash page – made up of metadata entered for the item. I think this is more useful to depositors as a way of encouraging citation (along with the handle).

    Robin Rice
    EDINA and Data Library
    University of Edinburgh

    • Robin,

      that is an interesting take. I wonder just how much it is influenced by the very fact DSpace operators do already have the handle system?

      I suppose I took the very simplistic view that the majority of traditional published research outputs have a DOI and to achieve an equitable citation position datasets will require the same.

      Please add the URL of your ‘suggested citation’ here to add to the discussion.

  4. Hi Bill, I would be interested in attending this event.

  5. Hi Bill,
    Just saw your questions back to me. I left the link to the repository in the URL on my name. If you look at any dataset, e.g. http://datashare.is.ed.ac.uk/handle/10283/220
    you’ll see the citation, taken from the metadata entered upon deposit, and the handle.

    It’s interesting that you equate deposited datasets with published articles. I equate them more with the items in our publication repositories that are open access. These don’t have DOIs, typically because they are open access copies of published articles, and the Crossref rules for DOIs state that the same item should not have two different DOIs.

    What is going to happen if your depositor wants to deposit data in more than one repository? What are you going to do if they correct mistakes in the data and redeposit? I think in practice the answer is DataCite is not going to enforce this rule, and different DOIs for the same intellectual “item” will proliferate but it is looking less and less like traditional DOIs as used by publishers.

    Does your repository consider itself the publisher of your users’ datasets or just the host? Do you do any modifications or quality assurance or peer review? Do you own the rights instead of the depositor? Are you willing to accept liability for what is in the repository or would you expect to pass that on to your depositor? That is a legal difference between publisher and host. (Given, it may be the institution in both cases, but anyway.)

    I’m still wondering what the added value of DOIs over handles is (and yes, the ease of incorporating handles was one reason we chose DSpace – I am in favour of persistent identifiers, just not a monopoly of one particular kind). I’ve heard that BL is considering charging each institution 1500 pounds per annum for DOIs – at the rate of deposit we have those are very expensive DOIs indeed.

  6. Hi, I’d like to attend DOIs for Databases.
    (Soon-to-be-researcher at UH)