Jo Goodger

Jul 09 2013
 

At the 2013 National Astronomy Meeting in St Andrews, I presented a review of the University of Hertfordshire Research Archive (UHRA), including the plans to preserve data in very long term cloud storage managed by Arkivum.  I also described the recent discovery of photographs depicting the development of the Bayfordbury observatory, and how these and the observational data will be released for reuse via the UHRA.  The audience was a mere 10 people: half were interested in the history of astronomy, and the other half were invited speakers from museums (the Science Museum in London and National Museums Scotland) and from Jodrell Bank, where they are petitioning for Heritage Site status.

The aim of the session was to discuss the strategy that the Royal Astronomical Society (RAS) should take with respect to preserving astronomy heritage for the future, but the discussion focused on historical artefacts from the 1900s: what should be kept, and how, for the next 100 years.  It was enlightening that the issues that surround objects such as telescopes, computers, instruments, and software are similar to those that we are used to with digital data.

Storage for the vast number of objects is an issue, with warehouses filling up and only 7-10% of objects being exhibited.  Storage is expensive, objects have to be catalogued and looked after, and access to objects is difficult because the storage spaces are overcrammed.  This could describe both digital data and historical objects; the issue of what should be saved, what can be saved, and where it should be kept is challenging.

The museums operate reactively, saving what they can when items are donated from private collections, families, and universities, or can be purchased from auctions.  However, without documentation, the item may not be identifiable, repairable, or recognised as historically important.  A grey box could be anything or nothing without evidence: who owned it, what was it used for, or, even more basic, what is it and when was it made?  This is the metadata that puts the item in context, without which the object could be destroyed; this is also the case for digital data.

While these issues of storage and metadata are important in astronomy heritage, retention is the major concern.  The audience recoiled at the prospect of retaining data for only 10 years as this is nothing compared to the 100-year timescale that they consider.  A hundred years. The idea that digital data would still be useful in 100 years is incredible.  It is understandable that photos, interviews, and videos, of people, places, and events would have value to future historians, but would astronomical observations also be useful?

We know that stars are born and die, that objects move, and this temporal information is crucial to science, but how could these data be preserved in a useful format?

Just 50 years ago, images from optical telescopes were recorded on glass plates.  These images show fine details of nebulae and galaxies, but cannot be used for modern scientific work as they contain no data on intensities, wavelengths, or spectra.  Newer electronic data from 30 years ago, when the Very Large Array (VLA) in New Mexico was commissioned, are still accessible and can be processed using the radio image processing software AIPS.  Recently, the array has been upgraded, the software is now being maintained by users, and in another 10 years these data may not be processable.  This raises the question: if there are new images, should the old data be kept?  Also, should we continue to keep only the unprocessed ‘raw’ data?

To continue to make these data useful, we also need to keep the software and process instructions, and ensure that current operating systems can run the software – is this reasonable and cost effective for 100 years?  Perhaps if there are many data sets that would need the software, then it would be worthwhile, but maybe it would be more beneficial to keep the data in a processed form.  The question then is who processes them, and how should the processing be done?  Some calibration methods are quite subjective.  With historical objects, deciding how much information should be kept is equally difficult; is it sufficient to keep the documentation and photos and discard the item itself?

The group described some objects as ‘Rich’, where there is obvious importance behind an object – this may be something that is worth consideration.  The racket that Andy Murray won Wimbledon with this weekend would be a good example of a Rich object, but with digital data, the PI who requested the observation will think their data are far more important than everyone else’s.  In this respect we have a far more complicated choice to make for the future of digital data in astronomy.

It was also interesting that many museums and universities with lots of historical instruments keep catalogues of these objects.  These catalogues are not currently open access, and while you can ask if something is in the catalogue, you can’t search for an object or compare it with other sites.  It is a comfort that institutional repositories are making their data catalogues open access, as comparisons are a vital part of research.  There was discussion about making a national list that would pull together the catalogues – the fact that so many museums and institutes have catalogues shows that there is a demand for a national list.  The same is likely to happen for data repositories, once subjects show that there is a demand for national data catalogues.

In conclusion, the strategic plan produced by the RAS should provide some guidance on preservation selection criteria and retention periods.  I for one have learnt that those brassy, old-looking bits of equipment in our lab are worth keeping, and I’m going to get a sticker put on them saying to contact the Science Museum if they’re at risk of being discarded.

Jul 09 2013
 

One of the results of the DAF at UH showed that researchers were open to training materials as long as they’re not long-winded or too generic.  However, the results of my interviews in Science and Technology, and interviews in the other research institutes by our champions, show that the best practice for looking after working data, including the storage and sharing of sensitive data, is universal.  This means that although training based on this best practice is largely generic, advertising it as such will not attract researchers.

I have tested an ‘Introduction to RDM’ course as an hour-long session aimed at new research students in the centre for astrophysical research (CAR).  As only first years are required to attend, there were only six students at this session last November.  As an introduction, the session included why RDM is important and a summary of the DMP topics, with a basic DMP handed out during the session for the students to complete.

The feedback was positive and all of these students appear to have benefited from a better understanding of back-up policies and the storage solutions available to them.

This was encouraging and we continued to plan an RDM session in our ‘Generic Training for Researchers’ (GTR) program.  Herein lies our biggest issue with training sessions.  As the RDM introduction session is both broad and generic, its relevance is not immediately clear to researchers.  They are also very busy and cannot justify spending an hour in a session that may not give them enough information to make it useful.

Making the session longer would allow us to give more details, but it is still generic training.  We have also had little interest from research students as it is not compulsory beyond the first year.  We’re now considering a different name for the session, perhaps “planning and managing your data”, or something that can be identified as relating to the DMPs that researchers will recognise.

So our strategy to train researchers is to run staff development courses on the tools, attach topics to existing training sessions, and run a poster campaign to advertise the website so researchers can get the answers and examples themselves quickly and easily.  This resolves the issue of a ‘time consuming training session’, while still getting our best practice advice across in other sessions.

For research students, we plan to include RDM twice annually in the GTR program and in the department training programs.  Even if only first year students are reached, we hope that it will spread by word-of-mouth to their peers and that within 3 years all of the research students will have had the training.  The change in students’ RDM behaviour will hopefully be noticed by their support team, who will then also benefit from their students’ training; a secondary method of getting best practice advice to our researchers.

Finally, we will be rolling out training to the service and technical staff so that they can all support the tools and the researchers when it comes to RDM.

So that we can re-use the materials for all audiences and so that future trainers can also target their RDM sessions, I have split our training into 18 topics and produced a table to help trainers choose which slides to combine for their session.  The slides for finishing projects are not ready yet as the guidance for preservation is still inconclusive, but the table below shows the scope of what the training will include.  The training will also include packages of examples for the research groups, which will make the training relevant when delivered in the departmental programs.  These topic presentations will be recorded using Camtasia this autumn so that they can be watched by researchers online if they want a refresher; some may prefer this to reading a how-to guide.

This table should help you select which slide packages to use for training different audiences


Jul 09 2013
 

We planned to produce discipline-specific example DMPs for our researchers.  However, as we prepared our best practice advice we learnt how many of the DMP answers for the working data stages are similar throughout the university irrespective of the subject, and that the main differences are in the funding-body requirements for archiving and preservation.

We therefore endeavoured to develop a DMP template for the University of Hertfordshire that would stand alone, cover the full life cycle of research, and not require a great deal of extra information on top of what is already answered by the researcher in other funding-body templates.  We are conscious that researchers are not inclined to repeat themselves, and that if we limit the answers that are unique to the UH template, they are more likely to complete an additional template.

We therefore began by comparing the DMP questions within the existing RCUK templates on DMPonline to the checklist in our data policy – this checklist has subsequently been removed from the UH data policy in favour of a requirement to complete the DMPonline UH template.  We found that 95% of the UH checklist was covered by one RCUK template or another.

We therefore decided to include the 50 questions that were also in the RCUK templates, adding only contextual information at the beginning and four questions unique to UH which focus on file naming conventions and resources for computing.

We have sent the draft UH template to our champions to gather multidisciplinary advice before we upload our template this summer, by which point our website will also be finalised and published.  Our main concern with this template is that it should be sent post-award to our document management system (DMS), where it will be stored in perpetuity.

We have already received a number of requests for help with DMPs and our champions have been contacted with collaborations based on their basic knowledge of DMPs, all of which suggests that training sessions based on DMPs will be popular and that we should hurry to get our template in place on DMPonline before too many of our researchers complete other templates.

Jul 09 2013
 

We are not the first institute to produce a website of advice for our researchers, and we won’t be the last.  We already have in place the UH public website and two intranet resources: studynet, where students and staff communicate about courses and where information about research is available, and staffnet, which gives information on policies and research services such as the intellectual property and contracts office (IPACS) and the research grants office.  It struck us that while much of the information related to good RDM is available on these sites, only one site is openly available, and the information relating to remote access, for example, is only available on an internal site.

We therefore decided that our advice and guidance would be best placed on the UH public website.  This does limit the look of the RDM site as we have no control over colour schemes or formats, and we have limited choices for the layout of case studies and the advice.  Hopefully, future iterations of the UH website will include more flexibility for its micro sites and we will be able to include dynamic content.

We chose to include as much information and advice as possible so that if people are not available for one-to-one assistance beyond this project, sufficient advice would be on the site.  We currently have 50 pages covering 18 RDM topics as well as additional pages on governance, training, and examples.  There are 6 main sections: four covering the RDM life cycle (planning, starting, working, and finishing), plus training and legal issues.  These sections were chosen to cover an equal number of topics and as sensible splits in the life cycle.  The training materials are also divided into the four RDM life cycle sections.

The site is written in a relaxed tone with language which is not overcomplicated so that it is useful to researchers, research students, and support staff. We are now concentrating on open images to illustrate the site and supporting guides for the tools, whilst getting feedback on the content of the site from our stakeholders and all of the contacts that we have made during our project.  This includes collecting more case studies and getting authorisation to publish those that we have already written up.  We are now hoping to publish the site by the end of July at the same time as publishing our UH DMP Template.

Update May 2014: the re-branded RDM pages are now available at http://bit.ly/uh-rdm