Nov 30 2011

Work package WP1 – Audit current RDM practice at University of Hertfordshire

Cardio and DAF : We have agreed a plan with Andrew McHugh of the Digital Curation Centre (DCC) to carry out a Cardio Assessment with key UH stakeholders, followed by a Data Asset Framework (DAF) exercise with one of our large research groups in Health and Human Sciences. Working with Andrew will hopefully allow us to equip our own staff to deploy DCC tools and methods in the future, whilst conducting an effective audit in the present. The DCC have match funding, so project money spent with them is effectively doubled and looks like good value. The micro plan for the Cardio deployment is this:

  • engage stakeholders (November, #rdtk_herts project team)
  • introduction workshop, in which DCC introduce the audit methodology (December, All stakeholders, DCC led, 1 hour)
  • individual RDM assessments using the DCC’s online Cardio assessment tool (December/January, All stakeholders, at their convenience)
  • aggregate and analyse results (January, DCC)
  • feedback workshop (January, All stakeholders, DCC led, 1-2 hours)

I found the mini-Cardio test, ‘What’s your institution’s research data management diagnosis?’, available in the latest issue of JISC Inform, really helpful in working out what to expect (thanks to Simon Hodson for the tip).

We want as broad a church as possible to take part so we have invited senior management, senior systems and information management staff, registry staff and eight research leaders in the hope of drawing together a dozen or more strategically placed participants for this first round of the exercise.

Small steps : While we have been out and about talking with researchers and considering grand plans, we have been reminded that this project is also about small steps for improving research data management. We have already found two instances of Unsafe USB Use (should #JISCMRD have a competition to find a witty but non-judgemental label for this affliction?). People are either using memory sticks as the primary store for their work and then keeping a backup in an equally unsafe environment like their desktop or home machine, or vice versa. Simply showing them how to access the 5GB of secured, networked, personal storage (which is available to all staff at UH) reduced their potential for disaster. Picking up on a theme which was voiced at #RDMF7, we clearly need to update our induction briefings for staff and postgraduate researchers, so that they know how to use their managed storage.

WP2 & WP3 – Pilot services

Cathy Tong and I have been talking with colleagues from the Centre for Lifespan and Chronic Illness Research and the Centre for Research in Primary and Community Care. Three projects in particular stand out. Two of these projects involve clinical trials and it has been interesting to see the regulatory rigour with which clinical work is carried out. On first thought it seems natural that clinical work is subject to strong regulation, but on reflection, why should any research endeavour not be practised under an equivalent regime? When comparing a Standard Operating Procedure (SOP) document for a CTIMP project with existing RDM guidance or our own data policy, there is little difference in the good practice therein. The difference is in the culture of use, where clinicians have to abide by the rules.

Returning to practical matters, one of the projects we are going to work with is complete and our task is to facilitate controlled access to the study data (about 300MB held in Excel, Stata, and Epidata) and the extensive paper and electronic documentation for 5 to 10 years. The university’s Document Management System (OpenText) looks useful in this respect since it is already being used to keep track of the location, lending and disposal of a lot of legacy hard copy. We know little about Stata and Epidata formatted data and intend to seek advice from the Biomedical Research Infrastructure Software Service (BrissKit) and DSpace Cambridge with regard to its curation.

The other clinical trial will be getting under way in Spring 2012 and offers a perfect opportunity to deploy sound RDM solutions from the start. It has both data and document management facets. The primary data gathered from several hospitals will be collected and held elsewhere but the anonymised working data will be a local responsibility. There is a paper trail for each observation and although this must be retained we intend to maintain a parallel, electronic ‘Trial Master File’ using OpenText or Sharepoint.

The third health-related project is a longitudinal study involving a large SQL dataset. We are looking at various options for cloud-based database hosting, ranging from our own ‘local cloud’ to the HRC Computing Consortium (HRC3) and Eduserve.

In addition to conversing with practitioners I have also been looking at capacity building, or the who, how and what of services our project could offer. To this end I have exchanged first thoughts with developers from HRC3, which left me with many questions to answer, too specific to relate in this blog.

I also braved the fog of the Thames Valley to attend a demonstration of the Virtual Infrastructure with Database as a Service by VMware/ViDaaS at Oxford University Computing Services. A large part of this seemed to be a VMware pitch. They showed, for example, virtual private servers being replicated and moved around the world at will to wherever ‘compute resource’ might be found at the right price. This was all very well for a VPS holding a web application/UI layer but I don’t see multi-terabyte datasets whizzing around the globe in this manner. I nearly missed the important point that if the physical location or legal jurisdiction of your data matters, then VMware vCloud easily accommodates this requirement too. Subsequent discussion with project manager James Wilson and Jonathan Tedds of BrissKit revealed more about ViDaaS and the progress of the UMF work in the cloud. ViDaaS will offer database hosting to a number of partners in order to help develop it further, but it is intended as an open-source vehicle for others, such as Eduserve, to offer. The ViDaaS virtual machine can be hosted on a wide range of infrastructure and is not limited to delivering Postgres. We talked also about an interesting element of the work, which is to break the reliance on local (and thus fragile) databases by offering a migration tool for MS Access. I came away with the impression that JISC-sponsored DaaS is ‘not quite’ available at this moment. It depends on the fruition of the UMF cloud brokerage, due in Spring 2012, to become a reality.

WP11 – JISC Managing Research Data (MRD) programme activity

In addition to the ViDaaS trip, I attended all three days of the DCC Roadshow in Cambridge earlier in November, and latterly resurrected my Photoshop skills to make our poster for the MRD programme launch meeting. Following on from RDMF7, the roadshow completed my crash course and introduction to the RDM community. As a result we (the University of Hertfordshire) can see where our project fits in the scheme of things and can be confident of making progress and contributing to the whole. More will no doubt be revealed in Nottingham over the coming days. There is a reflection on some of the many issues arising from the roadshow in a previous post.

WP12 – Project Management

Recruitment continues for a Process Business/Analyst and a Systems Consultant/Developer to work on the toolkit as it gathers pace in the new year.

Our first Steering Group meeting has been convened for 9th December.


  One Response to “Progress Report Month 2, November 2011”

  1. Thanks for this, Bill. The account of the ‘micro-plan’ for using CARDIO and – subsequently – DAF, and the discussion of the potential pilot services is very interesting.

    Best, Simon.