Sara Nassiri

Aug 20 2012

An audit of research data holdings within University of Hertfordshire was conducted in the period May to July 2012.

The online survey (described in more detail here) was circulated to around 600 research staff, first via their regular monthly newsletter, then with follow-up reminders sent by our information managers to schools and research centres, and through our continuing programme of RDTK awareness meetings and interviews. There were 67 responses, which represents about 12% of those invited to take part. Most research-active disciplines were represented among the respondents, albeit with a strong showing from the STEM subjects.

The survey has brought insight into the extent of our research data. It allows us to estimate that we hold approximately 2PB across the whole research landscape. This is a factor of ten larger than our current central provision. However, around 80-90% of this belongs to a very few research groups, who are relatively well organised and funded for RDM, and it tends to be working data for those that crunch numbers, so it may not necessarily be data that requires retention. The remaining 10-20% of research data, which belongs to the other 80% or so of researchers, looks like a manageable quantity. This suggests that cultural change rather than capacity may be the predominant issue when it comes to achieving a migration to a more robust infrastructure for working data for the majority of researchers. Likewise, we should expect to be able to manage the data that could be preserved, if we can build the culture and processes to make that possible.
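As a rough check on the figures above, the arithmetic can be sketched in a few lines of Python. The sketch assumes decimal units (1 PB = 1000 TB) and reads "a factor of ten" literally; the variable and function names are illustrative and not part of the survey analysis.

```python
# Figures taken from the survey write-up. Decimal units (1 PB = 1000 TB)
# are an assumption made for this sketch.
TB_PER_PB = 1000

total_tb = 2 * TB_PER_PB       # estimated holdings: approximately 2 PB
central_tb = total_tb / 10     # holdings are ~10x the current central provision


def long_tail_range_tb(total, low_pct=10, high_pct=20):
    """Return the (low, high) volume in TB held outside the few large groups."""
    return total * low_pct / 100, total * high_pct / 100


low_tb, high_tb = long_tail_range_tb(total_tb)

print(f"Implied central provision: ~{central_tb:.0f} TB")
print(f"Long-tail research data:   ~{low_tb:.0f} to {high_tb:.0f} TB")
print(f"Relative to central provision: {low_tb / central_tb:.1f}x to {high_tb / central_tb:.1f}x")
```

On these assumptions the long tail comes to roughly one to two times the current central provision, which is consistent with describing it as a manageable quantity.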

In addition to requirements that have already been resolved (such as easy to use encryption and more flexible provision of storage for mixed staff/student/external research groups) the survey revealed some previously unvoiced requirements, such as centralised version control for source code, CAD and design files.

Perhaps influenced by the STEM respondents, the survey also showed that venerable FTP is alive and still working well in amongst the new (and rebranded) offerings of the cloud. This indicates there continues to be profit in exploring an FTP-based cloud storage pilot.

The key messages from the survey support the anecdotal evidence acquired to date; in the main there was no big, new news. However, the subtext obtained is valuable and it underlines that considerable help and resources are needed over the whole project lifecycle, from planning to preservation, if we are to satisfy the demands of a rapidly developing (some would say hardening) funders’ policy regime.

Download the survey results (PDF 400KB).
Download further discussion and analysis (PDF 1.7MB).

Jun 27 2012

Electronic Document and Records Management systems (EDRMS) have the potential to answer the needs of some Research Data Management scenarios.

EDRMS offer file sharing, file versioning, flexible access control, and retention, preservation and discovery services (albeit most often in a closed environment). In the case where a project’s data is bound up in everyday office formats, and does not need a database or other structured format, an EDRMS can be used to bring rigour and robustness to otherwise freeform file management. Although a researcher may be reluctant to ‘get organised’, there are circumstances where there is no choice: the nature of the work means that a high standard of file management is more than a matter of efficiency or professionalism and becomes a requirement of the funding body, subject to audit. This is the case, for example, with all clinical trials.

In workpackage WP3 we are looking to improve project management practice in general and promote more robust RDM of unstructured data by developing a standard ‘file plan’ for use in research. This will be backed up by policy changes which will encourage (and, in the fullness of time, mandate) the deposit of project documentation in our EDRMS. The file plan must be generic enough to be useful for researchers who just need to improve their document organisation, whilst also allowing for the more robust requirements of those needing to comply with external oversight. The starting point which informed this work was the JISC advice on Developing a File Plan in the JISC Business Classification Scheme (BCS) and Records Retention Schedule (RRS) for Higher Education Institutions infokit.

We are working with the Centre for Lifespan and Chronic Illness Research (CLiCIR) to develop an electronic Trial Master File (eTMF) for use in clinical trials. A Trial Master File contains the essential documents: every document that is used to conduct and report on a clinical trial. Our eTMF will complement the locked filing cabinet which is currently used to satisfy the demands of audit. The Medicines and Healthcare products Regulatory Agency (MHRA), which is responsible for the regulation of medicines, requires each trial to be conducted in accordance with a Standard Operating Procedure (SOP). The MHRA carries out unannounced inspections to audit SOP compliance, including data management arrangements and storage procedures. We hope to demonstrate that an eTMF could be more efficient than, and at least as robust as, the current paper-based arrangements. In doing so, we are also developing our general purpose file plan.

We think the EDRMS will offer considerable benefit in respect of managing the large volume of documents and data produced in drug trials, whilst addressing the legal requirements on data retention and security. In addition, we expect it to make the sharing of data between chief investigator, co-ordinating centre and participating sites more effective.

The workpackage mini-plan is as follows:

  1. Develop the File Plan
    • Define folder structure
    • Define classification, retention and disposal requirements
  2. Define Metadata requirements
  3. Build a Model Office
  4. Draw the life cycle of a clinical research project
  5. Develop roles, responsibilities and access
    • Define roles and access level (including: ownership, managed by, administrated by, visible to …)
    • Set security and access
  6. Create Data Management and Maintenance Policy
    • Develop guidelines for data maintenance and update
    • Develop guidelines on retention
  7. System Implementation
  8. Train users

After close consultation with our CLiCIR colleagues we have produced a draft folder structure which is ready to deploy to support the eTMF and to take to other research groups for comment. It includes many folders which are generally applicable and some which are peculiar to health related research. It also highlights those folders which must be highly secure in the context of the eTMF.

Each item in the EDRMS has associated metadata. In addition to the basic file management metadata required by the system, it is possible to add further fields for project specific data. The work so far has focused on the metadata needed to describe a research project at UH. In due course we will consider how this maps to other schema, such as those used by our CRIS, our Research Archive and the various minimum metadata sets circulating in JISCMRD.

The EDRMS can be used as the primary store of actual data but also as a management tool for externally located data, be it electronic or hard copy. It can manage the retention and disposal of both. The various requirements for retention and disposal of different types of research data have been brought together in a draft retention policy. (It is interesting to note that although the EDRMS is in theory designed to deal with the time scales involved, most of the retention schedules we see extend beyond the expected lifetime of the EDRMS itself.)
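To make the retention and disposal idea concrete, here is a minimal sketch in Python. It is not the EDRMS's own data model: the `Record` fields, the 15-year retention period, the dates and the assumed end-of-life of the system are all hypothetical placeholders chosen for illustration. It shows one record type covering both stored and externally located items, a disposal date derived from a retention period, and a check for schedules that run past the system's expected lifetime.

```python
from dataclasses import dataclass
from datetime import date
from typing import Optional


@dataclass
class Record:
    """Hypothetical record entry; field names are illustrative only."""
    title: str
    location: str               # "edrms", or a pointer to external storage
    closed_on: Optional[date]   # None while the project is still active
    retention_years: int        # would come from the retention policy


def add_years(d: date, years: int) -> date:
    """Add whole years to a date."""
    try:
        return d.replace(year=d.year + years)
    except ValueError:
        # 29 February with a non-leap target year: fall back to 28 February
        return d.replace(year=d.year + years, day=28)


def disposal_due(record: Record) -> Optional[date]:
    """Date on which the record becomes eligible for disposal, if closed."""
    if record.closed_on is None:
        return None
    return add_years(record.closed_on, record.retention_years)


def outlives_system(record: Record, system_end: date) -> bool:
    """True if the record must be kept beyond the system's expected end of life.

    Records that are still open have no disposal date yet, so they are
    treated as outliving the system.
    """
    due = disposal_due(record)
    return due is None or due > system_end


# Placeholder values: a hard-copy item managed through the system.
paper = Record("Signed consent forms", "locked filing cabinet", date(2012, 7, 31), 15)
print(disposal_due(paper), outlives_system(paper, system_end=date(2022, 12, 31)))
```

A check like `outlives_system` is one way to flag, at the time a record is filed, the schedules that will need migrating to a successor system.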


Jun 22 2012

RDM Audit

Work package WP1 (RDM Audit) is about assessing RDM practices at UH in order to identify gaps and requirements, and to transfer knowledge from experienced RDM practitioners to all staff holding valuable data. We are employing two methods to carry out the audit: a DAF survey and interviews.

DAF survey: we have carried out a survey based on DAF methodology over the last month. The questions we asked were mostly faithful to the DAF online tool, with some tweaks to accommodate local infrastructure. The result is more like that used by Orbital at Lincoln than that used by Iridium at Newcastle. The survey was circulated to around 600 staff via our “Research Grants News and Funding Opportunities” newsletter, with follow-up reminders sent by our information managers to schools and research centres. We have had 60 responses so far from senior researchers, principal investigators, research students, lecturers and research fellows. We have extended the open period following requests from a couple of research groups who want to consider it at their next regular group forums. The results already make interesting reading and will be published here soon.

Interviews: We have designed an interview protocol for carrying out semi-structured interviews with selected researchers across different disciplines in the University of Hertfordshire. The protocol was designed using the following sources:

Starting with the established ‘friends’ of the project we will deploy this interview protocol across our research community, and aim to use it as a source of case studies, leveraging each opportunity we get to assist a researcher with a particular problem.

RDM benefits

The final section of the interview protocol has been designed to help us with the vexed problem of evaluating the benefits of the Research Data ToolKit.

When evaluating benefit the first port of call will often be a hard financial metric:

  • does RDM as a whole cost less now than before we started?
  • are we winning more research grants as a result of RDM good practice?

Given the relatively short timescale of the project and our complete lack of existing RDM accounting we cannot answer these questions. This leaves us considering a softer set of metrics:

  • has the usage of robust centralised storage increased during the life of the project?
  • has the use of Data Management Plans increased?
  • how many datasets have we published in support of our publications?

Even these questions will not be easy to answer because they have not previously been asked, but they probably are measurable over the period of the project.

There are less quantifiable but still tangible benefits to be recorded too. For example, RDTK has already led to a closer relationship between our information systems providers and research administrators, which had become distanced by organisational restructuring. For another example, as a result of a tangential intervention from RDTK, our largest ‘departmental’ facility (an 80 core HPC cluster and 200TB SAN) is about to move out of its less than ideal premises and into one of our purpose built data centres, making it much less prone to downtime or disaster.

In order to capture this kind of collateral benefit and to try to get the individual researcher’s perspective we believe it is worth considering factors like ‘increasing awareness regarding RDM good practice’, ‘improving staff confidence in developing a DMP’, as well as the ‘usage of resources and organisational capacities and infrastructure to support RDM activities’. To this end we have added a section to our interview protocol which asks the respondents about their competencies in these areas. At the conclusion of the project we will return to our interviewees to see if their competencies have improved.
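One way the before-and-after comparison could be tallied is sketched below in Python. This is an assumption about method, not the project's protocol: the 1 to 5 self-rating scale, the area names, the interviewee labels and every score are made-up placeholders. The sketch compares only interviewees who answered in both rounds, so that drop-outs and newcomers do not skew the change.

```python
from statistics import mean


def competency_change(before, after):
    """Mean change in self-rated score per area, for repeat interviewees only.

    `before` and `after` map area name -> {interviewee: score}.
    """
    changes = {}
    for area, first_round in before.items():
        second_round = after.get(area, {})
        shared = first_round.keys() & second_round.keys()
        if shared:
            changes[area] = mean(second_round[p] - first_round[p] for p in shared)
    return changes


# Placeholder scores on a hypothetical 1-5 scale; not survey or interview data.
before = {
    "dmp_confidence": {"A": 2, "B": 3},
    "rdm_awareness": {"A": 3, "B": 3},
}
after = {
    "dmp_confidence": {"A": 4, "B": 3, "C": 5},   # "C" is new, so ignored
    "rdm_awareness": {"A": 4},                    # "B" was not re-interviewed
}

for area, change in competency_change(before, after).items():
    print(f"{area}: {change:+.1f}")
```

With a small number of interviewees a figure like this is only indicative, and would sit alongside the qualitative answers rather than replace them.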

These measures of benefit may not show us an explicit return on investment, but to paraphrase ViDaaS’s James Wilson – it is better to measure what you can than what you can’t, and ‘soft’ benefits are known to yield hard results (see the latter part of JISCMRD launch event: Thematic session on the business case for RDM).