Mar 30 2012

This is a very lengthy blog post to catch up on a couple of months of non-stop JISCMRDness and set out the position of RDTK at the end of month six. There will be some seriously dry reporting later but I am going to start with an emotive whiz through a very busy but rewarding period.

February was a head down month, in which we pressed on with actual hands on things, and I tried to balance technical work with project management, with the odd programme event thrown in for entertainment.

I joined Jonathan Tedds at the JANET Brokerage Event in Smithfield and came away disappointed. There were some really good presentations, including those from Middlesex University (totally outsourced to a managed private cloud) and Loughborough University (extensible hybrid cloud and an impassioned call to arms to JANET) but the whole day felt like it was preaching the message of virtualisation in the cloud as if we did not already know it was a good idea. There was a frisson of excitement as the vendors signed up to the brokerage were revealed, and their number increased from three up to five or six during the day, but that soon evaporated. The opening address told us that the details of the brokerage arrangements and offerings were specifically not on the agenda, and the subsequent JANET presentation was unenlightening. In answer to a question, we were told that the facilitation of RDM was at the forefront of their thinking, but there was nothing to evidence this, and Storage in the Cloud, for example, remains firmly on their to-do list. I suppose I had my clanky hat on in a room full of strategists, and probably missed the point somehow.

This distraction aside, I was busy doing testing, talking with colleagues from Physics and Maths about working together, arranging an upgrade to DSpace 1.8, and other things I have already forgotten.

By the time I looked up from the work bench it was the end of February and the days were getting longer. The Dataflow/ViDaaS launch event in Oxford at the start of March was much more my cup of tea: demonstrations and discussion of two tools that could work well at UH. I have been aware of Dataflow for a while but this event brought it into focus. Datastage, in particular, was right on the money for me, delivering a platform for sharing files with anyone that the administrator chooses to enable, including staff, students and collaborators who may not normally be allowed to mix. In addition, the data packaging process in Datastage will soon support Sword II, and at this point it will look like a great option for feeding our DSpace repository. The Dataflow demonstration almost worked too well in its simplicity, and I felt some people may not have appreciated what a practical solution this could be. I think there were a lot of newcomers to the programme at the event and, as a veteran of six months, I was struck by how these new voices took up now familiar themes and how the Q & A soon conflated the specifics of ViDaaS and Dataflow with the big issues of RDM. It was also good to re-meet a colleague from Oxford Brookes, with whom I worked on the JISC eLib programme about one sunspot cycle ago, and sobering to find we had both come back to similar contemporary jobs.

Next up, there was that big call to do extra stuff to attend to and, before I knew it, the JISCMRD Policy Workshop in Leeds loomed large. The great value in this event was the perspective gained from a large group of people, acting on the same imperative via different paths. It was also important to me that the head of our Research Grants Team attended the workshop and returned a fully clued up, engaged and co-opted member of the RDTK effort, buzzing with ideas to better embed data management planning in pre-award workflows. More reflection on this valuable workshop can be found in an earlier blog.

So there we were, in mid-March, with no February report. There was no time! The RDTK team was expanding at last and I was very pleased to welcome Sara Hajnassari to the project. Sara has wide experience of delivering projects in the telecoms industry and as a KTP project manager in UK HE. Sara is working on Workpackage WP1 (Audit and best practice) and Workpackage WP3 (Document management services) and will soon be out among our research community spreading the word on RDM. She is already having an impact on progress on several fronts.

With the Ides safely negotiated, the Equinox brought the first meeting of our Stakeholder Forum, which we combined with a briefing from Andrew McHugh of the Digital Curation Centre so as to start a CARDIO assessment. This event was hard won, and it took an intervention from our Pro-Vice Chancellor (Research) to really get it off the ground. I may have been partially to blame for this by confusing the message when I hijacked the new group in order to jump start CARDIO, which isn’t an easy sell. Having done the first pass myself, I understand how busy, very focused, evidence driven individuals could find it hard to abstract themselves into a more qualitative mindset, so as to contribute to an aggregated institutional view. We did our best to persuade them it will be worth it. I thought the Stakeholder Forum went well in the end, as questions about CARDIO led to a healthy general discussion around the table and into lunch. The big message for RDTK was that in order to win more engagement we need to communicate benefit more loudly, and with clarity.

There ends my commentary on a busy couple of months. Hopefully the following more formal report shows we have achieved a lot.  I certainly feel like the momentum is building.

Summary of progress at end of Month 6 (of 18)

Workpackage WP1 (RDM Audit and best practice) has suffered delays as a result of recruitment difficulties. It is however functioning very well as an engagement mechanism and evidence gatherer for other workpackages. The challenge now is to focus on documenting best practice and use cases to populate the guidance section of the Research Data ToolKit.

Workpackage WP2 (Cloud Storage Pilot) is progressing well. Pilot services that complement local services and answer the demands of wide area collaborative data management have been demonstrated. The next iterations of what we have done so far are underway, and trials of Datastage and ViDaaS will be considered.

Workpackage WP3 (Document Management Pilot) is also now progressing well after a delayed start. A close working relationship with the Centre for Life Span and Chronic Illness Research is proving fruitful and the work is benefiting from the part time deployment of our document management specialist.

The project is enjoying an increasing audience across the university and work on an extended internal communications plan is underway.

Work package WP1 – Audit current RDM practice at University of Hertfordshire

A Cardio Assessment (Collaborative Assessment of Research Data Infrastructure and Objectives) actively assisted by the Digital Curation Centre (DCC) is underway. Andrew McHugh presented the background, rationale and process of Cardio, which is conducted via an online tool, to a meeting of the RDTK Stakeholder Forum in March 2012. Cardio will be progressed further in April. The representation at the Stakeholder Forum covered a broad church of interested parties including researchers, heads of research centres, service providers and research administrators. It should be possible to conduct an institutional level assessment via this cohort. This is important because although the activity is a key element in the RDTK, the work also needs to reach beyond the project to contribute to the University’s responses to the recent hardening of funding body policy.

Although the primary purpose of this workpackage is to ‘discover and evaluate existing good practice within successful UH research areas’ the work is also being used as a lever for engagement with the project and as a requirements gatherer for the other workpackages. In this sense it is progressing well.

We have found out as much (perhaps more) about the gaps and barriers to good RDM as we have about best practice. This is valuable experience and is consistent with that of our peers across the JISCMRD programme. An underlying theme in JISCMRD is to address those researchers for whom RDM is understood but difficult, or those for whom it is unrecognized and unconsciously undertaken. Our work is addressing those needs directly, often with immediate advice and interventions in response to the experiences being encountered.

We can report that:

  • there are enclaves of well organised and technically sound RDM. These are generally associated with STEM subjects;
  • there is widespread use of data storage which is expedient rather than most appropriate.  This ranges from myriad desktop storage solutions to commercial (but often free or low cost) cloud storage;
  • demand for the university’s centralised networked storage is patchy. For many researchers the barriers to uptake could be removed by better training and use cases. In only a very few cases is capacity likely to be a real problem. A major barrier is that the needs of collaborative research are not well served by the offer;
  • the need to robustly care for working data is understood and practiced widely across all disciplines, though more often than not the practice is ad-hoc, and in some cases unreliable or unsafe;
  • the attitude toward long term data preservation and re-use is bi-polar. Re-use is naturally embedded in researchers who are required to deposit in a national archive; but it is mostly an abstract and remote concept to those who are not required to deposit. Even among the former group, deposit can be viewed as a contractual responsibility, not a vehicle for benefit. Outside the Centre for Astrophysics Research, the idea of acquiring citations for data is largely unrecognised.

Deliverable D3a (Toolkit alpha), due at Milestone M2 and the end of month six, is not complete, but new advice and guidance have been published on the project website and as updates to existing materials, prior to being encapsulated in the toolkit. With new project staff and championing by the Stakeholder Forum the outlook for this workpackage remains good.

WP2 Cloud Storage Pilot

Workpackage WP1 has shown that the anatomy of a research group often includes UH research staff, UH research students, and external collaborators. Only the first of these partners normally has access to the shared storage on our virtual private network for staff (the UHStaffCloud). The extent to which the University’s network infrastructure can bend to meet the needs of research without impacting on all its other core user communities is limited. Furthermore, although accommodations can be made in some circumstances, having to make the effort turns away some researchers. Many research data management scenarios need more flexible arrangements. We have found that our researchers use memory sticks, facilities in other institutions, or low cost commercial services such as Dropbox to share data, because UHStaffCloud is too difficult to access.

We have been working with the HRC Computing Consortium to establish pilot services at the Thor Datacentre in Iceland, which is operated by HRC’s partner, Advania. The relationship took time to become effective but the arrangement is proving beneficial to all parties. HRC benefit from direction as to the needs of their target market; Advania benefit from a partner to test and develop their SaaS offerings; and RDTK has the opportunity to acquire solutions that are similar to the unregulated commercial services that we know are in demand, in an environment that allows regulatory and policy frameworks to be explored.

The pilot cloud services consist of:

  • file storage;
  • managed backup;
  • Microsoft Sharepoint ‘foundation’;
  • Mysql database.

The backup service is a mature, ‘out-of-box’ solution from Advania. It has client applications for Windows, OSX and Linux. It lacks the polished interface of a global branded service but it is straightforward to install and use, runs in the background with little impact on the client system, and has very good notification reports. It compares very well with similar offerings from UK storage service providers such as Datafort and JCOM.

We have tested three iterations of cloud file storage from locations within and without the UH local area network, and compared them with the performance of UHStaffCloud and with consumer level commercial cloud services. Using CIFS/smb or Webdav/https file sharing allows cloud storage to be integrated with the desktop on Windows, OSX or Linux. In this configuration, the only difference between ThorCloud and UHStaffCloud was in performance. As might be expected, when both were accessed from the local area network, the delays introduced by network latency resulted in ThorCloud being an order of magnitude slower than our local SAN based facilities. The situation is different when working at home or on some other remote public network. In this scenario, both ThorCloud and UHStaffCloud are remote from the third party location, so both suffer latency to different extents; however, available bandwidth becomes the limiting factor. With a relatively fast domestic cable connection the differential between the services is reduced to a factor of only 3 or 4. When using a typical slow UK domestic broadband connection at around 4 Mbit/s the performance of ThorCloud and UHStaffCloud is expected to be comparable.
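The way latency dominates on a fast local network, while bandwidth takes over on a slow home connection, can be illustrated with a rough back-of-envelope model. This is only a sketch and not the test method used in the pilot: the function names, the 64 KiB chunk size, the assumption of one round trip per chunk, and every round-trip time and bandwidth figure below are hypothetical values chosen for illustration.

```python
# Illustrative model only: every number below is an assumption, not a measurement.

def transfer_time(size_bytes, bandwidth_bps, rtt_s, chunk_bytes=64 * 1024):
    """Estimate the seconds needed to copy one file over a chatty protocol.

    Assumes one request/response round trip per chunk with no pipelining,
    so total time = (round trips x latency) + (bits / bandwidth).
    """
    round_trips = -(-size_bytes // chunk_bytes)  # ceiling division
    return round_trips * rtt_s + (size_bytes * 8) / bandwidth_bps


def slowdown(size_bytes, bandwidth_bps, near_rtt_s, far_rtt_s):
    """How many times slower the far store is than the near one on the same link."""
    near = transfer_time(size_bytes, bandwidth_bps, near_rtt_s)
    far = transfer_time(size_bytes, bandwidth_bps, far_rtt_s)
    return far / near


FILE_SIZE = 100 * 1024 * 1024  # a hypothetical 100 MiB data file

# (label, bandwidth in bit/s, RTT to the nearer store, RTT to the farther store)
SCENARIOS = [
    ("campus LAN", 1_000_000_000, 0.0005, 0.050),
    ("fast domestic cable", 50_000_000, 0.020, 0.060),
    ("slow domestic broadband", 4_000_000, 0.020, 0.060),
]

if __name__ == "__main__":
    for label, bandwidth, near_rtt, far_rtt in SCENARIOS:
        ratio = slowdown(FILE_SIZE, bandwidth, near_rtt, far_rtt)
        print(f"{label}: far store is about {ratio:.1f}x slower")
```

With these assumed figures the gap between the two stores shrinks steadily as the connection gets slower, because the fixed cost of moving the bits swamps the per-request latency. The absolute ratios depend entirely on the chosen inputs and should not be read as results.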

The most similar commercial services in terms of functionality are Dropbox(.com free/ low cost) and Livedrive(.com very low cost). These cannot be compared in performance terms because they use local caching and throttled background file transfer. Webdav over http performed much the better of the two desktop solutions.

We also tested secure FTP, which requires a desktop client application for all but the most experienced OSX/Linux users. FTP is similar to services such as Microsoft SkyDrive or Amazon CloudFiles in that these services also operate through a client via a web browser. We found file transfer with FTP to be much faster than any of the commercial offerings and nearly comparable with UHStaffCloud, even inside the local area network.

Sharepoint shows promise for projects that require wide area collaborative document management or management of working data where version control, or multi tiered access control is necessary. It is a mature and increasingly common Software as a Service.

The primary interface is accessed via any modern web browser. Sharepoint folders can also be mapped/mounted so that they look like desktop folders, and integrate closely with Microsoft Office Applications. Advania’s Sharepoint infrastructure is still in development and cannot, as yet, be compared with that from best of breed vendors, such as Rackspace. The service we tested worked well in single user mode but we were unable to test collaborative document/data management, since we did not have a multi user account. This will follow soon.

We tested two iterations of Mysql database as a service. Although Mysql is now owned by Oracle, its community edition remains open source, and it is widely and freely licensed, ubiquitous, and can be effectively deployed with open source management interfaces. In our case we were provided with phpMyAdmin.

Network latency was a major issue. Initially we hoped to use the service in standalone mode but when we tried to run the RDTK project web site (in Hatfield) using a Thor database (in Iceland) the response times were unacceptable. When the webserver was moved to the same machine as the database, the cloud hosted RDTK web site actually outperformed the original locally hosted site. Remote co-location of database and application will suit some, but not all, researcher requirements, and this configuration requires the use of a Virtual Private Server, which in turn increases unit costs and overall commitment.
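A simple way to see why separating the web server from its database hurts so much is to count round trips. The sketch below assumes a page that issues its queries one after another, each paying the full network round trip to the database. The query count, round-trip times, execution time and rendering time are all hypothetical figures, not measurements from the RDTK site.

```python
# Illustrative model only: query counts and timings are assumptions, not measurements.

def page_response_time(queries, db_rtt_s, query_exec_s=0.002, render_s=0.05):
    """Estimate seconds to build one page when queries run one after another.

    Each query pays one network round trip to the database plus its execution
    time; render_s covers everything the web server does on its own.
    """
    return render_s + queries * (db_rtt_s + query_exec_s)


QUERIES_PER_PAGE = 40  # hypothetical figure for a CMS-style page

# Web server in one country, database in another (assumed 50 ms round trip).
split = page_response_time(QUERIES_PER_PAGE, db_rtt_s=0.050)
# Web server and database on the same machine (assumed 0.2 ms round trip).
co_located = page_response_time(QUERIES_PER_PAGE, db_rtt_s=0.0002)

print(f"split hosting: {split:.2f} s per page")
print(f"co-located:    {co_located:.2f} s per page")
```

Under these assumptions the per-query latency is multiplied by the number of queries, which is why moving the application next to the database makes such a difference even though the end user is still far from both.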

These tests indicate that it is technically feasible for the UH/HRC/Advania partnership to deliver cloud services to complement local services and answer the demands of wide area collaborative data management. Advania are currently revisiting the services we tested to provide us with a more closely integrated portfolio and more appropriate management of accounts and authentication. This workpackage will continue to explore technical issues and lead on to further work on service levels and indicative costing.

More detailed reports on the work carried out so far are in preparation.

WP3 Document Management Pilot

The objective of this workpackage is to design, deploy and compare document management services; and relate this work to previous work from the JISC. Our Enterprise Application Consultant for Electronic Document and Records Management System (EDRMS) is leading the work, supported by Sara Hajnassari. We are collaborating with Karin Friedli and David Wellsted from the Centre for Life Span and Chronic Illness Research (CLiCIR).  The work currently involves an analysis of the physical record (known as a Trial Master File, or TMF) of a recent clinical trial. The plan is to develop an electronic file structure for the TMF and then use and improve it by applying it to a new clinical trial, which is currently going through ethical and contractual approval. This work will support CLiCIR in its aspiration to set up a local Clinical Trials Support Unit. From RDTK’s perspective we will be able to explore a demanding use case with a physical (non-digital) data management element. There will also be collateral benefit from Karin’s experience with the stringent requirements of health related data management.

There were two candidates for deployment of the electronic TMF: our existing Opentext Livelink EDRMS and the Microsoft Sharepoint service described above. These systems have a lot of common functionality and at first sight some usability issues with the Opentext system allowed Sharepoint to be a serious contender. However, after an upgrade the Opentext EDRMS became much more accessible and integrated with the desktop, and this, combined with its known robustness and extra features such as retention policies and disposal workflows, makes it the most appropriate choice. When the electronic TMF has been developed we will look again to see how it might be implemented with Sharepoint (which is more likely to be available to our JISCMRD colleagues).

WP11 Programme Engagement

Since the project began four RDTK team members have attended 18 days at 9 programme events.  In February and March this included:

  • JANET cloud brokerage workshop, Smithfield;
  • Dataflow/ViDaaS launch workshop, University of Oxford;
  • JISCMRD policy workshop, University of Leeds;
  • MRD for History workshop, Institute of Historical Research;
  • JISCMRD Meeting Disciplinary Challenges, Paddington.

In addition, Bill Worthington delivered the opening presentation in the 2012 season of Information Hertfordshire Lunchtime Presentations for UH staff. The PDF of the presentation notes has been downloaded more than 130 times.