In between procrastinating over final reports and filling in odd gaps in our new RDM advice, we are preparing some material on the cost of a data loss event. This will arrive in due course, but I thought it would be useful to precede it with a look at the cost of simply owning a typical storage device operated by a typical small research group. I have experience of this from a previous life, when I bought and maintained kit for several research groups in the Science and Technology Research Institute (STRI) at the University of Hertfordshire.
Dr Phil Richards, CIO at Loughborough, coined the acronym DDUD: distributed datacenter under the desk [reference needed]. In STRI, our bit of the DDUD was probably quite advanced for its time: half a dozen network attached storage devices (NAS); an Uninterruptible Power Supply (UPS), mainly to protect against spikes in the office power supply; a partitioned section of an office, otherwise used as a machine graveyard; and a domestic air conditioning unit (AC).
Each NAS had four discs, configured as a RAID5 array. Assuming the discs fail independently, the mean time between disc failures (MTBF) for a NAS is roughly the MTBF of a single disc divided by four. In practice, every NAS suffered a disc outage at least once in its three-year warranty period. We tended to retire them to non-essential use once the warranty expired, because of the relatively high cost of replacement parts and the increasing failure rate. When a disc failed, the RAID5 array protected our bytes and allowed us to continue working in a degraded state (though at considerable risk, since a second disc failure would have lost the whole array) while the replacement part was acquired, but the downtime needed to replace the disc and rebuild the array (the discs were not hot-swappable) was quite an inconvenience. I mention all this not to prove my heritage as a geek, but to illustrate the point that maintaining local storage involves a lot of faffing about that needs to be accounted for.
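A rough way to see why every NAS lost a disc is sketched below. It assumes each disc fails independently at a constant rate (an exponential model; real-world failure rates are often worse than datasheet MTBF figures), so the time to the first failure among n discs has mean MTBF/n. The 100,000-hour disc MTBF in the example is a hypothetical figure, not one taken from our kit, and the function names are illustrative.

```python
import math

HOURS_PER_YEAR = 24 * 365


def array_mtbf_hours(disc_mtbf_hours: float, n_discs: int) -> float:
    """Mean time to the first disc failure in an array of n identical discs,
    assuming independent, constant-rate (exponential) failures."""
    return disc_mtbf_hours / n_discs


def expected_disc_failures(disc_mtbf_hours: float, n_discs: int, years: float) -> float:
    """Expected number of disc failures over `years`, assuming a failed disc
    is replaced so the array always holds n_discs discs."""
    return years * HOURS_PER_YEAR / array_mtbf_hours(disc_mtbf_hours, n_discs)


def prob_at_least_one_failure(disc_mtbf_hours: float, n_discs: int, years: float) -> float:
    """Probability that at least one disc fails within `years` (Poisson model)."""
    return 1.0 - math.exp(-expected_disc_failures(disc_mtbf_hours, n_discs, years))


if __name__ == "__main__":
    # Hypothetical disc MTBF of 100,000 hours; four discs; three-year warranty.
    mtbf, discs, years = 100_000, 4, 3
    print(f"array MTBF: {array_mtbf_hours(mtbf, discs):,.0f} hours")
    print(f"expected disc failures: {expected_disc_failures(mtbf, discs, years):.2f}")
    print(f"P(at least one failure): {prob_at_least_one_failure(mtbf, discs, years):.0%}")
```

With these assumed numbers the array's MTBF is a quarter of a single disc's, and the expected number of failures over three years is about one, which is consistent with every NAS needing at least one intervention.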
So, to the calculation. I am going to use the direct modern descendant of our NASs for capital costs; use the pay scale of a mid-career researcher for the labour cost (I was actually one, then two, whole pay scales higher at the time); and assume a Power Usage Effectiveness (PUE) ratio of 2. PUE is total facility power divided by IT equipment power, so a PUE of 2 means that running the room, UPS and AC uses as much power as the IT equipment itself, which is probably an underestimate, but it will do. I don’t know what our power costs are, so I will use a small business tariff that I happen to know about. I haven’t included a share of the capital cost of the UPS or AC.
4 x 1TB NAS, 3 year warranty, £2260 – SnapServer DX1 Enterprise
purchasing – 2 days (market evaluation, selection, vendor communication, requisition, payment);
delivery and setup – 2 days (goods inwards, commissioning, familiarisation, build RAID, testing, rollout to users);
regular maintenance/interventions – 1 hr/month x 36 months = 36 hrs ~= 5 days;
1 disc failure intervention – 2 days (diagnose, contact vendor, arrange replacement part, swap disc and rebuild RAID, check data integrity)
1 day at UH6 = £38,500 per annum / 220 working days per year = £175/day
Sum of effort = 11 days ~= £1925
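The labour sum can be checked with a few lines of Python. This is a minimal sketch using only the effort estimates and the UH6 salary above; like those figures, it uses salary alone and leaves out employer on-costs. The dictionary keys and function names are illustrative.

```python
# Labour cost of looking after one NAS over its three-year warranty,
# using the effort estimates and UH6 salary from the list above.
ANNUAL_SALARY_GBP = 38_500
WORKING_DAYS_PER_YEAR = 220

effort_days = {
    "purchasing": 2,
    "delivery and setup": 2,
    "regular maintenance (1 hr/month x 36 months, rounded)": 5,
    "one disc failure intervention": 2,
}


def day_rate(annual_salary: float, working_days: int) -> float:
    """Cost of one person-day, from salary alone (no pension or NI on-costs)."""
    return annual_salary / working_days


def labour_cost(days: dict, rate: float) -> float:
    """Total labour cost: sum of person-days times the day rate."""
    return sum(days.values()) * rate


rate = day_rate(ANNUAL_SALARY_GBP, WORKING_DAYS_PER_YEAR)
total_days = sum(effort_days.values())
print(f"day rate: £{rate:.0f}, effort: {total_days} days, "
      f"labour: £{labour_cost(effort_days, rate):,.0f}")
```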
Nominal 80 W, ~0.1 kW in use: 0.1 kW x 24 hr x 350 days x 3 yrs ~= 2500 kWh; x 2 (PUE) ~= 5000 kWh
5000 kWh x £0.15/kWh = £750
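The power arithmetic, with PUE applied as a simple multiplier on the IT equipment's energy, can be sketched as follows. The 0.1 kW draw, 350 days a year, PUE of 2 and £0.15/kWh tariff are the assumptions stated above; the function names are illustrative. Unrounded, the figures come out at 2520 kWh, 5040 kWh and £756, which the text rounds to 2500 kWh, 5000 kWh and £750.

```python
# Electricity cost over the NAS's three-year life, scaled by PUE.
# PUE = total facility power / IT equipment power, so PUE = 2 means the room,
# UPS and air conditioning draw as much again as the NAS itself.
PUE = 2.0
TARIFF_GBP_PER_KWH = 0.15


def energy_kwh(power_kw: float, hours_per_day: float,
               days_per_year: float, years: float) -> float:
    """Energy drawn by the IT equipment alone."""
    return power_kw * hours_per_day * days_per_year * years


def power_cost_gbp(it_energy_kwh: float, pue: float, tariff_gbp_per_kwh: float) -> float:
    """Facility energy (IT energy x PUE) priced at a flat tariff."""
    return it_energy_kwh * pue * tariff_gbp_per_kwh


it_kwh = energy_kwh(0.1, 24, 350, 3)
cost = power_cost_gbp(it_kwh, PUE, TARIFF_GBP_PER_KWH)
print(f"IT energy: {it_kwh:,.0f} kWh, facility energy: {it_kwh * PUE:,.0f} kWh, "
      f"cost: £{cost:,.2f}")
```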
Total cost of ownership:
2260 + 1925 + 750 = £4935 for a RAID5 capacity of 3TB
= £1645 per terabyte year! Ouch.
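A minimal sketch of the total above: add the three components, then divide by the usable RAID5 capacity, which for n discs is (n - 1) discs' worth because one disc's worth of space holds parity. The inputs are the capital, labour and power figures from the text; the function names are illustrative.

```python
# Total cost of ownership for one NAS, from the capital, labour and power figures.

def total_cost_of_ownership(capital: float, labour: float, power: float) -> float:
    return capital + labour + power


def cost_per_usable_tb(total: float, n_discs: int, disc_tb: float) -> float:
    """RAID5 spends one disc's worth of space on parity, so usable capacity
    is (n_discs - 1) x disc size."""
    usable_tb = (n_discs - 1) * disc_tb
    return total / usable_tb


tco = total_cost_of_ownership(capital=2260, labour=1925, power=750)
print(f"TCO: £{tco:,.0f}, per usable TB: £{cost_per_usable_tb(tco, 4, 1.0):,.0f}")
```

Swapping in a capital cost of 20% to 25% of £2260, with labour and power left unchanged, gives roughly £1040 to £1080 per usable terabyte.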
You could argue that a RAID5 desktop-attached device could be acquired for ~20%-25% of the cost of a NAS, bringing the cost down to nearer £1000/TB/yr, but I would suggest the attendant risk of failure is not worth taking.
Even allowing for a 100% margin of error, this means the cost of owning a bit of a DDUD is at least as much as, and probably twice, that of premium-rate cloud storage such as Rackspace Cloud Files or Amazon Simple Storage Service (S3), and between 2 and 4 times the cost of storage in our own data centres.
Sticking with wild estimates, suppose we could consolidate 1PB of research data (~50% of our holdings) off the DDUD and into an efficiently managed hybrid cloud infrastructure at £800/TB/yr: 1024 TB x £800 = £819,200.
1PB ~ £820k per annum, but you would save twice this much in distributed, unseen costs across the university. Net saving: close to £1 million per annum.
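The consolidation estimate can be sketched the same way, taking the text's figures of £1645 and £800 per terabyte year at face value. The constant and function names are illustrative, and the output is only as good as those two input figures.

```python
# Annual cost of 1PB in the DDUD versus a managed hybrid cloud,
# using the per-terabyte-year figures quoted in the text.
TB_PER_PB = 1024


def annual_cost(tb: float, gbp_per_tb_year: float) -> float:
    return tb * gbp_per_tb_year


hybrid = annual_cost(TB_PER_PB, 800)   # efficiently managed hybrid cloud
ddud = annual_cost(TB_PER_PB, 1645)    # the same data left under desks
print(f"hybrid: £{hybrid:,}, DDUD: £{ddud:,}, net saving: £{ddud - hybrid:,}")
```

On these inputs the DDUD costs a little over twice the hybrid figure, and the net saving comes out at about £865k per annum, which the text rounds to "close to £1 million".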
Someone check my figures please. We could all have new iMacs, free coffee, even a well-resourced Research Data Management Service.
If I have the sums right, there is an undeniably large amount of wasted money to add to all the reasons why we should be rationalising and centralising research data storage. The problem is that the waste is distributed and diluted, and the solution looks too big to countenance. We need to find a way to sell a research data storage service as a benefit, not a cure.