Oct 17 2012

The hybrid cloud approach is being adopted in many organisations, particularly where high-quality local infrastructure is in place and is likely to remain useful for a period of years but needs extending or migrating. In these cases the cloud offers expansion of local facilities and a gradual route to full migration. The cloud element often takes the form of offsite failover systems or elastic storage to accommodate short- to medium-term changes in demand. RDTK has been investigating ways in which this extra storage capacity might be utilised at University of Hertfordshire.

As I have blogged previously, our existing networked storage is underused by researchers. The key factors in this are the perceived lack of capacity and the difficulty of sharing data with external collaborators. In most cases both these issues can be resolved with extra provision, beyond that which is usually allocated to research staff, but the way we do this is currently ad hoc. RDTK is looking at consolidating the process of this extra allocation in order to lower the barriers to uptake. In addition, we have looked at alternatives to the way we provide access to storage, including those we know our researchers are using, or would like to use.

In partnership with HRC3 we set up simple file storage, backup, database and Microsoft SharePoint facilities. These services are ‘in the cloud’ in the sense that they are off site (actually in Iceland) but they represent simple functional services that have been available from Internet Service Providers since before the Cloud coalesced. The rest of this article focuses on the use of simple file space, though we found the same conclusions can be applied in general to SharePoint or database provision.

Cloud attached file systems

Figures 1, 2 and 3 below show three connections to our test cloud storage hosted at the Thor datacentre in Iceland (Thor is operated by Advania in partnership with HRC3). We compared this with our in-house facilities, which are also available off-campus, via a virtual private network. To ease the discussion I will refer to the two parts of our newly formed hybrid cloud as CommerceCloud and UHStaffCloud.

Figure 1 – cloud CIFS volume mapped to a drive on Windows7 desktop
Figure 2 – cloud webdav volume mounted on a Mac OSX desktop

In Figures 1 and 2 CommerceCloud storage is attached to Windows7 and Mac OSX respectively.  This was achieved in the usual way of mounting remote volumes on these platforms, and was easy to do. CommerceCloud was distinguishable from UHStaffCloud only by performance, but this performance gap was considerable when the client was inside the local area network (LAN).

We tested CommerceCloud with both CIFS/Samba and WebDAV/HTTP protocols and found the latter performed much better, particularly for OSX, but both were significantly slower than the local UHStaffCloud. This is not unexpected and is due to the delay, known as latency, introduced in moving data over long distances on a wide area network. CommerceCloud was 10 to 20 times slower than UHStaffCloud when working from my desk on our LAN. Again, this is unsurprising, given that we can move data around at 100 Mbit/s on the LAN.

The situation is different when working at home or on some other remote public network. In this scenario, both CommerceCloud and UHStaffCloud are remote from the third-party location, so both suffer latency, though to different extents. Latency continues to influence performance, but bandwidth is the dominant factor.

Using a domestic cable connection with an effective download speed of about 20 Mbit/s, UHStaffCloud was faster than CommerceCloud by a factor of only 3 or 4. When uploading files, the two services were comparable, because the effective speed of the domestic connection was constrained to about 2-4 Mbit/s, well within the capacity of both target networks.

This suggests that when a typical domestic or public broadband connection of 4/0.5 Mbit/s (download/upload) is used, the performance of CommerceCloud and UHStaffCloud becomes equivalent.
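As a rough illustration of why the gap narrows on slower links, here is a toy model in Python. It assumes a deliberately simple protocol that waits one round trip for every block it sends; the block size, the round-trip times and the function names are invented for the sketch and are not measurements from our tests.

```python
def transfer_time(size_bytes, bandwidth_mbit, rtt_ms, block_bytes=64 * 1024):
    """Seconds to move size_bytes if every block waits one round trip.

    Hypothetical model: total time = (blocks x round-trip time) + time on the wire.
    """
    blocks = -(-size_bytes // block_bytes)  # ceiling division
    wire_time = size_bytes * 8 / (bandwidth_mbit * 1_000_000)
    return blocks * rtt_ms / 1000 + wire_time


def slowdown(size_bytes, bandwidth_mbit, near_rtt_ms, far_rtt_ms):
    """How many times slower the distant store is than the nearby one."""
    far = transfer_time(size_bytes, bandwidth_mbit, far_rtt_ms)
    near = transfer_time(size_bytes, bandwidth_mbit, near_rtt_ms)
    return far / near


if __name__ == "__main__":
    ten_mib = 10 * 1024 * 1024
    # Assumed round-trip times: 1 ms to the nearby store, 60 ms to the distant one.
    for link_mbit in (100, 20, 4):
        ratio = slowdown(ten_mib, link_mbit, near_rtt_ms=1, far_rtt_ms=60)
        print(f"{link_mbit:>3} Mbit/s link: distant store {ratio:.1f}x slower")
```

With these assumed figures the distant store comes out roughly ten times slower on a 100 Mbit/s link, but less than one and a half times slower at 4 Mbit/s, because time spent on the wire swamps time spent waiting for replies.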

In the special case where collaborators are working at different points on the JANET network, our own storage remains superior due to its advantageous connection to JANET, but we expect CommerceCloud, though inferior, to be acceptable.

Figure 3 – cloud storage via secure FTP client application

Figure 3 shows an alternative way of working. In this case, we used FTP via FileZilla, which was the fastest FTP client we tested on both Windows and OSX. From within the UH network we consistently saw >25 Mbit/s to and from CommerceCloud (with a peak >55 Mbit/s). This was often nearly as fast as UHStaffCloud, and never slower by more than a factor of 2. An equivalent comparison for use off-campus is hard to make because both UHStaffCloud and CommerceCloud were significantly limited by the available bandwidth, but we expect them to be comparable.

For people who can accept the use of a client application rather than desktop integration, or better still, are confident with the command line, FTP remains the best way to share files.

One factor to note is that we used files of between 2 MB and 120 MB for these tests. These are perhaps larger than the files most people will be sharing over a network. The reason we didn’t use small files was that the delay at the start and end of each transfer, introduced by handshaking in the filesystem or application (not latency), was significant and would have made comparison difficult.
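The effect of that fixed start and end delay on small files can be sketched in a few lines of Python. The 0.2 second overhead and the 25 Mbit/s link speed are assumed values chosen for illustration, and the function name is hypothetical.

```python
def effective_mbit(size_mb, link_mbit, overhead_s):
    """Apparent throughput in Mbit/s once a fixed per-transfer delay is included.

    size_mb is in megabytes (10**6 bytes); overhead_s is an assumed handshake cost.
    """
    megabits = size_mb * 8
    seconds = overhead_s + megabits / link_mbit
    return megabits / seconds


if __name__ == "__main__":
    for size_mb in (0.1, 2, 120):
        rate = effective_mbit(size_mb, link_mbit=25, overhead_s=0.2)
        print(f"{size_mb:>6} MB file: about {rate:.1f} Mbit/s apparent")
```

Under these assumptions a 100 KB file appears to move at a small fraction of the link speed, while a 120 MB file gets very close to it, which is why small files would have told us more about handshaking than about the storage.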

To some extent this work evidences what was already known: the latency of almost any remote connection makes it compare poorly with a local area network. However, the ideal situation of two people sharing data on the same LAN or even on a high speed WAN such as JANET is not the norm.

So to conclude..

Files in the cloud offer an opportunity to expand existing provision at University of Hertfordshire. When working in a collaboration at home, abroad, or with colleagues at other institutions, the overall bandwidth available moderates the effect of connection latency, and in many cases the storage system can respond as fast as it can be accessed, regardless of its location.

The elephants in the room..

Dropbox, Microsoft SkyDrive, Google Drive. We know these consumer level products are popular with a lot of researchers because they are just so easy. The work above underlines one of the reasons why these applications are so effective: latency would still be an issue here too, but their desktop versions avoid it by using asynchronous background transfer. When you save, close or drop a file onto the desktop folder associated with these applications, they synchronise, moving data to and from the cloud whilst you get on with something else. They are slow slow slow, but you don’t often notice and they can also use this ‘background time’ to do other good things like encryption and chunking, which allows only the parts of files that have changed to be transferred.
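To show the idea behind chunking, here is a minimal Python sketch that hashes a file in fixed-size pieces and reports which pieces differ between two versions. It is an illustration only: the chunk size and function names are made up, and I am not claiming this is how any of the products above actually implement it.

```python
import hashlib


def chunk_digests(data, chunk_size):
    """Return a SHA-256 digest for each fixed-size chunk of data."""
    return [
        hashlib.sha256(data[start:start + chunk_size]).hexdigest()
        for start in range(0, len(data), chunk_size)
    ]


def changed_chunks(old, new, chunk_size):
    """Indexes of chunks in new that are absent from, or differ from, old."""
    old_digests = chunk_digests(old, chunk_size)
    new_digests = chunk_digests(new, chunk_size)
    return [
        index
        for index, digest in enumerate(new_digests)
        if index >= len(old_digests) or digest != old_digests[index]
    ]


if __name__ == "__main__":
    # A tiny chunk size keeps the demonstration readable; real tools use far larger chunks.
    before = b"aaaabbbbcccc"
    after = b"aaaaXbbbcccc"
    print(changed_chunks(before, after, chunk_size=4))
```

Only the chunks listed need to be sent again. Note that fixed-size chunks cope badly with insertions, which shift every later chunk, so this simple scheme works best for in-place edits and appended data.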

So why even consider the old, ‘pre-Cloud’ technologies that we have investigated above? Because the terms of use of Dropbox et al. remain problematic or unacceptable for some RDM scenarios (fewer scenarios than most policies would allow, but more than most researchers would consider). Brian Kelly and Joss Winn’s comments on Orbital’s very useful article about ownCloud begin the case against Dropbox nicely, so I don’t intend to follow the trail of those arguments here.

One advantage of the methods we looked at above is that they sit relatively well with storage and authentication systems currently found in Higher Education. When combined with smoother processes for setting up users, they offer a path of low resistance to improved services whilst staying within the reach of our governance. This is why they remain important.

Until offerings such as ownCloud evolve into a scalable and robust ‘Academic Dropbox’, the old protocols used with cloud storage will still be useful.

  4 Responses to “Files in The Cloud”

  1. A useful post. Thank you, Bill. I think that your first paragraph points to where Lincoln is slightly different at the moment, in that our ICT infrastructure was virtualised five years ago (VMware) and is now in need of refresh. There’s a new ICT Strategy just going through the committee process that points to a strong adoption of cloud services over the coming years with that work starting quite soon: IaaS, PaaS, etc. The Janet Brokerage and G-Cloud are likely to be where we look for those services. Whether ownCloud is suitable when going down this route is still unknown, but our investigation into that application will inform ICT’s eventual choice. We know that individual members of staff across the university, including in ICT, are using Dropbox for its convenience, so it is taking no persuasion from Orbital that this is the direction of travel for online storage. The old protocols are certainly useful, but if the applications they permit inhibit access, collaboration and sharing, then we’re not moving forward in the production of knowledge.

    p.s. I would hesitate to refer to Dropbox, Google Drive, etc. as a ‘consumer level product’. They are widely used in business and built on what is now the standard for any ‘enterprise’ e.g. AWS, Google’s data centres. Did you see the recent pictures of Google’s data centres? Surely the envy of any enterprise and with an application such as GDrive tested by millions of users, I don’t think ‘consumer level product’ does their work justice.

  2. Hi Joss, thanks for this useful comment.

    but if the applications they permit inhibit access, collaboration and sharing, then we’re not moving forward in the production of knowledge – I couldn’t agree more, existing practice and systems do produce inertia. But I feel I have to balance evangelism with pragmatic incremental change in order to move University of Hertfordshire along, which is why we have done this work.

    About ‘ownCloud’? I wasn’t clear. Does it use a local working folder and sync in the background like Dropbox, or is it a ‘live’ interface to a remote volume (suggested by webdav)? This might make a big difference to its efficacy using offsite storage.

  3. Joss, I also forgot to say: I think my use of ‘consumer level’ was a clumsy attempt at describing provenance rather than a reflection of quality. Also, it could be argued that ‘consumer’ products offer a better quality of service than a lot of the systems we offer in HE.

  4. Hi Bill, ownCloud (which you understand I’m not an evangelist for! – it still feels like a ‘beta’ product) does Dropbox-like syncing from a designated local folder to a remote folder. See this for more info.

    It also has a native webdav interface so that you can connect to a remote folder. The latest version also has an official extension to allow Dropbox, Google Drive, AWS S3, OpenStack Swift and FTP connections, too. I’ve tested them with varying success. I’m going to contact ownCloud in the next few days to talk about their Enterprise offering, where I assume all the issues I’m running into are ironed out. If we do adopt it, it’s likely to be as-a-service.