The hybrid cloud approach is being adopted in many organisations, particularly where high-quality local infrastructure is in place and is likely to remain useful for a period of years but needs extending or migrating. In these cases the cloud offers expansion of local facilities and a gradual route to full migration. The cloud element often takes the form of offsite failover systems or elastic storage to accommodate short- to medium-term changes in demand. RDTK has been investigating ways in which this extra storage capacity might be utilised at the University of Hertfordshire.
As I have blogged previously, our existing networked storage is underused by researchers. The key factors in this are the perceived lack of capacity and the difficulty of sharing data with external collaborators. In most cases both these issues can be resolved with extra provision, beyond that which is usually allocated to research staff, but the way we do this is currently ad hoc. RDTK is looking at consolidating the process of this extra allocation in order to lower the barriers to uptake. In addition, we have looked at alternatives to the way we provide access to storage, including those we know our researchers are using, or would like to use.
In partnership with HRC3 we set up simple file storage, backup, database and Microsoft SharePoint facilities. These services are ‘in the cloud’ in the sense that they are off site (actually in Iceland), but they represent simple functional services that have been available from Internet Service Providers since before the Cloud coalesced. The rest of this article focuses on the use of simple file space, though we found the same conclusions can be applied in general to SharePoint or database provision.
Cloud-attached file systems
Figures 1, 2 and 3 below show three connections to our test cloud storage hosted at the Thor datacentre in Iceland (Thor is operated by Advania in partnership with HRC3). We compared this with our in-house facilities, which are also available off-campus via a virtual private network. To ease the discussion I will refer to the two parts of our newly formed hybrid cloud as CommerceCloud and UHStaffCloud.
In Figures 1 and 2 CommerceCloud storage is attached to Windows 7 and Mac OS X respectively. This was achieved in the usual way of mounting remote volumes on these platforms, and was easy to do. CommerceCloud was distinguishable from UHStaffCloud only by performance, but this performance gap was considerable when the client was inside the local area network (LAN).
We tested CommerceCloud with both CIFS/Samba and WebDAV/HTTP protocols and found the latter performed much better, particularly for OS X, but both were significantly slower than the local UHStaffCloud. This is not unexpected and is due to the delay, known as latency, introduced in moving data over long distances on a wide area network. CommerceCloud was 10 to 20 times slower than UHStaffCloud when working from my desk on our LAN. Again, this is unsurprising, given that we can move data around at 100 Mbit/s on the LAN.
The situation is different when working at home or on some other remote public network. In this scenario, both CommerceCloud and UHStaffCloud are remote from the third-party location, so both suffer latency, though to different extents. Latency continues to influence performance, but bandwidth is the dominant factor.
Using a domestic cable connection with an effective download speed of about 20 Mbit/s, UHStaffCloud was faster than CommerceCloud by a factor of only 3 or 4. When uploading files, the two services were comparable, because the effective speed of the domestic connection was constrained to about 2 to 4 Mbit/s, well within the capacity of both target networks.
This suggests that when a typical domestic or public broadband connection of 4/0.5 Mbit/s (download/upload) is used, the performance of CommerceCloud and UHStaffCloud becomes comparable.
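A rough way to see why the gap closes on slower connections is to treat throughput as the smaller of two limits: the bandwidth of the link, and the amount of data a protocol can have in flight per round trip. The sketch below is a simplified model, not a description of how CIFS or WebDAV actually behave, and the function name, the 64 KiB window and the round-trip times are all illustrative assumptions rather than figures from our tests.

```python
def effective_throughput_mbit(link_mbit, rtt_ms, window_kib=64):
    """Rough upper bound on throughput, in Mbit/s, for a windowed transfer.

    A sender that must wait for an acknowledgement after each window of
    data cannot exceed window / round-trip time, whatever the link speed.
    The 64 KiB default window is an assumption chosen for illustration.
    """
    window_bits = window_kib * 1024 * 8
    latency_bound_mbit = window_bits / (rtt_ms / 1000) / 1_000_000
    return min(link_mbit, latency_bound_mbit)


# Hypothetical scenarios: (label, link speed in Mbit/s, round trip in ms).
scenarios = [
    ("LAN to nearby storage", 100, 1),
    ("LAN to distant storage", 100, 50),
    ("4 Mbit/s broadband to nearer storage", 4, 20),
    ("4 Mbit/s broadband to farther storage", 4, 60),
]
for label, link, rtt in scenarios:
    print(f"{label}: {effective_throughput_mbit(link, rtt):.1f} Mbit/s")
```

In this model the distant store is held back by round trips when the link is fast, but once the link itself is the bottleneck both stores deliver the same rate.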
In the special case where collaborators are working at different points on the JANET network, our own storage remains superior due to its advantageous connection to JANET, but we expect CommerceCloud, though inferior, to be acceptable.
Figure 3 shows an alternative way of working. In this case, we used FTP via FileZilla, which was the fastest FTP client we tested on both Windows and OS X. From within the UH network we consistently saw more than 25 Mbit/s to and from CommerceCloud (with a peak above 55 Mbit/s). This was often nearly as fast as UHStaffCloud, and never slower by more than a factor of 2. An equivalent comparison for use off-campus is hard to make because both UHStaffCloud and CommerceCloud were significantly limited by the available bandwidth, but we expect them to be comparable.
For people who can accept the use of a client application rather than desktop integration, or better still, are confident with the command line, FTP remains the best way to share files.
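For those comfortable with scripting, a transfer rate of the kind quoted above can be measured with a few lines of Python and the standard `ftplib` module. This is a minimal sketch and not the method we used (our figures came from FileZilla); the function names are my own, and the host, user and password arguments are placeholders for whatever your own service requires.

```python
import ftplib
import os
import time


def mbit_per_s(num_bytes, seconds):
    """Convert a byte count and an elapsed time to megabits per second."""
    return num_bytes * 8 / seconds / 1_000_000


def timed_upload(host, user, password, local_path):
    """Upload one file over FTP and return the observed rate in Mbit/s.

    host, user and password are placeholders; nothing here is specific
    to CommerceCloud or UHStaffCloud.
    """
    size = os.path.getsize(local_path)
    with ftplib.FTP(host, user, password) as ftp, open(local_path, "rb") as fh:
        start = time.perf_counter()
        ftp.storbinary(f"STOR {os.path.basename(local_path)}", fh)
        elapsed = time.perf_counter() - start
    return mbit_per_s(size, elapsed)
```

Note that plain FTP sends credentials unencrypted; `ftplib.FTP_TLS` is the standard library's variant for servers that support FTP over TLS.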
One factor to note is that we used files of between 2 MB and 120 MB for these tests. These are perhaps larger than the files most people will be sharing over a network. The reason we didn’t use small files was that the delay at the start and end of transfers introduced by handshaking in the filesystem or application (not latency) was significant, and would have made comparison difficult.
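The effect of that fixed delay can be sketched with simple arithmetic: if every transfer carries a constant start-up and tear-down cost, that cost swamps the measurement for small files and fades for large ones. The one-second overhead and 25 Mbit/s rate below are assumed values for illustration, not measurements, and the function names are hypothetical.

```python
def transfer_time_s(size_mb, rate_mbit, overhead_s):
    """Total time for one transfer: fixed overhead plus payload time.

    size_mb is in megabytes (10**6 bytes), rate_mbit in Mbit/s.
    """
    return overhead_s + size_mb * 8 / rate_mbit


def overhead_fraction(size_mb, rate_mbit, overhead_s):
    """Share of the total transfer time taken up by the fixed overhead."""
    return overhead_s / transfer_time_s(size_mb, rate_mbit, overhead_s)


# Assumed values: 25 Mbit/s sustained rate, 1 s of handshaking per file.
for size_mb in (0.1, 2, 120):
    share = overhead_fraction(size_mb, rate_mbit=25, overhead_s=1.0)
    print(f"{size_mb} MB file: {share:.0%} of the time is overhead")
```

Under these assumptions a 0.1 MB file spends almost all of its transfer time in overhead, whereas for a 120 MB file the overhead is a small fraction of the total, which is why larger files give a cleaner comparison of the connections themselves.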
To some extent this work evidences what was already known: the latency of almost any remote connection makes it compare poorly with a local area network. However, the ideal situation of two people sharing data on the same LAN or even on a high speed WAN such as JANET is not the norm.
So, to conclude...
Files in the cloud offer an opportunity to expand existing provision at the University of Hertfordshire. When working in a collaboration at home, abroad or with colleagues at other institutions, the overall bandwidth available moderates the effect of connection latency, and in many cases the storage system can respond as fast as it can be accessed, regardless of its location.
The elephants in the room...
Dropbox, Microsoft SkyDrive, Google Drive. We know these consumer-level products are popular with a lot of researchers because they are just so easy. The work above underlines one of the reasons why these applications are so effective: latency would still be an issue here too, but their desktop versions avoid it by using asynchronous background transfer. When you save, close or drop a file onto the desktop folder associated with these applications, they synchronise, moving data to and from the cloud whilst you get on with something else. They are slow, slow, slow, but you don’t often notice, and they can also use this ‘background time’ to do other good things like encryption and chunking, which allows only the parts of files that have changed to be transferred.
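To illustrate the idea of chunking, the sketch below splits a file into fixed-size pieces, hashes each piece, and reports which pieces differ between two versions. It is a simplified illustration under my own assumptions, not a description of how Dropbox or any other product actually works; the function names and the tiny chunk size are chosen purely for the example.

```python
import hashlib


def chunk_digests(data, chunk_size):
    """Return a SHA-256 digest for each fixed-size chunk of data."""
    return [
        hashlib.sha256(data[i:i + chunk_size]).hexdigest()
        for i in range(0, len(data), chunk_size)
    ]


def changed_chunks(old, new, chunk_size):
    """Indexes of chunks in `new` that are absent from or differ from `old`."""
    old_digests = chunk_digests(old, chunk_size)
    new_digests = chunk_digests(new, chunk_size)
    return [
        i for i, digest in enumerate(new_digests)
        if i >= len(old_digests) or digest != old_digests[i]
    ]


# A 4-byte chunk size keeps the example readable; real tools use far larger chunks.
old_version = b"aaaabbbbcccc"
new_version = b"aaaaBBBBccccdddd"
print(changed_chunks(old_version, new_version, chunk_size=4))
```

Only the chunks listed would need to be sent. Fixed-size chunks cope poorly with insertions, since everything after the inserted bytes shifts and appears changed, so this should be read as the simplest possible version of the technique.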
One advantage of the methods we looked at above is that they sit relatively well with storage and authentication systems currently found in Higher Education. When combined with smoother processes for setting up users, they offer a path of low resistance to improved services whilst staying within the reach of our governance. This is why they remain important.
Until offerings such as ownCloud evolve into a scalable and robust ‘Academic Dropbox’, the old protocols used with cloud storage will still be useful.