The cloud is not always the answer

23 Feb 2011

[Screenshot: BucketExplorer in action …]

This is a short rant caused by my experience today trying to do some simple internet-based data movement. After wasting hours trying to use a cloud solution, I “fell back” to just SCP’ing the files to an FTP server host.

Executive summary: I wasted more than three hours trying, unsuccessfully, to get BucketExplorer and CloudBerry S3 Explorer Pro to move a pair of 5GB files into one of my Amazon S3 buckets, while a simple SCP copy to an FTP server took a little over an hour.

Time estimates:

  1. Wasting time diddling with cloud data movement: 3.5 hours
  2. SCP’ing the data to an FTP server: 80 minutes

This all started because I needed to pass some exported VM server images to someone. We had two different files: one in the native XenServer Appliance (.xva) format and a second converted into the standard Open Virtualization Format (.ovf).

Sounds simple, right? For something like this I normally just toss the data into a secured Amazon S3 bucket and, depending on sensitivity and who needs to receive the data, we either swap AWS credentials or I use the neat S3 feature that generates one-off HTTP URLs which expire within a few hours.
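For the curious, here’s a minimal sketch of that expiring-URL trick using the Python boto library of the era; the bucket and key names are hypothetical stand-ins:

```python
# Minimal sketch: generate a time-limited S3 download URL with boto.
# Bucket and key names are made up; credentials are picked up from the
# environment (AWS_ACCESS_KEY_ID / AWS_SECRET_ACCESS_KEY) or a boto config.
from boto.s3.connection import S3Connection

conn = S3Connection()
bucket = conn.get_bucket('my-secured-bucket')
key = bucket.get_key('exports/server-image.xva')

# Signed URL that stops working after four hours.
url = key.generate_url(expires_in=4 * 3600)
print(url)
```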

However, a pair of 5GB files is a bit larger than what I normally throw at my S3Fox browser plugin, so I decided to use some more heavyweight cloud data transfer tools. That seemed logical to me, as I had used those tools in the past to sling around 350GB human genome sets without breaking a sweat.

Sadly, neither “cloud data movement” method worked.

Two separate attempts both failed at the last possible step!

BucketExplorer

Things started off promisingly. I’ve used BucketExplorer in the past to repeatedly toss human genome data into the cloud 350GB at a time. Moving a pair of 5GB files seemed like it would be pretty trivial to do. BucketExplorer is normally fantastic – low overhead, great status output and ‘smart’ behavior when it comes to MD5 checksums, copies and file comparisons.

No such luck. BucketExplorer did a great job of splitting my file into pieces and using the slick new Amazon multipart HTTP upload process to parallelize the data transfer. Speed was great and things looked good.

[Screenshot: good logs and history allow the specific error to be pulled out]

Until the final step. This is where BucketExplorer completes the file upload and “merges” the chunks back into a complete file before finally moving it into the destination path.

In two separate attempts, 161 of 162 file chunks made it into S3. Both times the final step failed with an error stating “Unable to move the uploaded object from workarea to destination path.”
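For anyone who hasn’t watched it happen, the multipart flow these tools drive looks roughly like the boto sketch below (uploading parts sequentially for simplicity; the GUI tools parallelize them). The complete_upload() call at the end corresponds to the “merge” step that failed on me both times. Bucket and file names are made up:

```python
# Rough sketch of the Amazon S3 multipart upload flow using boto.
# Sequential for clarity; the real tools upload parts in parallel.
# Bucket and file names are hypothetical.
import io
from boto.s3.connection import S3Connection

CHUNK_SIZE = 32 * 1024 * 1024  # ~32MB parts; a 5GB file yields ~160 chunks

conn = S3Connection()
bucket = conn.get_bucket('my-secured-bucket')

mp = bucket.initiate_multipart_upload('server-image.xva')
with open('server-image.xva', 'rb') as fp:
    part_num = 1
    while True:
        data = fp.read(CHUNK_SIZE)
        if not data:
            break
        mp.upload_part_from_file(io.BytesIO(data), part_num)
        part_num += 1

# The final step that assembles the chunks into one S3 object; this is
# the stage where both of my BucketExplorer attempts died.
mp.complete_upload()
```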

CloudBerry S3 Explorer Pro

CloudBerry Lab (www.cloudberrylab.com) is also repeatedly mentioned as one of the best makers of S3 data-handling tools. Like the BucketExplorer team, they do a fantastic job tracking S3 improvements and making sure their product is continuously upgraded to take advantage of the latest S3 features.

The GUI is beautiful and the copy process straightforward.

[Screenshot: CloudBerry S3 Explorer Pro (possibly) in action …]

I started the copy and saw the transfer queues show up.

And waited. And waited. And waited some more.

Finally, some small text popped up at the bottom of the GUI mentioning copy progress and transfer speeds. However, the text rarely changed, and checking stats on my notebook showed almost zero disk activity and little network traffic.

The CloudBerry GUI is still up and running as I write this post; I just have no idea whether it is actually moving data or not.

I’m not trashing CloudBerry Explorer here; I’ve heard so many great things about the tool (including very positive mentions from the internal S3 team at Amazon) that I’m convinced something was wrong with my setup and/or environment. It might have been that my Windows XP host was running virtually on OS X under Parallels, or that I started off trying to pull the source data over a networked CIFS share rather than from the local OS disk. Who knows.

CloudBerry is going to play an important role in some additional cloud data movement experiments we have planned, so stay tuned for follow-up posts where we do things right …

Keep It Simple, Nerd.

I gave up trying to be cloud-clever at this point. I have access to and admin control over an internet-facing FTP server, so I broke down and started a command-line SCP copy.

Result? SCP transferred at 1MB/sec, and each file copied to the FTP server host in about an hour. No mess, no errors, and no hassles at all.

Lesson Learned

The moral of the story is that clever is not always smart. I should have just copied the data via SCP from the start. The final screenshot shows a simple SCP operation completing in an hour and 20 minutes without error.

4 Comments
  • Michael Little
    Posted at 17:33h, 21 March

    “To the Cloud! Alice!”

  • Kri007
    Posted at 00:50h, 26 May

    Hello friend,
    chrisdag, it’s a really great article and I appreciate your work with Amazon. I think you are using a Bucket Explorer version that supports multipart uploads but not uploads of more than 5 GB, as Amazon doesn’t provide a copy operation on objects larger than 5 GB … I think they are providing this feature in the next version; then it will surely work for you …

  • Bucket Explorer
    Posted at 01:11h, 08 August

    Nice article.
    Yes, earlier Bucket Explorer did not support multipart uploads for files greater than 5 GB, but now we are providing this feature in our latest version. You can now perform multipart uploads for files of 5 GB and greater as well.
    http://www.bucketexplorer.com/documentation/amazon-s3–how-to-upload-big-file-in-parts.html

  • Kevin
    Posted at 21:06h, 08 December

    Hmmm, you should check this post by Brad: http://bcbio.wordpress.com/2011/04/10/parallel-upload-to-amazon-s3-with-python-boto-and-multiprocessing/

    Gonna try it soon. If you do, blog about it and let me know if it works better than plain SCP? I guess it should.
