Post Update History:
- July 13th - Original post
- July 14th - More results from cc1.4xlarge single-disk tests & initial results from the c1.xlarge instance type; uploaded a new version of the raw data spreadsheet to Google Docs. Updated all graphs.
- July 19th - Lots more data (including ephemeral storage) added to the raw data spreadsheet on Google Docs
- July 20th - Added a new blog post specifically talking about local, ephemeral and EBS performance on cc1.4xlarge instances
Now that Amazon Web Services has opened their new "Cluster Compute" cc1.4xlarge instance types to the public, we've spent the day running bonnie++ disk performance benchmarks against single and RAID0-striped EBS volumes.
This is because we are life science types who do a lot of high performance computing and cluster building. The single biggest performance bottleneck for people who want to do biology "in the cloud" is the generally poor performance of disk IO and storage. In many common informatics and genomics applications we tend to be bottlenecked more by disk speed than by CPU speed.
Following in the footsteps of many others before us (example 1, example 2), we have learned that we can tease additional performance out of Amazon EBS by striping multiple volumes together into a software RAID0 set.
There is a whole body of experimentation going on right now trying to find the "optimal" combination of:
- EC2 instance type
- EBS volume size
- Number of EBS volumes
- Which filesystem to put on the software RAID set
- What software RAID settings to use when creating the RAID set
- What Linux IO scheduler to use
- What volume mount options to use
- What other tweaks and parameters increase performance
Nobody has really discovered the "ultimate" solution, and things are further complicated by the fact that performance "in the cloud" can vary minute by minute, hour by hour and day by day. It's extremely difficult to get reliably repeatable data from cloud systems.
I also hate benchmarking because:
- Performance "in the cloud" is insanely variable for reasons that are invisible to mortals
- Nobody is ever satisfied with the results
- It's a lot of work, and even harder to do reasonably correctly
- Everybody has different needs, demands and requirements
At this point all we really want to see is what effect the new non-blocking 10 Gigabit Ethernet network behind the EC2 "Cluster Compute" instance types has on EBS volume performance.
Obviously non-oversubscribed 10GbE networking opens up a lot of possibilities for people used to clusters and compute farms. We are also going to test node-to-node file transfers and even vanilla NFS between systems to see whether it is now sensible to orchestrate real Platform LSF, PBS and Grid Engine managed clusters on AWS.
We chose a 160GB disk as our target size and made the following EBS volumes:
- Single 160GB EBS volume
- Four 40GB EBS volumes (to be striped as RAID0 into a 160GB disk)
- Eight 20GB EBS volumes (to be striped as RAID0 into a 160GB disk)
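A minimal sketch of the sizing arithmetic, assuming equal-sized volumes: RAID0 adds no parity, so the array's capacity is roughly the number of volumes times the per-volume size (md metadata takes a small amount of space). The `per_volume_gb` function is a hypothetical helper for illustration only.

```shell
#!/bin/sh
# Hypothetical helper: size of each EBS volume so that N equal volumes
# striped as RAID0 add up to a target capacity (RAID0 has no parity overhead).
per_volume_gb() {
    total_gb=$1
    n_volumes=$2
    if [ $((total_gb % n_volumes)) -ne 0 ]; then
        echo "error: $total_gb GB does not split evenly across $n_volumes volumes" >&2
        return 1
    fi
    echo $((total_gb / n_volumes))
}

# The three layouts used in this post.
for n in 1 4 8; do
    echo "$n volume(s) x $(per_volume_gb 160 "$n") GB"
done
```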
Basically we wanted to run bonnie++ multiple times against 160GB single-disk, four-disk and eight-disk XFS volumes using different Linux IO schedulers to see what would happen.
Filesystem. We chose XFS as the Linux filesystem, based largely on the work of others in this area. The filesystem was created with the standard "mkfs.xfs <device>" command; nothing special.
Software RAID. Except for choosing the "--chunk=256" option (a 256 KiB chunk size), we did nothing special when creating the /dev/md0 RAID0 device. An example command for our 8-disk stripe set: "mdadm --create --verbose --level=0 --chunk=256 --raid-devices=8 /dev/md0 /dev/sdf /dev/sdg /dev/sdh /dev/sdi /dev/sdj /dev/sdk /dev/sdl /dev/sdm"
Volume mounting. XFS volumes were mounted with the options noatime,nodiratime,logbufs=8, for example: "mount -t xfs -o noatime,nodiratime,logbufs=8 /dev/md0 /eightdisk"
Linux blockdev ra attribute. Following in the footsteps of others who found that the Linux "readahead" value was perhaps set too conservatively (especially on Red Hat variants), we increased the readahead value for every disk with the command "blockdev --setra 65536 <device>". The value is counted in 512-byte sectors, so 65536 sectors is 32 MiB.
Linux IO scheduler. Various people have reported that the choice of Linux IO scheduler matters, so we wanted to see performance under different schedulers. We tested the "noop", "deadline" and "cfq" schedulers by changing the contents of "/sys/block/<device>/queue/scheduler".
Bonnie++ command. We ran the same bonnie++ command for each test, changing only the run label and the name of the output log file. An example command: "bonnie++ -u nobody -m cc1-eightdisk-noop -s 50000 -x 2 -d /eightdisk/ -q -f 2>&1 | tee /opt/results/eightdisk-noop-results.log"
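To repeat this with a different number of disks, here is a small sketch that prints (rather than runs) the equivalent mdadm command. It assumes, as in this post, that the EBS volumes are attached as consecutive devices starting at /dev/sdf; newer kernels may name them /dev/xvdf and so on. The `mdadm_raid0_cmd` helper is hypothetical.

```shell
#!/bin/sh
# Hypothetical helper: print the mdadm command for an N-disk RAID0 set.
# Assumes volumes are attached as /dev/sdf, /dev/sdg, ... (as in the post).
# Second argument is the chunk size in KiB (defaults to 256).
mdadm_raid0_cmd() {
    n=$1
    chunk_kb=${2:-256}
    devices=""
    i=0
    for letter in f g h i j k l m n o p; do
        [ "$i" -ge "$n" ] && break
        devices="$devices /dev/sd$letter"
        i=$((i + 1))
    done
    if [ "$i" -lt "$n" ]; then
        echo "error: only $i device letters available" >&2
        return 1
    fi
    echo "mdadm --create --verbose --level=0 --chunk=$chunk_kb --raid-devices=$n /dev/md0$devices"
}

mdadm_raid0_cmd 8
```

`mdadm_raid0_cmd 8` prints the 8-disk command from this post, and `mdadm_raid0_cmd 4` gives the four-disk version. Printing first lets you check the device list before running it, since mdadm --create overwrites whatever is on those devices.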
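blockdev counts readahead in 512-byte sectors rather than bytes. A small sketch of the unit conversion, plus a hypothetical helper that prints a --setra command for each device followed by a --getra check so the new value can be confirmed:

```shell
#!/bin/sh
# blockdev --setra / --getra express readahead in 512-byte sectors.
SECTOR_BYTES=512

# Hypothetical helper: convert a readahead value in sectors to KiB.
ra_sectors_to_kib() {
    echo $(( $1 * SECTOR_BYTES / 1024 ))
}

# Hypothetical helper: print (not run) setra/getra commands for each device.
setra_cmds() {
    ra=$1
    shift
    for dev in "$@"; do
        echo "blockdev --setra $ra $dev"
        echo "blockdev --getra $dev"
    done
}

echo "65536 sectors = $(ra_sectors_to_kib 65536) KiB"
setra_cmds 65536 /dev/sdf /dev/sdg
```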
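The scheduler file lists every available scheduler with the active one in square brackets (for example "noop deadline [cfq]"). A sketch with hypothetical helper names that parses out the active scheduler and prints the commands to switch a set of member disks. Writing to sysfs needs root, and the setting does not survive a reboot. As far as we know the md0 device itself does not use a scheduler, so the setting matters on the member disks.

```shell
#!/bin/sh
# Hypothetical helper: extract the active (bracketed) scheduler from the
# contents of /sys/block/<device>/queue/scheduler.
active_scheduler_from() {
    echo "$1" | sed -n 's/.*\[\([^]]*\)].*/\1/p'
}

# Hypothetical helper: print (not run) the commands that switch each
# member disk to the given scheduler.
set_scheduler_cmds() {
    sched=$1
    shift
    for dev in "$@"; do
        echo "echo $sched > /sys/block/$(basename "$dev")/queue/scheduler"
    done
}

active_scheduler_from "noop deadline [cfq]"
set_scheduler_cmds deadline /dev/sdf /dev/sdg
```

On a live system the first helper would be fed the file directly, e.g. `active_scheduler_from "$(cat /sys/block/sdf/queue/scheduler)"`.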
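To cover all three schedulers on one volume, the command can be generated in a loop. This sketch only prints the commands, and it assumes the member disks' scheduler is switched before each run. It uses -m, bonnie++'s machine-label option, for the run label (-n sets the file count for the create/delete tests and expects a number). -s is in MiB, so 50000 is roughly twice the 23 GB of RAM on a cc1.4xlarge, which keeps the page cache from flattering the results. The `bonnie_cmd` helper is hypothetical.

```shell
#!/bin/sh
# Hypothetical helper: print the bonnie++ command for one volume/scheduler.
# -u nobody: run as user nobody   -m: run label   -s 50000: test size in MiB
# -x 2: two repetitions           -q: quiet       -f: skip per-char IO tests
bonnie_cmd() {
    volume=$1    # e.g. eightdisk (mounted at /eightdisk)
    sched=$2     # noop, deadline or cfq
    echo "bonnie++ -u nobody -m cc1-$volume-$sched -s 50000 -x 2 -d /$volume/ -q -f 2>&1 | tee /opt/results/$volume-$sched-results.log"
}

# Switch the member disks' scheduler before running each printed command.
for sched in noop deadline cfq; do
    bonnie_cmd eightdisk "$sched"
done
```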
Access Our Raw Data
Data collected so far has been posted to a Google Docs spreadsheet.
"Cluster Compute" cc1.4xlarge Results
We've written new blog posts that focus specifically on what we see on the new EC2 instance types:
- Performance of local storage only on cc1.4xlarge instance types
- Combined performance of local & EBS attached storage on cc1.4xlarge instance types
Incomplete so far:
- Something seems "off" with our single 160GB EBS volume test. We might blow away the volume and instance and re-run, just to see whether the numbers shift significantly.
- Testing the single 160GB EBS volume is so slow that we were only able to complete the single-drive tests with the noop IO scheduler. For the 4-drive and 8-drive stripe sets we were able to test with noop, cfq and deadline.
- We also have no data yet from instance types other than Cluster Compute. We plan on collecting that data over the coming days so we can compare.
Along with the raw data, here are some graphs. We averaged the values of the repeated tests.
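The averaging itself is simple. A sketch using awk, assuming the CSV lines from the repeated runs are in one log file; the column that holds each metric varies between bonnie++ versions, so the column number must be checked against your own output. Fields that are not plain numbers, such as the "+++++" bonnie++ prints when a test finishes too fast to measure, are skipped. `avg_column` is a hypothetical helper.

```shell
#!/bin/sh
# Hypothetical helper: average one numeric CSV column across repeated runs.
# Usage: avg_column <column-number> <log-file>
avg_column() {
    col=$1
    file=$2
    awk -F, -v c="$col" '
        # only count plain numbers; skips blanks, "+++++" and non-CSV lines
        $c ~ /^[0-9]+(\.[0-9]+)?$/ { sum += $c; n++ }
        END { if (n > 0) printf "%.1f\n", sum / n }
    ' "$file"
}
```

For example, `avg_column 5 /opt/results/eightdisk-noop-results.log` would average the fifth field across both runs, if that is the column holding the metric you care about in your bonnie++ version.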
Interpretation and more results will be forthcoming; we'll update this blog post as we learn and do more.
Sequential Create & Delete
Random Create & Delete