Amazon Cloud Training
Fundamentals: Amazon Web Services for Science & Engineering
Training Agenda & Course Information
Intended Audience
Researchers, IT management, systems administrators & software developers looking for a practical & solid understanding of current AWS capabilities and how best to leverage them for research, scientific or engineering uses.
The course covers a broad range of topics, and our sessions generally draw a diverse mix of scientists, IT professionals and software developers.
We’ve had CIOs, software developers, sysadmins and bench scientists all sitting together in the same course, with positive results.
Level of Interaction
Materials are presented in a dynamic lecture format with frequent instructor-led demonstrations, discussions and highlighted examples. Recorded screen-casts may be used on topics that are difficult to orchestrate live. Attendees will have dedicated access to cloud-resident training systems along with example code, scripts and self-paced activity worksheets that can be used for individual exploration and experimentation at any time during the course.
Note: We have made a conscious decision to favor more content & topics over interactive lab exercises, which can consume significant amounts of class time. We understand, however, that many people like to dive right in and get “hands-on” with the cloud.
In order to deliver the most content while still allowing for hands-on work and interactive exercises we take the following steps in every course that is run:
- Within the first two hours of the course, all attendees have login access to a fully provisioned training server containing all the necessary AWS client, library and API tools and utilities pre-installed. Each attendee also receives dedicated AWS credentials allowing full use of all AWS products and services.
- Self-paced labs and exercises are handed out each day, providing attendees with options for self-exploration and experimentation.
- BioTeam always runs this course using two qualified instructors; this lets one instructor answer questions and provide help/mentoring while the other instructor is at the podium.
- Both instructors stay late and arrive early in order to support attendees seeking additional help or assistance “outside of class”.
Level of Difficulty
Note: This is not a “program the cloud” class requiring software development expertise. We will rarely, if ever, interact directly with any Amazon API.
Familiarity with Linux and shell scripting is expected. Attendees should generally be comfortable using SSH clients and operating from the Linux command line. No programming experience is required but an understanding of software development practices will help when deployment & architecture strategies are discussed. Basic familiarity with Amazon Web Services is expected (see Prerequisites below) to minimize the amount of introductory materials that need to be covered. As a general rule we will use AWS-aware utilities, wrappers, command-line tools and GUIs to show and orchestrate AWS actions rather than directly manipulating the Amazon APIs. Any questions or concerns should be addressed to <chris@bioteam.net>.
Course Content Vs. Current State of AWS
Amazon rolls out new products, services and capability upgrades at a remarkable rate. We encourage attendees to explore the AWS Blog hosted at http://aws.typepad.com/. Scan through the current articles and the monthly archives and you will note a long-running trend: Amazon seems to roll out major changes and enhancements on a monthly basis and has been doing so for years.
We try very hard to stay current with the state of AWS; our training materials probably lag Amazon's major announcements by about one week or less. We achieve this by doing the following:
- A week before every class, the instructors re-examine all course materials, slides and handouts and discuss what needs to be changed or updated. Major or minor updates are made as needed. As a rough estimate, approximately 10% of our slide content gets modified before every class.
- Class handouts and slides are not printed and bound far in advance and are never stored “on the shelf”. All class handouts are sent to a local printer 24-48 hours before the course begins. This allows instructors to keep working on course updates right up until the final hours before a training session begins.
Schedule & Enrollment
Public classes are offered several times per year in partnership with Cambridge Healthtech Institute; see http://healthtech.com/cloud for details. BioTeam has also published its 2012 public training schedule online at http://bioteam.net/2012/01/2012-cloud-training-dates/ but the Healthtech site should be treated as the most authoritative.
BioTeam also offers private training delivered onsite at client facilities with content customized to meet interests and requirements.
Instructors
The class is generally taught by two dedicated instructors.
- Chris Dagdigian <chris@bioteam.net>
- Adam Kraut <kraut@bioteam.net>
Prerequisites
Attendees should have wireless-capable laptops with SSH clients. The Mozilla Firefox web browser is recommended for attendees interested in exploring the various AWS-aware browser plugins & extensions. Attendees will be provided with remote access to Linux systems containing the AWS software, utilities, libraries & resources needed for the course.
Attendees with Mac OS X or Linux systems should have SSH & Java 1.5 (JDK or SDK) installed and available if they want to locally install the AWS command-line utilities on their machines. This is not required.
Attendees may also wish to have personal accounts set up with Amazon Web Services. This is not required for training as credentials belonging to BioTeam will be used for exercises, labs and demonstrations.
Attendees new to Amazon Web Services are encouraged to follow the Amazon self-paced “Getting Started With EC2” tutorial online prior to attending the class. The tutorial can be completed in a short time and presents an excellent introduction to the core EC2 service. Comments or questions can be addressed directly to Chris Dagdigian <chris@bioteam.net>.
Day 1 Agenda
Objective: Progress iteratively through the topics essential for building out larger or more production-focused workstreams on the AWS platform. Day One will focus on the basic foundations and will use a realistic use case for building out a more traditional (or ‘legacy’) workflow on AWS.
I. Intro & Logistics
II. AWS Overview
Goal: AWS has a huge number of service and product offerings. All of the currently available offerings will be discussed, with particular attention paid to those of most use in informatics, high performance computing and other science & engineering use cases.
III. Mapping Informatics to the Cloud
Goal: Discuss the major environmental, performance and architecture differences between HPC, grid and cluster environments and the AWS cloud environment. Real-world information, “lessons learned” and examples will be used.
IV. AWS: Billing & Credential Management
Goal: Briefly cover the logistics and mechanisms behind organizational billing and credential management. This module will also take a deeper dive into the AWS Identity and Access Management (IAM) service which now enables a much more fine-grained access control, usage monitoring and credential model.
V. AWS: EC2 Overview
Goal: Light introduction to Amazon EC2 to cover definitions & capabilities before we start making heavy use of EC2 instances in live demos and class handouts.
VI. AWS: Configuration Management
Goal: Configuration management of EC2 AMIs is a major component in deploying cloud applications in a reliable, repeatable and easy to manage process. It’s also the first area where cloud novices get bogged down in methods and practices that may increase complexity and administrative burden. For this topic, we will demonstrate several basic methods & techniques before introducing Chef Server (http://www.opscode.com/chef/) as our own preferred method for systems provisioning & configuration management.
VII. AWS: Identity Management
Goal: There are some cases where individual access via SSH keys may not be sufficient (such as with web applications). This module covers various methods of Identity Management on cloud-resident systems.
VIII. AWS: Monitoring & Reporting
Goal: Discuss and demonstrate a number of different monitoring & reporting options, with specific focus on Amazon CloudWatch (AWS product offering), Server Density (commercial solution from www.serverdensity.com), Hyperic HQ Open Source Edition (open source solution from www.hyperic.com) and syslog-ng (www.balabit.com) for logfile consolidation.
IX. Putting it all together
Goal: Using a real-world use case we will discuss and show several different legacy deployment methods utilizing Amazon Web Services. The “legacy” methods are for supporting existing applications and workstreams that may have been built for HPC clusters and compute farms. Day One will showcase the “legacy” methods while Day Two will showcase a more traditional cloud architecture using current AWS best practices.
X. Wrap-up & Discussion
Goal: Discuss and review the topics of the day with particular focus on identifying attendee interest in areas that were not covered or were not covered enough. Time is being left open in the “Day Two” schedule to handle inclusion of additional topics or demonstrations.
Day 2 Agenda
Objective: Continue progressing iteratively through the topics essential for building out larger or more production-focused workstreams on the AWS platform. The focus today will be on architecting solutions using current AWS products and best practices. Due to estimated session size, this training will be lecture, discussion and live demo driven.
I. Intro & Logistics
II. S3 Object Storage Overview
Goal: Coverage of the object-based AWS storage service.
III. EBS Block Storage Overview
Goal: Coverage of the block-based EBS storage service. This module will also cover “shared storage” in the cloud and various methods for making storage bigger, faster & safer.
IV. Data Movement
Goal: Data movement in and out of “the cloud” is problematic for data-heavy fields like life science informatics. This module will cover known issues, alternatives such as the Amazon physical ingest/outgest services and where to “draw the line”. Various third-party software and services will also be covered.
V. Message Passing Overview
Goal: Review and demonstrate the AWS SQS service, often a central component of cloud-resident workstreams. AWS SNS will also be discussed along with other message passing systems found both on the cloud and in HPC environments.
VI. Additional Topics
Placeholder topic for areas identified during Day One as needing more depth, discussion or demonstrations.
VII. Putting it all together
Goal: Review the real-world use case solutions shown in Day One and discuss the pros and cons of those approaches. Continue on with discussion of current-day AWS best practices culminating in a revised/revisited demonstration using more traditional cloud workflow methods.
VIII. Wrap-up & Discussion

