Cloud Computing: Increasing Performance SDForum Presentation

The theme for the latest SDForum Cloud Services & SOA SIG meeting was increasing the performance of cloud computing. The meeting included three presentations covering network communications, applications, and storage. It sounded like an interesting talk, and I expected to learn something that a startup or small tech company could use. It was not entirely useful for that purpose.

The first presenter, from Riverbed, had an interesting product, but it was really geared toward large, globally distributed companies. The second presenter, from Gear6, gave the most informative presentation: he first focused on memcached and then wrapped up with what their product provides on top of memcached. If you were not familiar with memcached, it was a wealth of knowledge; if you were, there were some good tidbits about operating memcached on Amazon EC2 instances. The final presenter, from RainStor, had a product that, again, was geared toward larger companies that need lots of storage for government compliance and an efficient way of saving and retrieving the information. Some of the compression ideas were interesting but still not very relevant to a Web 2.0 world.

The following are the notes that I took during the talk:

The theme is improving performance in the cloud. Looking at three different technologies and three different approaches.
Cloud Service SIG
  • 4th Tuesday of every month at VMware
  • Next topic: Creating your own cloud
  • Dave Nielsen is the co-chair and also organizer of CloudCamp
    • Looking for volunteers to participate
Network Performance - Speaker 1
Commentary - it was basically a marketing pitch (wasted a little too much time on this speaker)
  • Unleashing Cloud Performance
  • Bob Gilbert, Director of Marketing for Riverbed
  • Today's workplace
    • More distance
    • More sites
    • More Data in more places
    • Massively distributed network across branch offices all over the world
  • IT: Centralized vs. Distributed
  • What is your top barrier to adopting cloud computing?
    • Availability of service
    • Data & vendor lock-in
    • Security - data confidentiality and auditability
    • Data transfer bottlenecks
    • Performance unpredictability
  • Root Cause of Poor Cloud Performance
    • Limited bandwidth
    • Chatty Protocols/Apps
    • High Latency
  • How does latency impact performance
    • Network Throughput is proportional to 1/RTT
      • Both at the TCP and Application Layer
    • Throughput decays hyperbolically as round-trip latency grows
    • Doubling RTT = Cutting throughput in half
  • Riverbed
    • Pioneer in WAN optimization (puke)
      • Groundbreaking, gamechanging, obliterate bottlenecks
    • Profitable
    • $394M
    • LAN-like performance for branch offices
    • Client that lives in branch, mobile workstations, secondary data center
    • Talks to primary data center
    • And also a cloud version
  • Addressing All of the Root Causes of Poor Performance (RiOS)
    • Streamlines at all of the network levels
    • Store some data locally; when the remote client requests something from the remote appliance, it says, no, that is on your local server, serve it from there.
    • Works at the TCP and Application layer
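
The latency point from the notes above (throughput proportional to 1/RTT) can be sketched with the usual window-limited bound: a sender can have at most one window of unacknowledged data in flight per round trip, so throughput is at most window / RTT. This is my own illustration, not something shown in the talk. The function names and the 64 KiB window (the classic TCP maximum without window scaling) are assumptions.

```python
def max_tcp_throughput(window_bytes: int, rtt_seconds: float) -> float:
    """Upper bound on single-connection throughput, in bytes per second.

    At most one window of unacknowledged data can be in flight per
    round trip, so throughput <= window / RTT.
    """
    if rtt_seconds <= 0:
        raise ValueError("rtt_seconds must be positive")
    return window_bytes / rtt_seconds


def chatty_wait_time(round_trips: int, rtt_seconds: float) -> float:
    """Time a chatty protocol spends purely waiting on round trips."""
    return round_trips * rtt_seconds


if __name__ == "__main__":
    window = 64 * 1024  # assumed window size: 64 KiB, no window scaling
    for rtt_ms in (1, 50, 100, 200):
        bound = max_tcp_throughput(window, rtt_ms / 1000)
        print(f"RTT {rtt_ms:>3} ms: at most {bound * 8 / 1e6:.2f} Mbit/s")
```

Under this bound, doubling the RTT halves the ceiling no matter how much bandwidth the link has. The same arithmetic is why chatty protocols hurt so much over a WAN: every extra round trip costs a full RTT.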
Application Performance - Speaker 2
Felt very fast and there was a lot of fairly good information.
  • Bill Takacs, Director of Product Management at Gear6
  • Massive increase in traffic and population
  • Forrester: 2.2B people on the internet globally by 2013
    • Bill thinks it will be more
  • This will result in a huge increase in traffic
  • Web Growth
    • We've moved from static to dynamic content
  • In Web 2.0, latency is a killer
    • Web 2.0 Architecture: "Origin Stress"
  • Steve something from Google has some metrics about latency and Google losing revenue
  • What do you do? You cache
    • Alleviate the stress on the database with memcached
  • Memcached
    • big hash table
    • by Danga for LiveJournal
    • Significant reduction in load on database
    • Perfect for web sites with high database load
    • In use by Facebook, Twitter, MyYearBook, others
      • Twitter has 3TB of cache
    • One issue is relying on it for scaling
  • What Memcached is not
    • A persistent data store
    • A database
    • Application-specific
    • A large object cache
    • Fault-tolerant or highly available
  • Best Practices
    • Use w/MySQL
    • Use 64 bit servers
    • Cache "expensive ops"
    • Cache bi-directionally
    • Use consistent hashing
    • Manage connections
    • Design to fail gracefully
    • Eviction: separate pool
      • Different cache pools for different applications
    • Optimize object sizes
      • 1 MB object size limit
    • Instrument app and cache
  • Memcached in the cloud
    • Best Practices in the Cloud
      • Design for failure
      • Design for the dynamic nature of the cloud
    • What are the issues with Memcached in the cloud?
      • WAN latency will prevent Memcached use in a mixed environment (your datacenter + cloud provider)
  • Use Case 2 - Hybrid
    • Load balance in to the cloud
    • Have own data center and then spin up more servers in the cloud as needed based on
  • Summary
    • Think thru architecture
    • Plan to deal with dynamic nature of cloud
    • The least scalable component in your system becomes the bottleneck
  • Gear6
    • Gear6 Web Cache - improvements on Memcached
    • Focused on Memcached and also NoSQL
    • Started as an appliance and now releasing a cloud appliance on EC2
      • Persistence - export the cache to a file that can be used for recovery and warm up
      • Hybrid memory management improvements
    • Advanced Memcached analytics
      • Helps debug issues
    • Fully compliant with Memcached protocol
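
Three of the best practices in the notes above (cache "expensive ops", use consistent hashing, design to fail gracefully) can be sketched together. This is a minimal illustration of my own, not code from the talk or from Gear6. Plain dicts stand in for memcached servers, and `HashRing`, `CacheAside`, and `load_from_db` are hypothetical names. A real client library would handle the hashing and connections for you.

```python
import bisect
import hashlib


class HashRing:
    """Consistent-hash ring: maps each key to one of the named nodes."""

    def __init__(self, nodes, replicas=100):
        self.replicas = replicas  # virtual points per node, to spread load
        self._points = []         # sorted hash points on the ring
        self._owner = {}          # hash point -> node name
        for node in nodes:
            self.add(node)

    @staticmethod
    def _hash(value):
        return int(hashlib.md5(value.encode("utf-8")).hexdigest(), 16)

    def add(self, node):
        for i in range(self.replicas):
            point = self._hash(f"{node}#{i}")
            self._owner[point] = node
            bisect.insort(self._points, point)

    def remove(self, node):
        for i in range(self.replicas):
            point = self._hash(f"{node}#{i}")
            del self._owner[point]
            self._points.remove(point)

    def node_for(self, key):
        if not self._points:
            raise LookupError("ring is empty")
        # First point clockwise from the key's hash, wrapping around.
        idx = bisect.bisect(self._points, self._hash(key)) % len(self._points)
        return self._owner[self._points[idx]]


class CacheAside:
    """Read-through-on-miss ("cache-aside") lookups over a hash ring."""

    def __init__(self, ring, caches, load_from_db):
        self.ring = ring
        self.caches = caches              # node name -> dict (stand-in server)
        self.load_from_db = load_from_db  # the "expensive op" being cached
        self.db_reads = 0                 # instrumentation: count of misses

    def get(self, key):
        try:
            cache = self.caches[self.ring.node_for(key)]
        except (KeyError, LookupError):
            cache = None  # cache node missing: fall back to the database
        if cache is not None and key in cache:
            return cache[key]
        self.db_reads += 1
        value = self.load_from_db(key)
        if cache is not None:
            cache[key] = value
        return value
```

The point of the ring is that removing a node only remaps the keys that node owned. With a plain `hash(key) % n` scheme, losing one server would remap nearly everything and send a flood of misses to the database, which matters in a cloud where instances come and go.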
Storage Performance - Speaker 3
  • How to address big data
  • Ramon Chen, VP Product Management, RainStor
  • RainStor
    • Provides disruptive technology for preserving structured data
    • Tech developed in the UK
    • Partner OEM go-to-market strategy
    • Funding from Storm Ventures and Informatica
  • Big Data Retention Problem
    • Massive data growth
    • Strong business and regulatory drivers
    • Requirements outstripping resources
  • The complexity of managing structured data
    • To date, industry focus for retention has been on unstructured data
    • Managing and accessing structured data is complex
    • Wide spectrum of data profiles
  • What if you could make the problem much, much smaller
    • Dedup and store unique values and patterns resulting in massive compression
    • Provide full access to data without re-inflation
    • Have built in immutability and legal compliance
    • Query on demand at a point in time using standard SQL
    • Provide cost efficient storage in the cloud and on premise
  • RainStor Cloud Architecture
    • Compressed data sent to the cloud resulting in quicker and cheaper uploads
    • Encrypted data stored in private multi-tenant containers ensuring security and easy management
    • Data accessed on demand using standard SQL tools leveraging elasticity of the cloud
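
The "store unique values" and "access without re-inflation" ideas in the notes above resemble dictionary encoding of a column. The sketch below is only my illustration of that general technique. The talk did not describe RainStor's actual algorithm at this level, and `DedupColumn` and its methods are hypothetical names.

```python
class DedupColumn:
    """Stores each distinct value once; rows hold small integer codes."""

    def __init__(self):
        self._code_of = {}  # value -> integer code
        self._values = []   # integer code -> value (each value stored once)
        self._rows = []     # one code per row, in insertion order

    def append(self, value):
        code = self._code_of.get(value)
        if code is None:
            code = len(self._values)
            self._code_of[value] = code
            self._values.append(value)
        self._rows.append(code)

    def rows_equal_to(self, value):
        """Row indexes matching value, found by comparing codes only."""
        code = self._code_of.get(value)
        if code is None:
            return []  # value never stored, so no row can match
        return [i for i, c in enumerate(self._rows) if c == code]

    def value_at(self, row):
        """Decode a single row on demand."""
        return self._values[self._rows[row]]

    @property
    def unique_count(self):
        return len(self._values)

    def __len__(self):
        return len(self._rows)


if __name__ == "__main__":
    column = DedupColumn()
    for state in ["CA", "NY", "CA", "CA", "TX", "NY"]:
        column.append(state)
    print(len(column), "rows,", column.unique_count, "unique values")
    print("rows equal to CA:", column.rows_equal_to("CA"))
```

An equality filter here never rebuilds the original rows. It looks up one code and scans the integer list, which is the sense in which a query can run against the compressed form.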