The First SF Redis Meetup

Although I am not yet a Redis user, I do have an interest in NOSQL (Not Only SQL) data storage, and Redis has come up in readings and discussions as a very slick piece of technology. I figured that the First San Francisco Redis Meetup would be a good opportunity to find out more about Redis and how other software developers and companies have deployed it. Even though I was one of only two or three attendees who had not used Redis, the meetup met my expectations.

The event was held at the Engine Yard offices in San Francisco (Engine Yard is a big user of Redis) on March 25, 2010. The evening started with pizza, soda, and a very well done introductory Redis presentation by Ted Nyman, the organizer. Subsequent speakers discussed how they were using Redis for their projects. The main takeaway from the evening was that you can use Redis for pretty much anything, but it seems especially well suited to any situation where you want to store some information as a data structure on a remote server. It seems like a very strong replacement for memcached and much, much more.

For more information on this and future Redis Meetups, check out The San Francisco Redis Meetup Group.

Below are the notes that I took that evening.

SF Redis Meetup Group

Intro by Ted Nyman
First Redis Meetup (in the world?)
Salvatore Sanfilippo (antirez) is the author of Redis
- Very attentive maintainer; responsive on email and IRC
Open sourced about a year ago
A project called "Resque" uses it
What is Redis?
- "A super-fast database," for the layman
- "A fast and powerful key-value store that saves us time, money, and our sanity," for the pointy-haired boss
  - Or NOSQL
- "A blazingly fast, in-memory data structures server, with built-in value types and atomic operations; easy replication; all structured with unique keys," for the cool hackers.
Redis is simple
- Zero config.
- 30 seconds from download to database interaction.
- Tried it the first time and it worked very well.
Redis is very fast.
- 100,000 r/w a second on a decent Linux box.
Real Data structures
- Redis values can be strings, lists, sets, or sorted sets, with sophisticated, atomic operations. And now: hashes.
- a list can be set up as a queue
- hashes are new and powerful
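
To illustrate the "list as a queue" point, here is a minimal sketch that uses the redis-py method names `lpush` and `rpop`. The `client` argument is assumed to be a connected `redis.Redis` instance, and the helper names and the queue name `jobs` are made up for this example.

```python
def enqueue(client, queue_name, item):
    """Add an item at the head of the list; returns the new list length."""
    return client.lpush(queue_name, item)


def dequeue(client, queue_name):
    """Remove and return the item at the tail of the list, or None if empty."""
    return client.rpop(queue_name)


# Usage sketch (assumes a Redis server on localhost and the redis-py package):
#   import redis
#   r = redis.Redis()
#   enqueue(r, "jobs", "first")
#   enqueue(r, "jobs", "second")
#   dequeue(r, "jobs")   # the oldest item comes out first
```

Pushing on one end and popping from the other is what gives first-in, first-out order.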
Redis is multi-lingual
- Client libs for Ruby, Python, Perl, PHP, Clojure, Java, and many more. Even Haskell.
- The Redis C source code is very readable
Redis is pretty safe
- And getting safer. It's now as persistent and durable as you desire. New Multi-Exec. Replication. And more.
- Redis writes to disk every 5 seconds, which is not very "durable"
  - But it depends on what you are doing, i.e. saving somebody's up vote on Reddit.
- Has replication, master-slave
- Append only file - not on by default (makes it a little slower)
- Multi-Exec (not on stable)
  - Say you want to do three commands: LPUSH, RPOP, INCR
  - Send them as one block: MULTI, then LPUSH, RPOP, INCR, then EXEC
  - If it crashes in the middle, the transaction doesn't commit
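
The three-command example above could look roughly like this with redis-py, where a transactional pipeline wraps the queued commands in MULTI/EXEC. This is only a sketch: `client` is assumed to be a `redis.Redis` instance, and the function and key names are invented for illustration.

```python
def push_pop_and_count(client, list_name, counter_name, item):
    """Queue LPUSH, RPOP, and INCR, then send them as one MULTI/EXEC block."""
    pipe = client.pipeline(transaction=True)  # wraps the batch in MULTI ... EXEC
    pipe.lpush(list_name, item)
    pipe.rpop(list_name)
    pipe.incr(counter_name)
    # Nothing is run until execute(); it returns one result per queued command.
    return pipe.execute()


# Usage sketch (assumes a Redis server on localhost and the redis-py package):
#   import redis
#   results = push_pop_and_count(redis.Redis(), "work", "handled", "job-1")
```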
Redis is improving daily
- What's new? Hashes, Virtual Memory, Multi-Exec
- This is for the 2.0 release
- VM is like OS VM, things not used a lot are put to disk
  - Neat.

Speakers

Andy McCurdy from Whiskey Media, maintainer for redis-py

What redis-py currently does and Andy's vision for it
redis-py
- supports
  - nearly every command in git
  - pipelining
    - useful for batch loading
  - multi-exec
    - so it becomes trivial to chain things together and exec
- redis is like a data structure store and we use a lot of these data structures today
- in python, we can implement the redis list to look like a normal python list but the storage is remote
  - doesn't require knowledge of the commands, just use the data structures as if they were native
- side note
  - it's an absolute bitch setting it up
  - Thrift is a piece of shit
  - takes too long to get it setup
  - redis is so much more simple
- use django and like django's ORM, for the most part
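
The idea of a Redis list that looks like a normal Python list could be sketched as below. This is not redis-py's own API; `RemoteList` is a hypothetical wrapper written for these notes, and it only assumes that `client` offers the redis-py methods `rpush`, `llen`, `lindex`, and `lrange`.

```python
class RemoteList:
    """Hypothetical list-like wrapper whose items live in a Redis list."""

    def __init__(self, client, key):
        self.client = client
        self.key = key

    def append(self, item):
        self.client.rpush(self.key, item)  # add at the tail, like list.append

    def __len__(self):
        return self.client.llen(self.key)

    def __getitem__(self, index):
        if not isinstance(index, int):
            raise TypeError("this sketch supports integer indexes only")
        value = self.client.lindex(self.key, index)  # negative indexes work too
        if value is None:  # Redis answers nil for an out-of-range index
            raise IndexError("list index out of range")
        return value

    def __iter__(self):
        return iter(self.client.lrange(self.key, 0, -1))  # 0..-1 = whole list


# Usage sketch (assumes a Redis server on localhost and the redis-py package;
# values come back as bytes unless the client uses decode_responses=True):
#   import redis
#   todo = RemoteList(redis.Redis(decode_responses=True), "todo")
#   todo.append("write notes")
#   len(todo), todo[0], list(todo)
```

The caller never types a Redis command, which is the point made in the talk: the storage is remote, but the object behaves like a native data structure.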
"Who uses redis as a caching server for another database?"
- 3-4 people
- Using sorted sets
Consistent hashing in the
Using redis as the main data store
"Need to roundup the redis client developers and use the same consistent hashing."

Alejandro Crosa, LinkedIn, maintainer for scala-redis

http://www.linkedin.com/in/alejandrocrosa
http://twitter.com/alejandrocrosa
http://github.com/acrosa/scala-redis

Using Redis in production right now
Why he uses redis.
- The LinkedIn apps use Javascript
- Needed
  - Track exceptions that happen on the browser
  - Track usage and click through of the apps
  - unknown number of inserts
  - extremely low latency
  - disk access will kill me
  - i don't care about losing data
  - data extracted composed of sets and lists
- Built a lib that takes the exception and sends it to his server
- Can't look at 347 exceptions (<--- 47, the ultimate "random" number)
- There is a lot of interaction he wants to track

Jeremy Zawodny, Craigslist

The thing that drew him to redis was that it was very fast and there are actual data structures behind the scenes
Looked for a good match of the type of stuff he wanted to do
Currently collecting data in production
Want to build a short term memory system to help combat spam and fraud
Wanted something that you could iterate on quickly
Redis cluster in the colo
- 10 machines, each running multiple redis servers
- Every time a new posting comes in, want to track some information about it
- Taking the posting ids and using redis as a queue (via a list)
- Uses consistent hashing in one of the middle layers so that it can be sharded across several machines
- Data gets replicated to other colos
Next
- There is an ID in a queue and a worker pulls that off of the queue and then it pulls down the meta data for that posting
- Builds a list in memory of email, network, domain, etc.
- Then put the id in each of these buckets (email, network, domain)
- Uses a sorted set as a time index
- Now can answer the question, given an email (or IP or Network) how many ids were posted over a period of time.
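
A sorted set used as a time index might look like the sketch below: each bucket (an email, IP, or network) gets its own sorted set, posting ids are the members, and the posting time is the score. The key layout and function names are my assumptions, not Craigslist's actual schema, and the `zadd` call uses the mapping form from redis-py 3.0 and later.

```python
def record_posting(client, kind, value, posting_id, posted_at):
    """Index a posting id under e.g. kind="email", scored by its timestamp."""
    key = f"{kind}:{value}"  # hypothetical key layout, e.g. "email:a@example.com"
    client.zadd(key, {posting_id: posted_at})


def count_postings(client, kind, value, start, end):
    """How many ids were posted for this email/IP/network in [start, end]?"""
    return client.zcount(f"{kind}:{value}", start, end)


# Usage sketch (assumes a Redis server on localhost and the redis-py package):
#   import redis, time
#   r = redis.Redis()
#   now = time.time()
#   record_posting(r, "email", "a@example.com", "1001", now)
#   count_postings(r, "email", "a@example.com", now - 3600, now)  # last hour
```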
Also
- Maintain dirty queues
  - One for each type (email, ip, net., etc)
Nice thing is that it can be scaled easily by adding more machines.
Sharding
- The craigslist (set consistent hash module in perl) lib implements it
- Every key that comes through runs through the hash and it spits out the server, port number, etc.
- Needed more memory than
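
Consistent hashing came up in several talks, so here is a minimal, generic sketch of the idea in Python: a key goes through a hash and out comes the server and port. It is not the Craigslist Perl library; the class name, the SHA-256 hash, and the 100 points per server are all choices I made for the example.

```python
import bisect
import hashlib


class HashRing:
    """Minimal consistent-hash ring mapping keys to (host, port) servers."""

    def __init__(self, servers, points_per_server=100):
        if not servers:
            raise ValueError("at least one server is required")
        ring = []
        for server in servers:
            # Several points per server spread the keys more evenly.
            for i in range(points_per_server):
                ring.append((self._hash(f"{server}#{i}"), server))
        ring.sort()
        self._ring = ring
        self._positions = [position for position, _ in ring]

    @staticmethod
    def _hash(text):
        return int(hashlib.sha256(text.encode("utf-8")).hexdigest(), 16)

    def server_for(self, key):
        """Return the first server clockwise from the key's position."""
        idx = bisect.bisect(self._positions, self._hash(key)) % len(self._ring)
        return self._ring[idx][1]


# Usage sketch with made-up addresses:
ring = HashRing([("10.0.0.1", 6379), ("10.0.0.2", 6379), ("10.0.0.3", 6379)])
host, port = ring.server_for("posting:12345")
```

When a server is removed, only the keys that lived on it move elsewhere; the rest keep their old server. Two clients only agree on where a key lives if they build the ring the same way, which is why the client developers would need to settle on one scheme.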
Audience Tip: compile 32-bit redis and run it on a 64-bit machine and you can save a lot of memory
- SORT command is a little complex but useful
You can set a TTL on a Redis item, but if you write over the key/value, the TTL is also rewritten
- Other surprises with the expiration policy

Kevin Mahaffey, Lookout

Security for phones
Uses SMS and SMS is awful to work with
Architecture
- Cloudpushd
  - ANSI C
  - Event-driven (libdev)
  - Binary protocol listener
  - Event-driven redis client
  - 2 redis connections
    - BLPOP for incoming messages
    - Utility connection
  - Mobile client
    - Maintains persistent TCP connections to cloudpushd
  - Cloudpushd instance starts
    - BLPOP on unique server id
  - Client connects and logs in
    - Sends unique id
    - Cloudpushd sets client->server
  - Future work
    - Multiple redis servers (client consistent hash)
    - HTTP protocol support
- If you write a daemon for mobile, use 1-byte
  - TCP westwood
- Android app: http://www.higherirderterms.com/cloudpushdemo.apk
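
My reading of the architecture above is that Redis holds a client-to-server mapping and one list per cloudpushd instance, which each instance blocks on with BLPOP. The real daemon is written in C, so the Python below is only a sketch of that flow: the key names, the JSON message format, and the function names are all assumptions.

```python
import json


def register_client(r, client_id, server_id):
    """On login, remember which cloudpushd instance holds the client's socket."""
    r.set(f"client:{client_id}", server_id)


def push_to_client(r, client_id, body):
    """Queue a message on the list owned by the client's cloudpushd instance."""
    server_id = r.get(f"client:{client_id}")
    if server_id is None:
        return False  # client is not connected anywhere
    if isinstance(server_id, bytes):  # redis-py returns bytes by default
        server_id = server_id.decode("utf-8")
    message = json.dumps({"client": client_id, "body": body})
    r.rpush(f"server:{server_id}", message)  # RPUSH + BLPOP keeps FIFO order
    return True


def next_message(r, server_id, timeout=0):
    """Block until a message arrives for this instance (timeout=0 waits forever)."""
    item = r.blpop(f"server:{server_id}", timeout=timeout)
    if item is None:  # timed out
        return None
    _key, payload = item
    return json.loads(payload)
```

Since BLPOP ties up its connection while it waits, the second "utility" connection would presumably handle everything else, such as setting the client-to-server mapping.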

Ezra Zygmuntowicz from Engine Yard

Wrote Ruby client
Redis install that has not been restarted in 7.5 months
Monitoring system
- AMQP
- Uses redis as a state keeper
- Something checks the server, then writes the last check of the monitor to redis
Have a cloud platform that creates application servers in EC2
- Each server is identical (HAProxy, nginx, Ruby)
  - except one that has the elastic ip (master), the others are (slaves)
  - slaves keep asking if it is okay
    - using redis as a distributed lock server
    - so if the master is not okay, can get the lock and spin up a new server
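
The lock step can be sketched with a single Redis command. This assumes a newer Redis than the one discussed that evening, one where `SET` accepts the `NX` and `EX` options, and it uses redis-py's `set(..., nx=True, ex=...)`; the function name and the lock key are made up.

```python
def try_acquire_lock(client, lock_name, owner, ttl_seconds):
    """Return True if this caller now holds the lock, False if someone else does."""
    # SET with NX only writes when the key is absent; EX makes the lock expire
    # so that a crashed holder cannot keep it forever.
    return bool(client.set(lock_name, owner, nx=True, ex=ttl_seconds))


# Usage sketch (assumes a Redis server on localhost and the redis-py package):
#   import redis
#   r = redis.Redis()
#   if try_acquire_lock(r, "master-takeover", "slave-2", 60):
#       ...  # only one slave gets here, so only one replacement is started
```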
Using nginx for a proxy for a whole bunch of domains (altered nginx)
- nginx looks up the IP for that domain in Redis
Use nginx again, tell it to use a specific resolver DNS server (no need to
- use PDNS, which talks to an EventMachine daemon that looks up in redis which IP to send back
AMQP - it sucks at high availability
- run two rabbit brokers and a redis
- producers send a message to both rabbit servers
- clients subscribe to both queues
- a lookup is done on redis to dedup
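
The dedup lookup could be as small as one `SADD`, since Redis reports whether the member was new. This is a sketch, not Engine Yard's code: it assumes every message carries a unique id, and the function name and the set name `seen-messages` are invented.

```python
def is_first_delivery(client, message_id, seen_key="seen-messages"):
    """Return True the first time a message id is seen, False for the duplicate."""
    # SADD returns the number of members actually added: 1 if new, 0 if present.
    return client.sadd(seen_key, message_id) == 1


# Usage sketch (assumes a Redis server on localhost and the redis-py package):
#   import redis
#   r = redis.Redis()
#   if is_first_delivery(r, "msg-42"):
#       ...  # handle the message; the copy from the other broker is skipped
```

A set like this grows forever, so a real setup would need some way to expire old ids.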
Thinking about replacing AMQP with redis
- but rabbit gives you other queuing stuff that redis will never have