The First SF Redis Meetup
Although I am not a Redis user yet, I do have an interest in NoSQL (Not Only SQL) data storage, and Redis has come up in readings and discussions as a very slick piece of technology. I figured that the First San Francisco Redis Meetup would be a good opportunity to find out more about Redis and how other software developers and companies have deployed it. Even though I was one of only two or three attendees who had not used Redis, the meetup met my expectations.
The event was held at the Engine Yard offices in San Francisco (Engine Yard is a big user of Redis) on March 25, 2010. The evening started with pizza, soda, and a very well done introductory Redis presentation by Ted Nyman, the organizer. Subsequent speakers discussed how they were using Redis for their projects. The main takeaway from the evening was that you can use Redis for pretty much anything, but it seems especially well suited to any situation where you want to store some information as a data structure on a remote server. It seems like a very strong replacement for memcached, and much more.
For more information on this and future Redis Meetups, check out The San Francisco Redis Meetup Group.
Below are the notes that I took that evening.
SF Redis Meetup Group
- Intro by Ted Nyman
- First Redis Meetup (in the world?)
- Salvatore Sanfilippo (antirez) is the author of Redis
- Very attentive maintainer; responsive on email and IRC
- Open sourced about a year ago
- A project called "Resque" uses it
- What is Redis?
- "A super-fast database," for the layman
- "A fast and powerful key-value store that saves us time, money, and our sanity," for the pointy-haired boss
- Or NOSQL
- "A blazingly fast, in-memory data structures server, with built-in value types and atomic operations; easy replication; all structured with unique keys," for the cool hackers.
- Redis is simple
- Zero config.
- 30 seconds from download to database interaction.
- Tried it the first time and it worked very well.
- Redis is very fast.
- 100,000 r/w a second on a decent Linux box.
- Real Data structures
- Redis values can be strings, lists, sets, sorted sets. With sophisticated, atomic operations. And now: hashes.
- a list can be set up as a queue
- hashes are new and powerful
- Redis is multi-lingual
- Client libs for Ruby, Python, Perl, PHP, Clojure, Java, and many more. Even Haskell.
- The Redis C source code is very readable
- Redis is pretty safe
- And getting safer. It's now as persistent and durable as you desire. New Multi-Exec. Replication. And more.
- Redis writes to disk every 5 seconds, which is not very "durable"
- But it depends on what you are doing, e.g. saving somebody's up vote on Reddit.
- Has replication, master-slave
- Append only file - not on by default (makes it a little slower)
- Multi-Exec (not on stable)
- Say you want to do three commands: LPUSH, RPOP, INCR
- Run a command like MULTI(LPUSH, RPOP, INCR) EXEC
- If it crashes in the middle, the transaction doesn't commit
- Redis is improving daily
- What's new? Hashes, Virtual Memory, Multi-Exec
- This is for the 2.0 release
- VM is like OS VM, things not used a lot are put to disk
- Neat.
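The Multi-Exec notes above can be sketched at the wire level. This is only a minimal sketch, assuming Redis's multi-bulk request format (a `*<argc>` line, then a `$<length>` line and the bytes of each argument, all CRLF-terminated). The helper names and the keys `jobs` and `jobs:processed` are made up for illustration and did not come from the talk.

```python
def encode_command(*args):
    """Encode one command in Redis's multi-bulk request format."""
    parts = [b"*%d\r\n" % len(args)]
    for arg in args:
        data = arg if isinstance(arg, bytes) else str(arg).encode("utf-8")
        parts.append(b"$%d\r\n%s\r\n" % (len(data), data))
    return b"".join(parts)


def encode_transaction(commands):
    """Wrap a sequence of commands in MULTI ... EXEC and encode them all."""
    wrapped = [("MULTI",)] + [tuple(c) for c in commands] + [("EXEC",)]
    return b"".join(encode_command(*c) for c in wrapped)


# Hypothetical keys: push a job, pop one, and bump a counter as one unit.
payload = encode_transaction([
    ("LPUSH", "jobs", "job-1"),
    ("RPOP", "jobs"),
    ("INCR", "jobs:processed"),
])
```

Sent over one connection, the server should answer `+OK` to MULTI, `+QUEUED` to each queued command, and a multi-bulk reply holding the three results to EXEC.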
Andy McCurdy from Whiskey Media, maintainer for redis-py
- What redis-py currently does and Andy's vision for it
- redis-py
- supports
- nearly every command in git
- pipelining
- useful for batch loading
- multi-exec
- so it becomes trivial to chain things together and exec
- redis is like a data structure store and we use a lot of these data structures today
- in python, we can implement the redis list to look like a normal python list but the storage is remote
- doesn't require knowledge of the commands, just use the data structures as if they were native
- side note
- it's an absolute bitch setting it up
- Thrift is a piece of shit
- takes too long to get it setup
- redis is so much simpler
- use django and like django's ORM, for the most part
- supports
- "Who uses redis as a caching server for another database?"
- 3-4 people
- Using sorted sets
- Consistent hashing in the
- Using redis as the main data store
- "Need to roundup the redis client developers and use the same consistent hashing."
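Andy's idea of a Redis list that looks like a normal Python list can be sketched roughly as follows. This is not redis-py's API or Andy's design; `RemoteList` is a hypothetical wrapper around any client that exposes `rpush`, `llen`, `lindex`, and `lrange` the way redis-py does, and `InMemoryClient` is a stand-in so the sketch runs without a server.

```python
class InMemoryClient:
    """Stand-in for a Redis client (assumption: only these four calls are needed)."""

    def __init__(self):
        self.lists = {}

    def rpush(self, key, *values):
        self.lists.setdefault(key, []).extend(values)
        return len(self.lists[key])

    def llen(self, key):
        return len(self.lists.get(key, []))

    def lindex(self, key, index):
        items = self.lists.get(key, [])
        try:
            return items[index]
        except IndexError:
            return None  # Redis answers nil for an out-of-range index

    def lrange(self, key, start, end):
        items = self.lists.get(key, [])
        # Redis ranges are inclusive; -1 means "through the last element".
        stop = len(items) if end == -1 else end + 1
        return items[start:stop]


class RemoteList:
    """Hypothetical list-like view over the Redis list stored at `key`."""

    def __init__(self, client, key):
        self.client = client
        self.key = key

    def append(self, value):
        self.client.rpush(self.key, value)

    def __len__(self):
        return self.client.llen(self.key)

    def __getitem__(self, index):
        value = self.client.lindex(self.key, index)
        if value is None:
            raise IndexError("list index out of range")
        return value

    def __iter__(self):
        return iter(self.client.lrange(self.key, 0, -1))


names = RemoteList(InMemoryClient(), "names")
names.append("antirez")
names.append("andy")
```

The caller never types LPUSH or LRANGE; it just uses `append`, `len()`, indexing, and iteration. With a real client, values would come back as bytes unless the client is configured to decode responses.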
- http://www.linkedin.com/in/alejandrocrosa
- http://twitter.com/alejandrocrosa
- http://github.com/acrosa/scala-redis
- Using Redis on production right now
- Why he uses redis.
- The LinkedIn apps use Javascript
- Needed
- Track exceptions that happen on the browser
- Track usage and click through of the apps
- unknown number of inserts
- extremely low latency
- disk access will kill me
- i don't care about losing data
- extracted data is composed of sets and lists
- Built a lib that takes the exception and sends it to his server
- Can't look at 347 exceptions (<--- 47, the ultimate "random" number)
- There is a lot of interaction he wants to track
- The thing that drew him to redis was that it was very fast and there are actual data structures behind the scenes
- Looked for a good match of the type of stuff he wanted to do
- Currently collecting data in production
- Want to build a short term memory system to help combat spam and fraud
- Wanted something that you could iterate on quickly
- Redis cluster in the colo
- 10 servers, each running multiple Redis instances
- Every time a new posting comes in, want to track some information about it
- Taking the posting ids and using redis as a queue (via a list)
- Uses consistent hashing in one of the middle layers so that it can be sharded across several machines
- Data gets replicated to other colos
- Next
- There is an ID in a queue and a worker pulls that off of the queue and then it pulls down the meta data for that posting
- Builds a list in memory of email, network, domain, etc.
- Then put the id in each of these buckets (email, network, domain)
- Uses a sorted set as a time index
- Now can answer the question, given an email (or IP or Network) how many ids were posted over a period of time.
- Also
- Maintain dirty queues
- One for each type (email, ip, net., etc)
- Nice thing is that it can be scaled easily by adding more machines.
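The bucket-plus-time-index idea above can be sketched like this. It is only a guess at the shape of the approach, not the speaker's code: the key layout (`email:<address>`, `ip:<address>`), the function names, and the in-memory stand-in are all assumptions. The client calls mirror redis-py's `zadd(name, mapping)` and `zcount(name, min, max)`.

```python
class InMemorySortedSets:
    """Stand-in for a Redis client; supports only ZADD and ZCOUNT."""

    def __init__(self):
        self.sets = {}

    def zadd(self, key, mapping):
        members = self.sets.setdefault(key, {})
        added = sum(1 for member in mapping if member not in members)
        members.update(mapping)
        return added

    def zcount(self, key, low, high):
        scores = self.sets.get(key, {}).values()
        return sum(1 for score in scores if low <= score <= high)


def index_posting(client, posting_id, timestamp, attributes):
    """Put the posting id into one sorted set per attribute, scored by time."""
    for kind, value in attributes.items():
        client.zadd("%s:%s" % (kind, value), {posting_id: timestamp})


def count_postings(client, kind, value, start, end):
    """How many postings did this email/IP/network make between start and end?"""
    return client.zcount("%s:%s" % (kind, value), start, end)


client = InMemorySortedSets()
index_posting(client, "p1", 100, {"email": "a@example.com", "ip": "10.0.0.1"})
index_posting(client, "p2", 160, {"email": "a@example.com", "ip": "10.0.0.2"})
recent = count_postings(client, "email", "a@example.com", 100, 200)
```

Because the score is the posting time, a range count over the sorted set answers "how many ids over this period" without scanning every posting.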
- Sharding
- The Craigslist lib (the Set::ConsistentHash module in Perl) implements it
- Every key that comes through runs through the hash and it spits out the server, port number, etc.
- Needed more memory than
- Audience Tip: compile 32-bit redis and run it on a 64-bit machine and you can save a lot of memory
- SORT command is a little complex but useful
- You can set a TTL on a Redis key, but if you write over the key's value, the TTL is also rewritten
- Other surprises with the expiration policy
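The sharding described above, where every key runs through the hash and out comes a server and port, can be sketched as a hash ring. This is a generic illustration of consistent hashing and not the Craigslist Perl library; the class name `HashRing`, the 100 points per server, the choice of SHA-256, and the `host:port` strings are all assumptions.

```python
import bisect
import hashlib


class HashRing:
    """Minimal consistent-hash ring mapping keys to server names."""

    def __init__(self, nodes, replicas=100):
        self.replicas = replicas
        self._ring = []  # sorted list of (point, node) pairs
        for node in nodes:
            self.add(node)

    @staticmethod
    def _point(label):
        digest = hashlib.sha256(label.encode("utf-8")).hexdigest()
        return int(digest, 16)

    def add(self, node):
        # Several points per node spread its share around the ring.
        for i in range(self.replicas):
            point = self._point("%s#%d" % (node, i))
            bisect.insort(self._ring, (point, node))

    def remove(self, node):
        self._ring = [entry for entry in self._ring if entry[1] != node]

    def node_for(self, key):
        if not self._ring:
            raise ValueError("ring is empty")
        # First point clockwise from the key's hash, wrapping at the end.
        index = bisect.bisect(self._ring, (self._point(key), ""))
        return self._ring[index % len(self._ring)][1]


ring = HashRing(["10.0.0.1:6379", "10.0.0.2:6379", "10.0.0.3:6379"])
server = ring.node_for("posting:12345")
```

The point of the ring over a plain `hash(key) % n` is that removing or adding a server only moves the keys that belonged to that server; everything else stays where it was. That is also why the speakers wanted client libraries to agree on one scheme.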
- Security for phones
- Uses SMS and SMS is awful to work with
- Architecture
- Cloudpushd
- ANSI C
- Event-driven (libev)
- Binary protocol listener
- Event-driven redis client
- 2 redis connections
- BLPOP for incoming messages
- Utility connection
- Mobile client
- Maintains persistent TCP connections to cloudpushd
- Cloudpushd instance starts
- BLPOP on unique server id
- Client connects and logs in
- Sends unique id
- Cloudpushd sets client->server
- Future work
- Multiple redis servers (client consistent hash)
- HTTP protocol support
- If you write a daemon for mobile, use 1-byte
- TCP westwood
- Android app: http://www.higherirderterms.com/cloudpushdemo.apk
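My reading of the Cloudpushd flow above is that each daemon blocks on a list named after its own server id, and a client-to-server mapping tells a sender which list to push to. The sketch below is that reading only, not the real daemon (which is ANSI C): the key names `client:<id>` and `server:<id>`, the `|` separator, and the function names are hypothetical, and the stand-in client returns string replies and never blocks.

```python
class InMemoryClient:
    """Stand-in for a Redis client: GET/SET, RPUSH, and a non-blocking BLPOP."""

    def __init__(self):
        self.strings = {}
        self.lists = {}

    def set(self, key, value):
        self.strings[key] = value
        return True

    def get(self, key):
        return self.strings.get(key)

    def rpush(self, key, *values):
        self.lists.setdefault(key, []).extend(values)
        return len(self.lists[key])

    def blpop(self, key, timeout=0):
        # Real BLPOP waits for an item; this stand-in gives up immediately.
        items = self.lists.get(key)
        if not items:
            return None
        return (key, items.pop(0))


def register_client(client, server_id, client_id):
    """On login, remember which daemon holds this phone's TCP connection."""
    client.set("client:%s" % client_id, server_id)


def push_to_client(client, client_id, message):
    """Queue a message on the list of whichever daemon owns the phone."""
    server_id = client.get("client:%s" % client_id)
    if server_id is None:
        return False  # phone is not connected anywhere
    client.rpush("server:%s" % server_id, "%s|%s" % (client_id, message))
    return True


def next_message(client, server_id):
    """What a daemon would do in its loop: pop the next message for itself."""
    popped = client.blpop("server:%s" % server_id, timeout=1)
    if popped is None:
        return None
    _key, raw = popped
    client_id, message = raw.split("|", 1)
    return client_id, message
```

This would also explain the two Redis connections in the notes: one sits in BLPOP waiting for incoming messages, and the utility connection handles everything else, such as writing the client-to-server mapping.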
- Cloudpushd
- Wrote Ruby client
- Redis install that has not been restarted in 7.5 months
- Monitoring system
- AMQP
- Uses redis as a state keeper
- Something checks the server, then writes the last check of the monitor to redis
- Have a cloud platform that creates application servers in EC2
- Each server is identical (HAProxy, nginx, Ruby)
- except one that has the Elastic IP (the master); the others are slaves
- slaves keep asking if it is okay
- using redis as a distributed lock server
- so if the master is not okay, can get the lock and spin up a new server
- Using nginx for a proxy for a whole bunch of domains (altered nginx)
- nginx looks up the IP for that domain in Redis
- Use nginx again, tell it to use a specific resolver DNS server (no need to
- use PDNS, which talks to an EventMachine daemon that looks up in Redis which IP to send back
- AMQP - it sucks at high availability
- run two rabbit brokers and a redis
- producers send a message to both rabbit servers
- clients subscribe to both queues
- a lookup is done on redis to dedup
- Thinking about replacing AMQP with redis
- but rabbit gives you other queuing stuff that Redis will never have
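The dedup step in the two-broker setup above could look something like this. The notes only say that a lookup is done on Redis, so using SETNX, the `seen:<id>` key name, and the function name are my assumptions. The stand-in client mimics redis-py's `setnx(name, value)`, which reports whether the key was newly set.

```python
class InMemoryClient:
    """Stand-in for a Redis client; supports only SETNX."""

    def __init__(self):
        self.data = {}

    def setnx(self, key, value):
        if key in self.data:
            return False
        self.data[key] = value
        return True


def handle_once(client, message_id, handler, payload):
    """Run handler only for the first copy of a message id that arrives."""
    # SETNX succeeds only if the marker key did not exist yet, so whichever
    # copy gets there first wins and the duplicate is dropped.
    if client.setnx("seen:%s" % message_id, "1"):
        handler(payload)
        return True
    return False


processed = []
client = InMemoryClient()
handle_once(client, "msg-1", processed.append, "from broker A")
handle_once(client, "msg-1", processed.append, "from broker B")
```

After both calls, only the copy from broker A has been handled. A real deployment would also want the markers to expire eventually, which is left out here.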