January 27, 2011
Data Modeling in Performant Systems
I have been working on Words With Friends, a high traffic app, for over six months. Talk about trial by fire. I never knew what scale was. Suffice it to say that I have learned a lot.
Keeping an application performant is all about finding bottlenecks and fixing them. The problem is each bottleneck you fix leads to more usage and a new bottleneck. It is a constant game of cat and mouse. Sometimes you are the cat and sometimes, well, you are not.
Most of the time, the removal of those bottlenecks is about moving hot data to places that can serve it faster. Disks are slow, memory is fast, enter more memcached.
Over time, you work and work to move hot data into memory and simplify your data access to fit into memory. Key here, value there. Eventually, you get to a place where you have simplified how you access your data into simple key/value lookups.
Games get marshaled into a key named "Game:#{id}". Joins are simplified to selecting ids and caching the array of ids into a key such as "User:#{id}:active_game_ids" or "User:#{id}:over_game_ids". In turn, those arrays are turned into objects by un-marshaling the contents of "Game:#{id}", etc.
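To make the pattern concrete, here is a rough sketch. The helper methods and the generic cache client (anything responding to get/set) are illustrative assumptions; only the key names come from the setup described above.

# Illustrative sketch, not the actual Words With Friends code.
# `cache` is assumed to be any client responding to get/set (memcached, etc.).
def cache_game(cache, game)
  cache.set("Game:#{game.id}", Marshal.dump(game))
end

def active_games_for(cache, user_id)
  ids = cache.get("User:#{user_id}:active_game_ids") || []
  ids.map { |id| cache.get("Game:#{id}") }.compact.map { |data| Marshal.load(data) }
end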
Your data model morphs from highly relational to key/value because key/value is fast and memcached can withstand a bruising.
Do it once, and you know how to do it in the future. The problem is by the time you get to this data model, it is kind of bolted on/in to your app.
What if you could design it this way from the beginning? What if you had no option but to think through your data model in keys and values? Need your data in two different ways? Put it in two different places, etc, etc.
I have good news. Now you can.
A Little History
Not long into my tenure with WWF, we were hitting a lot of walls and there was a lot of talk about NoSQL. Mongo? Membase? Cassandra? Riak?
Which one will work best for the problem at hand? What if we could try them all really easily by just changing which place the data went to? What if we could try out more than one at once?
I sat down one weekend and started thinking about the app and realized what I just talked about above. Along the way, our data access changed from relational to key lookups. This made me think about a hash.
Hashes are so versatile, and yet, so constrained. Hashes are for reading, writing and deleting keys, just like key/value stores. I did a bit of GitHub searching and stumbled across moneta, by Yehuda Katz.
Moneta immediately struck me as brilliant. I was shocked there was no activity around it. If you only allow yourself to read, write and delete with the same API, you can make nearly any data store talk the correct language.
I fiddled with it and forked it, but in the end, it was not quite what I was looking for. I liken it to my first house. I like the house, but having lived in it for six years, I know exactly what I want out of my next house.
The folks at Newtoy (now Zynga with Friends) had mentioned that they wanted to build their own object mapper and name it ToyStore—such a great name.
In a fit of inspiration over the 4th of July weekend, I cranked out attributes and initialization, relying heavily on ActiveModel. It was really fun. I emailed the crew when the next work day came around and they were stoked.
It began to occupy some of my work-related time and Geoffrey Dagley started helping me with it. Over the next few weeks, Geof and I hammered out validations, serialization, callbacks, dirty tracking, and much more.
Everything was built on the premise that the only acceptable methods that could be used to read, write and delete data were read, write and delete.
Adapter: The Common Interface
Over time Brandon Keepers got involved and ToyStore started looking pretty legit. We switched from using Moneta as the base to something I whipped together in a few hours, Adapter.
Defining an adapter is as simple as telling it how the client reads, writes and deletes data. You also define a clear method, for convenience and to stay close to the Ruby hash API.
The client can be anything you want to sit behind this unified interface. For example, this is how you would create an adapter that stores things in a plain Ruby hash.
Adapter.define(:memory) do
  def read(key)
    decode(client[key_for(key)])
  end

  def write(key, value)
    client[key_for(key)] = encode(value)
  end

  def delete(key)
    client.delete(key_for(key))
  end

  def clear
    client.clear
  end
end
key_for ensures that most things can work as a key. encode and decode allow one to hook in whatever serialization you fancy, be it Marshal, JSON, or whatever else you can imagine.
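As a sketch of how that hook might be used, here is a variant of the memory adapter above that stores values as JSON. The :memory_json name is made up and overriding encode/decode inside the block is my assumption; the rest mirrors the block API already shown.

# Sketch only: the :memory_json name is hypothetical and overriding
# encode/decode this way is an assumption based on the block API above.
require 'json'

Adapter.define(:memory_json) do
  def read(key)
    decode(client[key_for(key)])
  end

  def write(key, value)
    client[key_for(key)] = encode(value)
  end

  def delete(key)
    client.delete(key_for(key))
  end

  def clear
    client.clear
  end

  # Store values as JSON strings instead of the default serialization.
  # Assumes values are hashes or arrays.
  def encode(value)
    value.to_json
  end

  def decode(value)
    value.nil? ? nil : JSON.parse(value)
  end
end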
By defining those methods, we can now get an instance of the :memory adapter and connect it to a client. In the example above, the client is just a plain Ruby hash, but in other adapters it could be an instance of Redis (adapter), Memcached (adapter), or maybe a Riak bucket (adapter).
adapter = Adapter[:memory].new({}) # sets {} to client
adapter.write('foo', 'bar')
adapter.read('foo') # 'bar'
adapter.delete('foo')
adapter.fetch('foo', 'bar') # returns bar and sets foo to bar
# [] and []= are aliased to read and write
adapter['foo'] = 'bar'
adapter['foo'] # 'bar'
Adapters can also be defined using a block (like above), a module, or both (module included first, then block so you can override module with block).
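Here is a rough sketch of the module form. The exact signature for passing a module to Adapter.define is my assumption; the idea is that the module supplies the methods and a block, if given, is applied last so it wins.

# Sketch only: passing a module to Adapter.define like this is assumed
# to mirror the block form shown earlier.
module MemoryAdapter
  def read(key)
    decode(client[key_for(key)])
  end

  def write(key, value)
    client[key_for(key)] = encode(value)
  end

  def delete(key)
    client.delete(key_for(key))
  end

  def clear
    client.clear
  end
end

# Module only...
Adapter.define(:memory_from_module, MemoryAdapter)

# ...or module plus block; the block is applied last, so it overrides the module.
Adapter.define(:noisy_memory, MemoryAdapter) do
  def write(key, value)
    puts "writing #{key_for(key)}"
    client[key_for(key)] = encode(value)
  end
end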
Adapters can also define atomic locking mechanisms; see the memcached and redis adapters for their locking implementations. The more opaque the object, the more you need to lock. Or, in the case of Riak, the adapter can handle read conflicts.
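To give a feel for locking around an opaque value, here is a purely hypothetical sketch. The lock method name and block signature are my assumptions, so look at the memcached and redis adapters for the real implementations.

# Hypothetical sketch: `lock` as used here is an assumed API, not
# necessarily what the memcached/redis adapters actually expose.
require 'adapter/redis'

adapter = Adapter[:redis].new(Redis.new)

# Wrap a read-modify-write so concurrent writers do not clobber each
# other's changes to the same opaque value.
adapter.lock('User:1:active_game_ids') do
  ids = adapter.read('User:1:active_game_ids') || []
  adapter.write('User:1:active_game_ids', ids + [42])
end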
ToyStore: The Mapper Fixings on top of Adapter
Once your data layer speaks the adapter interface, you can use the real power: ToyStore.
Let's say you want to store your users in Redis. Create your class, include Toy::Store, and point its store at Redis.
require 'toystore'
require 'adapter/redis'

class User
  include Toy::Store
  store :redis, Redis.new

  attribute :email, String
end
From there, you can go to town, defining attributes, validations, callbacks and more.
class User
  include Toy::Store
  store :redis, Redis.new

  attribute :email, String

  validates_presence_of :email

  before_save :lower_case_email

  private

  def lower_case_email
    self.email = email.downcase if email
  end
end
user = User.new
pp user.valid?        # false, email is required

user.email = 'John'
pp user.save          # true, and before_save downcases the email
pp user
pp User.get(user.id)  # reads the user back from the store

user.destroy
pp User.get(user.id)  # nil, the user is gone
Change your mind? Decide that you do not want to use Redis? Fancy Riak? Simply change the store to use the Riak adapter and you are rolling.
require 'toystore'
require 'adapter/riak'

class User
  include Toy::Store
  store :riak, Riak::Client.new['users']

  attribute :email, String
end
Boom. You just completely changed your data store in a couple lines of code. Practical? Yes and no. Cool? Heck yeah.
What all does Toy::Store come with out of the box? So glad you asked.
- Attributes – attribute :name, String (or some other type). Attributes can be virtual, which works just like attr_accessor but with all the power of dirty tracking, serialization, etc. They can also be abbreviated, so :first_name could be the method you use while the data store only sees :fn. Save those bytes! Default values are supported, and defaults can be procs.
- Typecasting – Same type system as MongoMapper. One day they will share the exact same type system in its own gem; for now it is duplicated.
- Callbacks – all the usual suspects.
- Dirty Tracking – save, create, update, destroy
- Mass assignment security – attr_accessible and attr_protected
- Proper cloning
- Lists – arrays of ids. If a user has many games, the user would have list :games, which stores the ids in a game_ids key on the user and works just like an association (see the sketch after this list).
- Embedded Lists – array of hashes. More consistent than MongoMapper, which will soon reap the benefits of the work on Toy Store embedded lists.
- References – think belongs_to by a different (better?) name. A Post model could declare reference :creator, User to add a creator_id key and relate the creator to the post.
- Identity Map – On by default. Should be thread-safe.
- Read/write through caching – If you specify a cache adapter (say memcached), ToyStore will write to memcached first and read from memcached first, populating the cache if it was not present.
- Indexing – Need to do lookups by email? Declare index :email and, whenever a user is saved, the user data is written to one key and the email is written to another key whose value is the user id.
- Logging
- Serialization (XML and JSON)
- Validations
- Primary key factories
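To tie a few of those together, here is a small sketch of lists, references, and indexing as described above. Treat it as illustrative rather than exhaustive; the model names and attributes are made up for the example.

# Illustrative sketch of the list/reference/index declarations described
# above; the models and attributes here are made up for the example.
class Game
  include Toy::Store
  store :redis, Redis.new
end

class User
  include Toy::Store
  store :redis, Redis.new

  attribute :email, String
  index :email      # on save, the email is also written as a key whose value is the user id

  list :games       # stores game ids in a game_ids key on the user, like an association
end

class Post
  include Toy::Store
  store :redis, Redis.new

  attribute :title, String
  reference :creator, User   # adds a creator_id key and relates the creator to the post
end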
It pretty much has you covered. Adapters for redis, memcached, riak, and cassandra already exist. Expect a Mongo one soon; I just have to make a few tweaks to Adapter. Yep, even Mongo.
What other adapters could be created? Membase? Just start with the memcached adapter and override key_for. Git? File system? REST? MySQL?! I love it!
The Future
The future is not picking a database and forcing all your data into it. The future (heck, now even) is the right database for the job and your application may need several of them.
All this said, in no way do I think ToyStore is going to take the world by storm. It is a different way to build applications. This way comes with great power, but great confusion as well.
Currently, each model is serialized into one key in the store, based on how the adapter does encode/decode. Eventually, I would like to add the ability to store different attributes in different keys. For example, maybe you want active_game_ids to be stored in a key by itself so you don’t have to constantly save the entire user object.
I can also see a use for being able to store an attribute in not just a different key, but a different store entirely. Store your user objects in Riak, but active_game_ids in a Redis set. This is where it would get really powerful.
At any rate, I am very excited about this project and I think it has a lot of potential. I would also like to add that MongoMapper is here to stay.
In fact, I learned from my mistakes on MongoMapper when building ToyStore and will be back-porting those learned experiences very soon. Expect a flurry of activity over the next little while.
Closing Thanks
Huge thanks to Newtoy (now Zynga with Friends) for allowing Geof and me to open source this. Several pieces of ToyStore were built on their dime and I really appreciate their contribution to the Ruby and Rails community!
As is typical with new projects, there are probably rough spots, and good luck finding documentation. I have included a bevy of examples, though, and the tests do a superb job of explaining the functionality of each method/feature.
Let me know what your thoughts are and be sure to kick the tires!
22 Comments
Jan 27, 2011
This is good.
Jan 27, 2011
Nice work John, I might have to give this a go here shortly. I’m really surprised you didn’t build out Mongo first!
Do you see mongo support being built out on top of MongoMapper or directly on the Ruby driver itself?
Jan 27, 2011
@Jesse Newland: Yes, yes it is.
@Ray Krueger: Mongo support is coming soon. I do not make all the operations decisions so Mongo has not been involved to date. That said, adapter does not map perfectly to Mongo yet as it is far more than a key/value system. It will be built on top of the Ruby driver. Honestly, there is a good chance that parts of MongoMapper may one day be powered by ToyStore… :) MM may become just a very specific version of ToyStore for Mongo with support for dynamic querying.
Jan 27, 2011
Hey John! Nice post. I think the sentence “MongoMapper is going nowhere” should be changed to “MongoMapper is not going anywhere.” Was confusing for a second.
Very interested in the ability to split off some attributes to a different store — I look forward to hearing more about your efforts!
Jan 27, 2011
@mrb: Thanks. Changed it to MongoMapper is not going away. Hope that clears up the confusion. I <3 MM. :)
Jan 27, 2011
This is really sweet… I’ve hodge-podged this process many times; nice to see something out there that can just save time on the BS I normally have to do.
Thanks!!!
Jan 27, 2011
Great stuff, John. I really like the read/write-through caching and explicit index creation. Other potential avenues that came to mind:
The multi-get stuff could be accomplished in an individual adapter and maybe that’s best, given that some stores won’t support it.
Awesome project overall.
Jan 27, 2011
<3
Jan 27, 2011
I like the concept, but it’s clearly going to take some work to get my head around it.
The first thing that strikes me is that there’s no native way to retrieve more than one object. Presumably that’s by design, but a bit of guidance on how to handle that would be very helpful.
Jan 27, 2011
I like the idea of a simple common adapter, but you lose some of the unique features like get multiple that a lot of NoSQL systems support.
Making multi get/set/delete the default might actually be a good way to go. Then adapters could just implement them serially.
ie:
But then allow the adapter to implement a smarter version of read.
Jan 27, 2011
Can’t wait to use this for some Polyglot goodness in my app…
Jan 27, 2011
Aw, snap! I wish I had seen this before I went and forked moneta
Jan 28, 2011
Good work! Soooo going to be checking this out. I already have a project in mind for it.
Jan 28, 2011
Small typo:
Great stuff!
Jan 28, 2011
@Vijay Dev: Thanks! Fixed.
Jan 28, 2011
This is revolutionary! I can’t wait to use this in my next project!
Haven’t looked at the code yet, so I don’t know if this is possible or practical, but reading/writing with multiple stores for parallelism/redundancy or with distributed systems (one store for performant data, one store for reporting) would be very useful. I might give that a fork :)
Jan 29, 2011
I just started working on an ORM for Membase. Maybe I’ll just write an adapter and use toystore instead.
Jan 31, 2011
I was just trying to think of the minimum feature that I would need to play with toystore; I just know that I would end up throwing my toys out of my pram without it;
Ordering/Sorting; insertion order is retained on calls like get_index which is cool;
But me thinks I would need some order functionality
get_index(:state, “active”).order(:published_at.desc)
so there would need to be scoring? of some kind; but then multi server issues; crazyness;
Jan 31, 2011
@hookercookerman: That kind of stuff would probably be more adapter specific. Feel free to build on top, but I do not see stuff like that in core.
Feb 21, 2011
Cool stuff!
small typo: “If you specific a cache adapter…”
Should be “specify”
May 08, 2011
Great stuff. Starting a new project and evaluating this vs. Ohm. Our target is primarily Redis but I could see Mongo in the future and like that this is ActiveModel based.
However, echoing some of the above comments, it seems deficient in not only multi-get and transaction support but also is treating all of the stores as lowest common denominator key-value only, which makes Redis little more than disk-backed memcached.
Redis is in fact much more functional, for example storing objects as hashes and using set diff/union ops on ids for joins/collections. Are you serializing the entire object on every write, even if only one attribute changes? The simplicity of Adapter is pretty, but it seems like it needs a few more primitives for the persistence layer to make best use of the stores.
May 08, 2011
@tribalvibes: Unless you want to write that stuff yourself, I would go with Ohm. Toystore is really bare bones right now. I am just now slowly starting to work on ways to make it possible to take advantage of the underlying store.
Most likely I will start making more optional adapter methods, like get_multi, insert/update, etc. and then have Toystore use those if available.
I am well aware that redis is much more than just a key/value store. One note, you can currently just drop to Foo.store.client at any time and you get access to the redis connection so you can do whatever you want. Currently, I just define the methods that I need.