What If A Key Value Store Mated With A Relational Database System?

Last night, the folks from the Grand Rapids Ruby group were kind enough to allow me to present on MongoDB. The talk went great. I’ve been excited about Mongo for a couple of weeks now, so it was cool to see that it wasn’t just me.

The funny thing is, at nearly the same time, Wynn Netherland presented on MongoDB to the Dallas Ruby group. Without working together on it, we discovered that he had effectively written part 1 of the presentation and I had written part 2, so we ended up showing each other’s slides as well.

I figured since I spent the time to throw some slides together, I might as well put an intro up here too. First, the slides (they probably won’t mean a lot as they were mostly outlines for me to speak from).

Intro to MongoDB

Ok, so what the crap is Mongo? I find the best way to describe Mongo is as the best features of key/value stores, document databases and RDBMS in one. No way, you say. That sounds perfect. Well, Mongo is not perfect, but I think it brings something kind of new to the database table.

Mongo is built for speed. Anything that would slow it down (aka transactions) has been left on the chopping block. Instead of REST, they chose sockets and have written drivers for several languages (including, of course, one for Ruby).

Collections

It is collection/document oriented. Collections are like tables in MySQL (they are even grouped in databases) and serve the purpose of breaking up the top level entities in your application (User, Article, Account, etc.) by type and thus into smaller query sets, to make queries faster.

Documents

Inside of each collection, you store documents. Documents are basically objects that have no schema. The lack of schema may be scary to some, but I look at it this way. You have to know your application schema at the app level, so why put the schema in the database and in your app? Why not just put the schema in your app and have the database store whatever you put in it? This way, your database schema is kind of versioned with your application code. I think that is pretty cool.

Documents are stored in BSON (blog post), which is binary encoded JSON that is built to be more efficient and also to include a few more data types than JSON. This means that if you send Mongo a document that has values of different types, such as String, Integer, Date, Array, Hash, etc., Mongo knows exactly how to deal with those types and actually stores them in the database as that type. This differs from traditional key/value stores, which just give you a key and a string value and leave you to handle serialization yourself.
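
To make the serialization point concrete, here is a rough sketch in plain JavaScript of what happens when a store only gives you a string value. It does not use BSON or any Mongo API. The `article` document and its field names are made up for illustration, and JSON stands in for "whatever serialization you picked yourself".

```javascript
// Plain JavaScript, no MongoDB required. This only illustrates the
// "string value" problem; it does not use BSON or any Mongo API.
// The document and its field names are hypothetical.
const article = {
  title: 'Mongolicious',
  views: 42,
  publishedAt: new Date('2009-06-03T00:00:00Z'),
  tags: ['mongo', 'databases'],
};

// A string-only key/value store forces you to serialize yourself.
const stored = JSON.stringify(article);
const loaded = JSON.parse(stored);

console.log(typeof loaded.views);                // "number": survives JSON
console.log(Array.isArray(loaded.tags));         // true: arrays survive too
console.log(loaded.publishedAt instanceof Date); // false: the date came back as a string
```

The date is the interesting case: after the round trip it is just text, and the app has to remember to turn it back into a date. A store that keeps typed values does not push that work onto you.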

Object Relationships

There are two ways to relate documents in Mongo. The first is to simply embed a document into another document. An example of this would be tags embedded in an article. Let’s take a look.

{
  title: 'Mongolicious', 
  body: 'I could teach you, but I would have to charge...', 
  tags: ['mongo', 'databases', 'awesome']
}

As you can see, tags are just a key in the article document. The benefits of this are that you never have to do any joins when you show the article and its tags, as they are all stored in the same place. The other cool thing is that Mongo can index the tags and understand indexing keys that have multiple values (such as arrays and hashes). This means if you index tags, you can find all documents tagged with ‘foo’ and it will be performant. Embedded documents work great for some things, but other things wouldn’t make sense embedded.
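
The idea behind indexing a key with multiple values can be sketched in plain JavaScript: every element of the array gets its own index entry pointing back at the document. This is only a toy model and not how Mongo implements its indexes. `buildTagIndex` and the sample articles are hypothetical.

```javascript
// Toy model of indexing an array-valued key. NOT Mongo's implementation;
// buildTagIndex and the sample articles are hypothetical.
const articles = [
  { _id: 1, title: 'Mongolicious', tags: ['mongo', 'databases', 'awesome'] },
  { _id: 2, title: 'SQL Forever', tags: ['databases', 'mysql'] },
  { _id: 3, title: 'Untagged', tags: [] },
];

function buildTagIndex(docs) {
  const index = new Map();
  for (const doc of docs) {
    // One index entry per array element, each pointing at the document.
    for (const tag of doc.tags) {
      if (!index.has(tag)) index.set(tag, []);
      index.get(tag).push(doc._id);
    }
  }
  return index;
}

const tagIndex = buildTagIndex(articles);

// Lookup by tag is a single map access instead of a scan of every article.
console.log(tagIndex.get('databases')); // ids 1 and 2
console.log(tagIndex.get('foo') || []); // no matches
```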

Let’s imagine that you have a client document and you want the client to have multiple contacts. If you embedded the contacts for the client with it in a document, it would be inefficient to have a page that listed all the contacts. To have a contact list, you would have to pull out every client and collect all the contacts and then sort them. Also, if a contact should be associated with multiple clients, you would have to duplicate their information for each client.

In SQL, you would have a clients table and a contacts table and then a join model between them so that any contact would be in the system once and could be associated with one or more clients without duplication. So how would you do this in Mongo? The same way…kind of.

In Mongo, you’d have a client collection and a contact collection. To associate a contact with a client, you just create a db reference to the contact from the client.
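
As a rough sketch of the referencing idea, here are two in-memory "collections" in plain JavaScript where each client stores the ids of its contacts rather than copies of them. This is an illustration only. The `contact_ids` key and the `contactsFor` helper are assumptions made for the example, not Mongo's db reference format or a driver API.

```javascript
// In-memory illustration of referencing instead of embedding.
// `contact_ids` and `contactsFor` are hypothetical names, not Mongo's
// db reference format or any driver API.
const contacts = [
  { _id: 'c1', name: 'Bob' },
  { _id: 'c2', name: 'Ann' },
];

const clients = [
  { _id: 'k1', name: 'Acme', contact_ids: ['c1', 'c2'] },
  { _id: 'k2', name: 'Globex', contact_ids: ['c1'] }, // Bob is shared, not duplicated
];

// Resolve a client's references with a second lookup.
function contactsFor(client) {
  return client.contact_ids.map((id) => contacts.find((c) => c._id === id));
}

// The contact list page reads one collection; no need to walk every client.
const contactList = contacts.map((c) => c.name).sort();

console.log(contactsFor(clients[0]).map((c) => c.name)); // Bob and Ann
console.log(contactList);                                // Ann, then Bob
```

The trade-off is the extra lookup when you show a client with its contacts, in exchange for each contact living in exactly one place.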

Dynamic Queries

Yep, Mongo has dynamic queries. It actually has a kind of quirky, yet lovable syntax for defining criteria. Below are a few examples from my presentation which are mostly self-explanatory. These are examples of what you would run in Mongo’s JavaScript shell.

// finds all Johns
db.collection.find({'first_name': 'John'})

// finds all documents with first_name
// starting with J using a regex
db.collection.find({'first_name': /^J/})

// finds first with _id of 1
// (findOne is the shell method; find_first is the Ruby driver's name)
db.collection.findOne({'_id': 1})

// finds possible drinkers (age > 21)
db.collection.find({'age': {'$gt': 21}})

// searches in embedded document author for
// author that has first name of John
db.collection.find({'author.first_name': 'John'})

// worst case scenario, or if you need "or"
// queries, you can drop down to JavaScript
db.collection.find({$where: 'this.age >= 6 && this.age <= 18'})
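
If the criteria syntax feels odd, it may help to see how such a criteria document could be evaluated against a single document. The following is a plain JavaScript sketch that runs without Mongo. It is not Mongo's implementation: `getPath` and `matches` are hypothetical helpers, and only exact values, regexes, dotted keys, `$gt` and `$lt` are handled.

```javascript
// Plain JavaScript sketch; NOT Mongo's implementation. `getPath` and
// `matches` are hypothetical helpers, and only $gt and $lt are handled.
function getPath(doc, path) {
  // Walk a dotted key such as 'author.first_name' into nested documents.
  return path.split('.').reduce(
    (value, key) => (value == null ? undefined : value[key]),
    doc
  );
}

function matches(doc, criteria) {
  // Every key in the criteria must match (an implicit "and").
  return Object.entries(criteria).every(([path, expected]) => {
    const actual = getPath(doc, path);
    if (expected instanceof RegExp) {
      return typeof actual === 'string' && expected.test(actual);
    }
    if (expected !== null && typeof expected === 'object') {
      // An operator document such as {$gt: 21}.
      return Object.entries(expected).every(([op, operand]) => {
        if (op === '$gt') return actual > operand;
        if (op === '$lt') return actual < operand;
        throw new Error('unsupported operator: ' + op);
      });
    }
    return actual === expected;
  });
}

const docs = [
  { first_name: 'John', age: 34, author: { first_name: 'John' } },
  { first_name: 'Jane', age: 19, author: { first_name: 'Wynn' } },
  { first_name: 'Bob', age: 52 },
];

const johns = docs.filter((d) => matches(d, { first_name: 'John' }));
const startsWithJ = docs.filter((d) => matches(d, { first_name: /^J/ }));
const over21 = docs.filter((d) => matches(d, { age: { $gt: 21 } }));
const byAuthor = docs.filter((d) => matches(d, { 'author.first_name': 'John' }));

console.log(johns.length, startsWithJ.length, over21.length, byAuthor.length); // 1 2 2 1
```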

You can also sort by one or more keys, limit the number of results, offset a number of results (for pagination), and define which keys you want to select. The other thing that is slick is that Mongo supports count and group. Count is the same idea as MySQL’s count. It returns the number of documents that match the provided criteria. Group is the same concept as SQL’s GROUP BY, but is accomplished with map/reduce.
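
Here is a small sketch of what sort, offset, limit and count mean, using a plain JavaScript array in place of a collection. The `page` helper and its option names are hypothetical and are not Mongo shell methods.

```javascript
// Sort / offset / limit / count over an in-memory array.
// `page` and its option names are hypothetical, not Mongo shell methods.
const people = [
  { first_name: 'John', age: 34 },
  { first_name: 'Jane', age: 19 },
  { first_name: 'Bob', age: 52 },
  { first_name: 'Amy', age: 27 },
];

function page(docs, { sortKey, skip = 0, limit = docs.length }) {
  return [...docs] // copy so the original order is left alone
    .sort((a, b) => (a[sortKey] < b[sortKey] ? -1 : a[sortKey] > b[sortKey] ? 1 : 0))
    .slice(skip, skip + limit);
}

// Second page of two results, ordered by age.
const secondPage = page(people, { sortKey: 'age', skip: 2, limit: 2 });
console.log(secondPage.map((p) => p.first_name)); // John, then Bob

// Count: how many documents match some criteria.
const adults = people.filter((p) => p.age > 21).length;
console.log(adults); // 3
```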

To really get a feel for all that you can do with queries, check out Mongo’s advanced query documentation.

Random Awesomeness

Capped collections (blog post): Think memcache. You can limit a collection to a certain number of documents or amount of space. When the number or size goes over the limit, the oldest document gets pushed out. For more info, see MongoDB and Caching.
Upserts: Think find or create in one call. You provide criteria and the document details and Mongo determines if the document exists or not and either inserts or updates it. You can also do special things like incrementers with $inc. For more, read Using mongo for real time analytics.
Multikeys: for indexing arrays of keys. Think tagging.
GridFS and auto-sharding: Storing files in the database in a way that doesn’t suck. They have mentioned in IRC that they might even make Apache/Nginx modules that serve files straight from GridFS so requests can go straight from web server to Mongo instead of traveling through your app server. For more, read You don’t need a file system.
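
Two of these ideas, capped collections and upserts with an incrementer, are easy to model in a few lines of plain JavaScript. This is a sketch of the behavior only, with no Mongo involved. `CappedCollection` and `upsertInc` are hypothetical names, and the cap here is by document count only.

```javascript
// Behavior sketch only; no Mongo involved. `CappedCollection` and
// `upsertInc` are hypothetical names, and the cap is by document count.
class CappedCollection {
  constructor(max) {
    this.max = max;
    this.docs = [];
  }

  insert(doc) {
    this.docs.push(doc);
    if (this.docs.length > this.max) this.docs.shift(); // oldest document is pushed out
  }
}

// Upsert with an increment: update the matching document, or create it first.
function upsertInc(collection, criteria, field, amount) {
  let doc = collection.find((d) =>
    Object.keys(criteria).every((key) => d[key] === criteria[key])
  );
  if (!doc) {
    doc = { ...criteria };
    collection.push(doc);
  }
  doc[field] = (doc[field] || 0) + amount;
  return doc;
}

const recent = new CappedCollection(2);
['a', 'b', 'c'].forEach((msg) => recent.insert({ msg }));
console.log(recent.docs.map((d) => d.msg)); // b and c remain; a was pushed out

const pageviews = [];
upsertInc(pageviews, { domain: 'example.com' }, 'views', 1); // inserts
upsertInc(pageviews, { domain: 'example.com' }, 'views', 1); // updates
console.log(pageviews.length, pageviews[0].views); // 1 2
```

The point of the upsert is that the caller never checks whether the document exists; one call covers both the first pageview and every one after it.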

How do I use it with Ruby?

If you have made it this far, you are probably intrigued and are wondering how you can use Mongo with Ruby. There is an official mongo-ruby-driver on GitHub for starters. It supports most of Mongo’s features, if not all, and gets the job done, but it is really low level. It would be like writing an application using the MySQL gem. You can do it, but it won’t be fun. I’ve even started giving back to the driver.

There are two “ORMs” for Mongo and both are on GitHub. The first is an ActiveRecord adapter and the second is MongoRecord. I took a look at both of these, and decided to write my own. Why?

Mongo is not a RDBMS (like MySQL) so why use RDBMS wrappers (like the AR adapter)?
I think the DSL for modeling your application should teach you Mongo.
Mongo is perfect for the website management system I’m building and I just didn’t like the other wrappers. Why would I want to build something with something that I didn’t like?
It sounded fun!

MongoMapper

I started the Friday of Memorial weekend and was able to crank out most of the functionality. Since then, I’ve been working on it whenever I get time and it is really close to being ready for a first release. That said, it is not public yet. Don’t worry, as soon as it is ready for prime time, I’ll be posting more here. So what features does MongoMapper have built in?

Typecasting
Callbacks (uses ActiveSupport callbacks)
Validations (uses my fork of validatable)
Connection and database can differ per document
Create, update, delete, delete_all, destroy, destroy_all that work just like ActiveRecord
Find with id, multiple ids, :all, :first, :last, etc. Also supports Mongo specific find criteria like $gt, $lt, $in, $nin, etc.
Associations
Drop in Rails compatibility

So out of the features listed above, all are complete but the last two at the time of this post. I’m currently working through associations and then I’m going to start making a Rails app with MongoMapper to figure out what I need for “drop in and forget” Rails compatibility. I have a few other smart people helping me so my guess is that it will be out in the next two weeks.

Let me know with a comment below what you like and don’t like about Mongo. I’m very curious what other Rails developers think after reading this intro and the articles I’ve linked to. I’m stoked, but I’m sure it is not for everyone.

Links

MongoDB on Twitter (I also follow a search)
MongoDB Blog Brand new and some good stuff is showing up.
10Gen Blog Lots of good articles on mongo.
MongoDB Downloads
Google Group

15 Comments

Mike DIrolf
Jun 03, 2009

Great article – good overview of mongodb. I’m excited to play with the mapper when it’s done!
dwight_10gen
Jun 03, 2009

“the best features of key/values stores, document databases and RDBMS in one” — that’s a very good way to explain what it does. Can I quote you on that?
Jonathan Conway
Jun 03, 2009

Can’t wait to take a look at MongoMapper. I was about to start on an adapter for DM tonight, but after reading your article I think I’ll hold off to see if MongoMapper suits my needs. I’d be particularly interested to see how you implemented callbacks as MongoRecord currently doesn’t support this.
Lucas Húngaro
Jun 04, 2009

A very good introduction to MongoDB. We are replacing MySQL with MongoDB here at Busk.com ‘cause we simply don’t need many of the features of a RDBMS (schema, transaction, …).

I like the fact that we’re getting diversity at the database layer too. Relational databases are very good for some applications, but not for all.

The feature I like the most about MongoDB is dynamic queries. I’ve worked with CouchDB and the way it does querying is really limited sometimes.

What I don’t like is the lack of a nice administrative web interface like CouchDB’s Futon. But that’s not really necessary, just… cool. :)
John Nunemaker
Jun 04, 2009

@Lucas – I was thinking about making a web interface using sinatra or something. Just something really simple that you could fire up ad hoc when you want to check stuff out. Have to finish MongoMapper first. :)
Jim Mulholland
Jun 04, 2009

Excellent overview, John!

I can vouch for MongoDB. We have been using it since March and have loved it so far. What we are doing would be much more complicated with MySQL. And if you are developing Rails applications, you have to love the fact that migrations are a thing of the past with Mongo!

I learned something new with the “upsert” functionality in Mongo. I had not read that article before. Very powerful stuff. I like the idea of using Mongo to store real-time server analytics. Could be a good open source project.

I can’t wait to test out your MongoMapper ORM!
John Nunemaker
Jun 04, 2009

@Jim – Cool, glad you learned something. I love that there are no more migrations or preparing your test database. It all happens magically on the fly and just works.

I have an idea for a stat app that requires no signup if you don’t care that your stats are public. You would just drop a snippet into your site and it would store everything based on domain and use upserts for pageviews and uniques. I started it and that is when I realized that I needed something like MongoMapper, so I’m waiting on myself too! :)
John Nunemaker
Jun 04, 2009

@Jonathan – I actually just used ActiveSupport’s callbacks as the first release is only targeting Rails. Callbacks was one of the easier things I added.
Jack Dempsey
Jun 05, 2009

John,

Sounds great, will have to look into it further. As you said, “first release is only targeting Rails”, sounds like the Rails dependency will be dropped at some point? I think that would be a great idea, as things like Sinatra, wider adoption of Rack, etc, are really paving the way for more non-Rails apps to do cool stuff.

Anyway, great writeup, thanks!

Jack
khelll
Jun 06, 2009

Thanks for the nice article. I was wondering if anybody can compare it to Tokyo Cabinet, especially when it comes to RDBMS-like uses.
Adrian Madrid
Jun 08, 2009

Great presentation. Can’t wait to see your adapter. In case anybody wants to see a (small) video presentation here is one I did at MWRC09: http://mwrc2009.confreaks.com/14-mar-2009-19-36-mongodb-adrian-madrid.html
bobes
Jun 08, 2009

John, MongoMapper looks like a great idea. Looking forward to playing with it in the project I’m just starting… Actually, I wouldn’t mind if you made the repo public even now :)
John Nunemaker
Jun 08, 2009

@Adrian – Thanks. Enjoyed your video presentation as well.

@bobes – I’m sure you wouldn’t mind. :) Right now the API is such a moving target it would be a pain for me to manage patches and bugs. I’d rather hack out a first release and then worry about that stuff. Sorry.
Ron Sweeney
Jun 10, 2009

John, thanks for your trip to Holland, MI and presenting this… I’m new to document DBs and was a week into CouchDB. MongoDB (and your presentation) removed a lot of the weirdness I was experiencing with Couch, enough for me to even think about using it for something.

+1 on the MongoMapper release
John Nunemaker
Jun 10, 2009

@Ron – You are welcome! I finished up embedded associations the other day so all I have left before a release is db referenced associations.