February 21, 2010

Posted by John

Tagged harmony, mongomapper, and patterns

MongoMapper 0.7: Identity Map

While it isn’t quite ready to shove down everyone’s throats, IdentityMap is a hot new plugin I have been working on, directly due to a need that arose in Harmony.

One of the things I started to notice in Harmony right away was an epic amount of queries, 90% of which were simple id lookups repeated over and over. I remembered hearing about the identity map pattern before and figured I would research it a bit.

An old proverb says that a man with two watches never knows what time it is. If two watches are confusing, you can get in an even bigger mess with loading objects from a database.

So true. Now that I had the plugin system in place, I knew exactly how I would go about integrating IM (identity map) with MM. The great thing about the process was that I had a need from a real application. Over half of the IM plugin was created while vendored in Harmony. Several test cases that are currently in MM are directly due to weird bugs that we experienced in Harmony. I think it is really important to note that MM features grow from real applications, not theoretical fun.

Foo is Foo is Foo

The first thing I had to get working was to make sure that each document in the database equated to one object in Ruby. This means whether I call Item.find(id) or item.parent, if they are both loading the same document, they need to be the same object id in Ruby.
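To make the distinction concrete, here is a minimal plain-Ruby sketch. It does not use MongoMapper at all, and the Doc struct is a hypothetical stand-in for a document class. It shows that two objects built from the same data can be equal by value while still being different objects; equal? compares object identity, which is what the identity map is meant to guarantee.

```ruby
# Hypothetical stand-in for a document class; this is not MongoMapper code.
Doc = Struct.new(:id, :title)

first  = Doc.new(1, 'Root')
second = Doc.new(1, 'Root') # same data, as if the document were loaded twice

puts first == second      # true  (Struct#== compares the members)
puts first.equal?(second) # false (two separate objects in memory)
puts first.equal?(first)  # true  (the very same object)
```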

The easiest way I could think to do this was to have one method I always call whenever loading a document from the database. The obvious method name was load. Now, whenever a document is found in the database, I call load instead of new. This seems really insignificant, but down the rabbit trail it has significant implications.

Taking Advantage of Method Lookups and Super

Nothing fancy happens in load, but where it gets cool is in the IM plugin. Because every document coming from the database runs through load, all the IM plugin has to do is override load and ensure that the same document in the database is the same object in Ruby. It does this by storing each document in the map when loaded and then checking each document loaded against that store. If the document is already there, it returns the object for that document instead of creating a new one.

Each class gets an IM (just a plain old hash for now). Because each id is unique, the id is the key in the IM hash and the value is the Ruby object. Because of how method lookups work in Ruby, all that I do is include the IM plugin after the plugin that defines load and I can access the original load method using super. Below is the load method in the IM plugin:

def load(attrs)
  document = identity_map[attrs['_id']]

  if document.nil? || identity_map_off?
    document = super
    identity_map[document._id] = document if identity_map_on?
  end

  document
end

Without even knowing what the other methods do, you can get a feel for what is going on. First, I set document to where I assume it will be in the identity map for the class. If it is nil (or the identity map is turned off), I call super, which calls the normal load method and returns a new instance of the class based on the document from the database. Then, if the identity map is turned on, I add it to the map by setting the id as the key and the object as the value.

At the end, I return the document no matter what has happened before. With this tiny bit of code (and a few more lines for setting up the hash store), I ensure that no matter how or when a document is queried from the database, it is always the same object in memory. Soooo sweet. Like I said, huge ramifications.
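If you want to play with the override-and-super idea outside of MongoMapper, here is a rough plain-Ruby sketch. Every name in it (BaseLoading, IdentityMapSketch, and this bare Item class) is made up for illustration and is not MongoMapper's actual code, and the on/off switch from the real plugin is left out to keep it short.

```ruby
# Plain-Ruby sketch of the idea only. BaseLoading, IdentityMapSketch and
# this Item class are hypothetical names, not part of MongoMapper.
module BaseLoading
  # Stand-in for the plugin that defines the original load.
  def load(attrs)
    new(attrs)
  end
end

module IdentityMapSketch
  # One plain hash per class, keyed by document id.
  def identity_map
    @identity_map ||= {}
  end

  # Extended after BaseLoading, so super reaches BaseLoading#load.
  def load(attrs)
    document = identity_map[attrs['_id']]

    if document.nil?
      document = super
      identity_map[attrs['_id']] = document
    end

    document
  end
end

class Item
  extend BaseLoading
  extend IdentityMapSketch # later extend wins the method lookup

  attr_reader :attrs

  def initialize(attrs)
    @attrs = attrs
  end
end

a = Item.load('_id' => 1, 'title' => 'Root')
b = Item.load('_id' => 1, 'title' => 'Root')
puts a.equal?(b) # true
```

The order of the two extend calls is what matters here: the module added last is searched first, so its load runs and super falls through to the original.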

Identity Map Lookups Zap Simple Queries

Once I had this part done, all that was left was to zap the actual queries to the database if the document has already been loaded. I refactored the MM internal methods for finding documents to find_one and find_many. This means whenever you do any kind of find query in MM, eventually it hits one of those two methods.

Taking the same approach as load, if you are doing everything through one or two methods, all you have to do to change those methods is redefine them in the plugin to behave differently. The best part is you still have access to the originals using super. find_one in the IM plugin looks like this:

def find_one(options={})
  criteria, query_options = to_query(options)

  if simple_find?(criteria) && identity_map.key?(criteria[:_id])
    identity_map[criteria[:_id]]
  else
    super.tap do |document|
      remove_documents_from_map(document) if selecting_fields?(query_options)
    end
  end
end

simple_find? returns true if doing a query only by _id or _id and _type (which happens when using single collection inheritance). This method addition means we can return the document from the identity map without doing a query, if it has already been loaded.
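For the curious, a check like simple_find? could be sketched roughly as below. This is my guess at the shape of it based on the description above, not the actual MongoMapper implementation; the SIMPLE_KEYS constant is a hypothetical name, and the sketch does not try to handle an _id criteria that is itself a hash of query operators.

```ruby
# Hypothetical sketch of a simple_find?-style check. The real MongoMapper
# method may differ; SIMPLE_KEYS is a made-up name.
SIMPLE_KEYS = [:_id, :_type].freeze

def simple_find?(criteria)
  keys = criteria.keys.map(&:to_sym)
  # Must query by _id, and by nothing other than _id and _type.
  keys.include?(:_id) && (keys - SIMPLE_KEYS).empty?
end

puts simple_find?(:_id => 1)                   # true
puts simple_find?(:_id => 1, :_type => 'Page') # true
puts simple_find?(:_id => 1, :title => 'Root') # false
puts simple_find?(:title => 'Root')            # false
```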

A Simple, Yet Awesome Example

I learn best with code examples, so here is one for you. A simple item class for creating an even simpler tree.

MongoMapper.connection = Mongo::Connection.new('127.0.0.1', 27017, :logger => Logger.new(STDOUT))
MongoMapper.database = 'testing'

class Item
  include MongoMapper::Document
  
  key :title, String
  key :parent_id, ObjectId
  
  belongs_to :parent, :class_name => 'Item'
end

root = Item.create(:title => 'Root')
child = Item.create(:title => 'Child', :parent => root)
grand_child = Item.create(:title => 'Grand Child', :parent => child)

puts root.equal?(child.parent) # false
puts child.equal?(grand_child.parent) # false

If you run this code, you get false and false as the output, along with a few queries to find the parent of child and grand_child. Below is some sample output:

MONGODB admin.$cmd.find({:ismaster=>1}, {}).limit(-1)
MONGODB db.items.update({:_id=>4b81a114d072c40c3f000001}, {"title"=>"Root", "_id"=>4b81a114d072c40c3f000001, "parent_id"=>nil})
MONGODB db.items.update({:_id=>4b81a114d072c40c3f000002}, {"title"=>"Child", "_id"=>4b81a114d072c40c3f000002, "parent_id"=>4b81a114d072c40c3f000001})
MONGODB db.items.update({:_id=>4b81a114d072c40c3f000003}, {"title"=>"Grand Child", "_id"=>4b81a114d072c40c3f000003, "parent_id"=>4b81a114d072c40c3f000002})
MONGODB testing.items.find({:_id=>4b81a114d072c40c3f000001}, {}).limit(-1)
false
MONGODB testing.items.find({:_id=>4b81a114d072c40c3f000002}, {}).limit(-1)
false

Check out the same script, with the addition of the identity map plugin:

MongoMapper.connection = Mongo::Connection.new('127.0.0.1', 27017, :logger => Logger.new(STDOUT))
MongoMapper.database = 'testing'

class Item
  include MongoMapper::Document
  plugin MongoMapper::Plugins::IdentityMap
  
  key :title, String
  key :parent_id, ObjectId
  
  belongs_to :parent, :class_name => 'Item'
end

root = Item.create(:title => 'Root')
child = Item.create(:title => 'Child', :parent => root)
grand_child = Item.create(:title => 'Grand Child', :parent => child)

puts root.equal?(child.parent) # true
puts child.equal?(grand_child.parent) # true

Note that we get true for both. Also, we get no queries for the parents of child and grand_child, as they are already in the identity map. The output looks something like this:

MONGODB admin.$cmd.find({:ismaster=>1}, {}).limit(-1)
MONGODB db.items.update({:_id=>4b81a0c9d072c40c2c000001}, {"title"=>"Root", "_id"=>4b81a0c9d072c40c2c000001, "parent_id"=>nil})
MONGODB db.items.update({:_id=>4b81a0c9d072c40c2c000002}, {"title"=>"Child", "_id"=>4b81a0c9d072c40c2c000002, "parent_id"=>4b81a0c9d072c40c2c000001})
MONGODB db.items.update({:_id=>4b81a0c9d072c40c2c000003}, {"title"=>"Grand Child", "_id"=>4b81a0c9d072c40c2c000003, "parent_id"=>4b81a0c9d072c40c2c000002})
true
true

Obviously, this is a really small and simple example, but with one small addition, we saved a few queries. Imagine how much of a difference it makes in a big application making lots of requests. The thing I am most amazed at is how much punch this plugin adds compared to the amount of code in the implementation. As of 0.7, the identity map plugin is only 122 lines of code.

Using the IM Plugin with 0.7

The IM plugin is by no means feature complete, so I am not automatically including it in every Document yet. I will say that what I have added is production ready and we have been using it in Harmony for over a month now. You can use it on a model by model basis like this:

class Foo
  include MongoMapper::Document
  plugin MongoMapper::Plugins::IdentityMap
end

Or you can turn it on for all documents by dropping this in an initializer (stolen directly from Harmony):

module IdentityMapAddition
  def self.included(model)
    model.plugin MongoMapper::Plugins::IdentityMap
  end
end

MongoMapper::Document.append_inclusions(IdentityMapAddition)

Correct, Beautiful, Fast

The IM plugin, for me, is a great example of Correct, Beautiful, Fast. First, we built Harmony in a way that worked and was easy to read (correct and beautiful). Then, when we needed to make it fast, all we had to do was override the implementation in a few spots and we cut our queries in half (or more) overnight. It is far easier to find where you need to optimize when your code is correct and beautiful.

10 Comments

  1. The more I read about this project, the more it sounds like you are re-creating DataMapper bit-by-bit. As someone who has put in a lot of work on the DataMapper SimpleDB adapter and been pleasantly surprised at how well DM lends itself to non-SQL backends, I’m curious: why this, instead of a DataMapper adapter?

  2. @Avdi – I looked into it at the beginning, but didn’t feel like I got enough out of the box. There is nothing wrong with DM, it just isn’t for me. I think the MM code base is relatively small and simple and I like that. I am also enjoying the experience of learning while working on it (which is priceless).

  3. What, if any, process is used to clear the Identity Map when all other references to an object disappear? I seem to remember DataMapper having a session block that the Identity map lives in. Leaving and reopening this block clears the map. Without some form of Garbage collection an Identity Map becomes a Memory leak.

  4. @John – good question. I forgot to post about that. We use rack middleware to clear the map for each request. I’ll add that to the post tomorrow.

  5. Nice work. How easy would this be to get working with Rails 3? Would be a nice finishing touch to the AR refactoring.

  6. jacques crocker

    Feb 22, 2010

    Kieran, I’m currently using it now with Rails3 via an ActiveModel fork of MongoMapper at http://github.com/merbjedi/mongomapper

    All tests pass, and it seems to be working great so far!

  7. @John – Below is the piece of middleware we use. Nothing fancy.

    class PerRequestMapClear
      def initialize(app)
        @app = app
      end
     
      def call(env)
        MongoMapper::Plugins::IdentityMap.clear
        @app.call(env)
      ensure
        MongoMapper::Plugins::IdentityMap.clear
      end
    end

    MM keeps track of all the models using the map, and clear loops through them and calls clear on each.

  8. I can understand wanting to build your own implementation but I have to agree with Avdi, there seems to be a lot of duplicated effort.

    As someone who has spent a lot of time with merb and DM it was not very appealing to adopt a new adapter to use mongoDB.

    DataMapper has been through its growing pains and has become a fantastic project. I can use redis just by using the dm-redis-adapter and I don’t have to change much else in my app. I get things like merb-auth for free and there is nothing new to learn for things like validations and hooks in my models.

    Anyways, I don’t want to come down on you too much. MM looks like it is shaping into a great project. IdentityMap is a great pattern. I am being hard on you mostly for selfish reasons. I would have loved to have used this project but couldn’t justify to myself adopting a new datastore adapter.

    I hope you have given the DM folks feedback on what didn’t work for you so that they can make the necessary changes to prevent this duplication in the future.

  9. @Sintaxi – Then maybe DM and AR should merge too. I can understand that on the surface people see similarities and such, but I guess who cares if people are duplicating efforts.

    Honestly, duplication or not I wouldn’t trade what I’ve learned while building MongoMapper for anything.

  10. Cool! I have big use case for this feature!

    I’m heavily using the ‘Composite Pattern’ http://en.wikipedia.org/wiki/Composite_pattern in my app (page -> notes, page -> gallery -> images, and so on), and almost broke my mind trying to find a way to solve the n+1 problem.

    I can’t embed the whole tree in Page (every Item should be an independent document) and was trying to use a kind of ‘embedded cache’, but it was getting very complicated.

    But with IM all that I need is to cache in the Item (Page) all the IDs of its descendants (Notes, Folders, Images), which is very easy, and load them all before using the page!

    But right now it seems that IM is implemented only for the find_one method, not for find_many. Awaiting an update for find_many!

    Great Project, John :)

About

Authored by John Nunemaker (Noo-neh-maker), a programmer who has fallen deeply in love with Ruby.