July 19, 2010

Posted by John

Tagged ruby

Creating Duplicable Objects

While working on MongoMapper, I have run into several interesting problems and solutions. One of those problems came up while creating Plucky, a library that adds chainable queries on top of the Ruby driver’s Mongo::Collection object.

Plucky is centered around the query class, which is made up of criteria and options. Every time you call a chainable method, you get a new query instance, so you never do anything destructive to an existing query. Let’s say you create a new Plucky query instance and add a limit:

query = Plucky::Query.new(collection)
updated = query.limit(10)
updated.equal?(query) # false

Calling limit clones the existing query and updates the limit of the new one like so:

class Query
  def limit(count=nil)
    clone.tap { |query| query.options[:limit] = count }
  end
end

This lets us chain away without ever modifying the original query. Yay! Whoa there, don’t get so happy. Each query has options and criteria. Cloning the query does copy the query object itself, but unfortunately our CriteriaHash and OptionsHash instances do not clone properly. They are both made up of source hashes and delegate most operations to those sources. Let’s take a look at the options hash:

class OptionsHash
  attr_reader :source

  def initialize(source)
    @source = source
  end

  def [](key)
    @source[key]
  end

  def []=(key, value)
    @source[key] = value
  end
end

The goal of the class above is to behave mostly like a hash, but normalize options to “mongo speak”. For the purposes of this post, I will leave out the normalization and focus just on making the class do what I needed. Note that initialize stores the source hash and that the [] and []= methods are sent straight to that source. Now that we have that code, let’s do some work with it:

hash1 = OptionsHash.new({:foo => 'bar'})
hash2 = hash1.clone

puts hash1[:foo]
# 'bar'

hash2[:foo] = 'surprise'

puts hash1[:foo]
# 'surprise'

puts hash1.source.equal?(hash2.source)
# true

Hmm…that is not what we expected, right? We expected that modifying the clone of the first options hash would leave the original alone. The issue is that clone is shallow: Ruby copies the object’s instance variables, but not the objects they point to, so both options hashes end up sharing the same source hash. To do this right, we need to define initialize_copy for the options hash and tell it to clone the source hash as well, like so:

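class OptionsHash
  # Runs on the new copy for both clone and dup,
  # after the instance variables have been copied over.
  def initialize_copy(other)
    super
    @source = @source.clone
  end
end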

Now, let’s run our little test again:

hash1 = OptionsHash.new({:foo => 'bar'})
hash2 = hash1.clone

puts hash1[:foo]
# 'bar'

hash2[:foo] = 'surprise'

puts hash1[:foo]
# 'bar'

puts hash1.source.equal?(hash2.source)
# false

Ah, much better! Now our options hash knows how to properly clone itself and we get our expected result. Not sure who out there this will help, but I was unaware of it until a few months ago, so I thought I would post about it.
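
To tie this back to chaining, here is a minimal sketch of a query whose options survive cloning. The Query class below is a simplified stand-in and not Plucky’s actual implementation: its constructor and options reader are assumptions made for illustration. It also shows that dup picks up the same initialize_copy hook as clone.

```ruby
# Simplified stand-ins for illustration only; not Plucky's real classes.
class OptionsHash
  attr_reader :source

  def initialize(source = {})
    @source = source
  end

  def [](key)
    @source[key]
  end

  def []=(key, value)
    @source[key] = value
  end

  # Give the copy its own source hash instead of sharing the original's.
  def initialize_copy(other)
    super
    @source = @source.clone
  end
end

class Query
  attr_reader :options

  def initialize(options = OptionsHash.new)
    @options = options
  end

  # Give each copied query its own options hash.
  def initialize_copy(other)
    super
    @options = @options.clone
  end

  def limit(count = nil)
    clone.tap { |query| query.options[:limit] = count }
  end
end

query   = Query.new
updated = query.limit(10)

puts updated.options[:limit].inspect # 10
puts query.options[:limit].inspect   # nil

# dup goes through initialize_copy as well.
copy = query.dup
copy.options[:limit] = 5
puts query.options[:limit].inspect   # nil
```

Because each layer only copies what it owns, the query does not need to know how an options hash stores its data; it just asks for a clone.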

3 Comments

  1. Matt Jones

    Jul 19, 2010

    Note that this still falls down if anything in the Hash isn’t a value object. Example:

    
    irb --> h = { :a => 1, :b => 2, :c => [1,2,3,4] }
        ==> {:b=>2, :c=>[1, 2, 3, 4], :a=>1}
    irb --> h2 = h.clone
        ==> {:b=>2, :a=>1, :c=>[1, 2, 3, 4]}
    irb --> h2[:c] << 5
        ==> [1, 2, 3, 4, 5]
    irb --> h[:c]
        ==> [1, 2, 3, 4, 5]
    

    You need to do something stronger to get a clean break (continuing from the above IRB session):

    
    irb --> h3 = Marshal.load(Marshal.dump(h))
        ==> {:b=>2, :a=>1, :c=>[1, 2, 3, 4, 5]}
    irb --> h3[:c] << 6
        ==> [1, 2, 3, 4, 5, 6]
    irb --> h[:c]
        ==> [1, 2, 3, 4, 5]
    

    The marshal/unmarshal trick is apparently a standard Ruby “deep clone” idiom – I ran across it a while back when trying to make AR dirty tracking play nice with serialized hashes of hashes (don’t ask :) ).

    One final note for the paranoid: the above trick has all the usual caveats regarding un-Marshallable objects. See the Ruby docs for details.

  2. Elliot Winkler

    Jul 19, 2010

    In your opinion, is it better to override #initialize_copy, or #clone, or does it really matter?

  3. @Elliot: Definitely initialize_copy. That is the Ruby way as far as I can tell. Overriding clone means dup won’t reap the rewards.
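
Matt’s point about nested values can be folded into initialize_copy as well. The following is a rough sketch, assuming everything stored in the source hash can be marshalled; DeepOptionsHash is a hypothetical name used only for this example. It swaps the shallow clone for the Marshal round trip and shows the TypeError Ruby raises when the hash holds something Marshal cannot dump, such as a Proc.

```ruby
# Hypothetical class for illustration; a deep-copying variant of OptionsHash.
class DeepOptionsHash
  attr_reader :source

  def initialize(source)
    @source = source
  end

  def [](key)
    @source[key]
  end

  def initialize_copy(other)
    super
    # Round-trip through Marshal so nested arrays and hashes are copied too.
    @source = Marshal.load(Marshal.dump(@source))
  end
end

hash1 = DeepOptionsHash.new({:fields => ['name', 'age']})
hash2 = hash1.clone
hash2[:fields] << 'email'

puts hash1[:fields].inspect # ["name", "age"]
puts hash2[:fields].inspect # ["name", "age", "email"]

# Procs cannot be marshalled, so cloning fails loudly instead of sharing state.
begin
  DeepOptionsHash.new({:callback => proc { }}).clone
rescue TypeError => e
  puts "could not deep copy: #{e.message}"
end
```

Whether the extra cost of marshalling is worth it depends on what ends up in the hash; for flat options the shallow clone is usually enough.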


About

Authored by John Nunemaker (Noo-neh-maker), a programmer who has fallen deeply in love with Ruby.
