July 19, 2010
Creating Duplicable Objects
While working on MongoMapper, I have run into several interesting problems and solutions. One of those problems came up while creating Plucky, a library that adds chain-able queries on top of the Ruby driver's Mongo::Collection object.
Plucky is centered around the Query class, which is made up of criteria and options. Every time you call a chain-able method, you get a new query instance. The purpose of this is to make sure you never do anything destructive to an existing query. Let's say you create a new Plucky query instance and add a limit:
query = Plucky::Query.new(collection)
updated = query.limit(10)
updated.equal?(query) # false
Calling limit clones the existing query and updates the limit of the new one like so:
class Query
  def limit(count=nil)
    clone.tap { |query| query.options[:limit] = count }
  end
end
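Every chain-able method can follow this same clone-and-tap pattern. Here is a minimal, self-contained sketch, not Plucky's actual implementation: the skip method is an illustrative assumption, and the initialize_copy override (which deep-clones the options, for the reasons the rest of this post explains) is included so the sketch behaves correctly on its own:

```ruby
# Sketch of the clone-and-tap chaining pattern. Each chain-able method
# clones the receiver and mutates only the clone's options.
class Query
  attr_reader :options

  def initialize(options = {})
    @options = options
  end

  # Deep-clone the options hash so clones do not share state with
  # the original (see the rest of the post for why this is needed).
  def initialize_copy(other)
    super
    @options = @options.clone
  end

  def limit(count = nil)
    clone.tap { |query| query.options[:limit] = count }
  end

  # Hypothetical second chain-able method, same pattern.
  def skip(count = nil)
    clone.tap { |query| query.options[:skip] = count }
  end
end

query   = Query.new
updated = query.limit(10).skip(5)
updated.options # => {:limit => 10, :skip => 5}
query.options   # => {} -- the original is untouched
```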
This makes it so we can chain and chain away never modifying the original query. Yay! Whoa, there, don’t get so happy. Each query has options and criteria. When you clone the query, it does in fact clone stuff, but unfortunately our CriteriaHash and OptionsHash instances do not clone properly. They are both made up of source hashes and delegate most operations to those sources. Let’s take a look at the options hash:
class OptionsHash
  attr_reader :source

  def initialize(source)
    @source = source
  end

  def [](key)
    @source[key]
  end

  def []=(key, value)
    @source[key] = value
  end
end
The goal of the class above is to behave mostly like a hash, but normalize options to "mongo speak". For the purposes of this post, I will leave out the normalization and focus just on making the class do what I needed. Note the setting of source in initialize and the delegation of [] and []= to it. Now that we have that code, let's do some work with it:
hash1 = OptionsHash.new({:foo => 'bar'})
hash2 = hash1.clone
puts hash1[:foo]
# 'bar'
hash2[:foo] = 'surprise'
puts hash1[:foo]
# 'surprise'
puts hash1.source.equal?(hash2.source)
# true
Hmm…that is not what we expected, right? We expected that when we cloned the first options hash and modified the clone, the original would be left alone. The issue is that clone performs a shallow copy: the clone gets its own copies of the instance variables, but the objects those variables reference, such as our source hash, are shared rather than cloned. To do this right, we need to define initialize_copy for the options hash and tell it to fully clone the source hash when cloning the options hash, like so:
class OptionsHash
  def initialize_copy(other)
    super
    @source = @source.clone
  end
end
Now, let's run our little test again:
hash1 = OptionsHash.new({:foo => 'bar'})
hash2 = hash1.clone
puts hash1[:foo]
# 'bar'
hash2[:foo] = 'surprise'
puts hash1[:foo]
# 'bar'
puts hash1.source.equal?(hash2.source)
# false
Ah, much better! Now our options hash knows how to properly clone itself and we get our expected result. I'm not sure who out there this will help, but I was unaware of this until a few months ago, so I thought I would post about it.
3 Comments
Jul 19, 2010
Note that this still falls down if anything in the Hash isn’t a value object. Example:
You need to do something stronger to get a clean break (continuing from the above IRB session):
The marshal/unmarshal trick is apparently a standard Ruby “deep clone” idiom – I ran across it a while back when trying to make AR dirty tracking play nice with serialized hashes of hashes (don’t ask :) ).
One final note for the paranoid: the above trick has all the usual caveats regarding un-Marshallable objects. See the Ruby docs for details.
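The commenter's point can be sketched as follows. This is a minimal, self-contained illustration (the OptionsHash here is a reconstruction of the post's class, and the nested :criteria hash is an assumed example): cloning the source hash is still only one level deep, so any mutable values nested inside it remain shared, while the Marshal round-trip produces a fully independent copy.

```ruby
# OptionsHash as defined in the post, including the initialize_copy fix.
class OptionsHash
  attr_reader :source

  def initialize(source)
    @source = source
  end

  def initialize_copy(other)
    super
    @source = @source.clone
  end

  def [](key)
    @source[key]
  end

  def []=(key, value)
    @source[key] = value
  end
end

hash1 = OptionsHash.new({:criteria => {:foo => 'bar'}})
hash2 = hash1.clone

# The top-level source hash was cloned, but the nested hash was not,
# so mutating it through the clone still leaks into the original.
hash2[:criteria][:foo] = 'surprise'
hash1[:criteria][:foo] # => 'surprise' -- nested hash is shared

# The standard marshal/unmarshal "deep clone" idiom copies the whole
# object graph (with the caveat that everything must be marshallable).
deep = Marshal.load(Marshal.dump(hash1))
deep[:criteria][:foo] = 'changed'
hash1[:criteria][:foo] # => 'surprise' -- deep copy is independent
```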
Jul 19, 2010
In your opinion, is it better to override #initialize_copy, or #clone, or does it really matter?
Jul 19, 2010
@Elliot: Definitely initialize_copy. That is the Ruby way as far as I can tell. Overriding clone means dup won't reap the rewards.
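A quick check of this reply's claim (a self-contained sketch using the post's OptionsHash): initialize_copy is the shared copy hook in Ruby, so both clone and dup pick up the deep-copied source, whereas overriding clone alone would leave dup with the shallow behavior.

```ruby
# initialize_copy is invoked by both Object#clone and Object#dup,
# so defining it once fixes both copy paths.
class OptionsHash
  attr_reader :source

  def initialize(source)
    @source = source
  end

  def initialize_copy(other)
    super
    @source = @source.clone
  end
end

original = OptionsHash.new({:foo => 'bar'})
original.clone.source.equal?(original.source) # => false
original.dup.source.equal?(original.source)   # => false
```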