November 17, 2008
HappyMapper, Making XML Fun Again
As much as I write about XML, you would swear it is all I do, but I promise it is not. In fact, I do not really use XML that often, but I will admit that I am intrigued by it. A while back, you may remember, I posted about ROXML, a ruby object to xml mapping library. I liked the idea but not the implementation. Soon after, I started playing around with what I have named HappyMapper, a ruby object to xml mapping library.
Logo created by Peter Cooper
I wrote nearly 95% of it in a weekend and then let it sit. I let it sit so long that it started to rot. Today it hit me that I do not have to finish something in order to release it. The thing that wasn’t working was xml with a default namespace. For good reasons I am sure, libxml-ruby does not like having default namespaces. I thought to myself, you know, this library is cool even without namespace junk. I mean, who even uses namespaces other than Amazon? I started to package it for release and then I noticed a few nitpicky things. I tweaked them and five hours later I had also fixed the namespace issue and changed the API a bit. So much for releasing unfinished code in hopes that someone smarter than I would finish it up…
Examples
But I digress, you do not care about all that, right? How about some examples? Twitter’s xml seems to be popular on this here blawg, so I will start with that. Given this xml sample from twitter:
<statuses type="array">
  <status>
    <created_at>Sat Aug 09 05:38:12 +0000 2008</created_at>
    <id>882281424</id>
    <text>I so just thought the guy lighting the Olympic torch was falling when he began to run on the wall. Wow that would have been catastrophic.</text>
    <source>web</source>
    <truncated>false</truncated>
    <in_reply_to_status_id>1234</in_reply_to_status_id>
    <in_reply_to_user_id>12345</in_reply_to_user_id>
    <favorited></favorited>
    <user>
      <id>4243</id>
      <name>John Nunemaker</name>
      <screen_name>jnunemaker</screen_name>
      <location>Mishawaka, IN, US</location>
      <description>Loves his wife, ruby, notre dame football and iu basketball</description>
      <profile_image_url>http://s3.amazonaws.com/twitter_production/profile_images/53781608/Photo_75_normal.jpg</profile_image_url>
      <url>http://addictedtonew.com</url>
      <protected>false</protected>
      <followers_count>486</followers_count>
    </user>
  </status>
</statuses>
You could set up the following ruby objects:
require 'happymapper' # assumes the happymapper gem is installed

# Maps the nested <user> element.
class User
  include HappyMapper

  element :id, Integer
  element :name, String
  element :screen_name, String
  element :location, String
  element :description, String
  element :profile_image_url, String
  element :url, String
  element :protected, Boolean
  element :followers_count, Integer
end

# Maps each <status> element, including its associated user.
class Status
  include HappyMapper

  element :id, Integer
  element :text, String
  element :created_at, Time
  element :source, String
  element :truncated, Boolean
  element :in_reply_to_status_id, Integer
  element :in_reply_to_user_id, Integer
  element :favorited, Boolean
  has_one :user, User
end

# xml_string holds the Twitter xml shown above.
statuses = Status.parse(xml_string)
statuses.each do |status|
  puts status.user.name, status.user.screen_name, status.text, status.source, ''
end
You can note a few things about HappyMapper from that example.
- Each xml element and attribute can be typecast.
- You can define association-like elements that are formed from other HappyMapper objects (see has_one :user, User in Status).
- You get a parse method when including HappyMapper that takes a string and does all the magic for you.
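To make the typecasting idea a bit more concrete, here is a minimal sketch in plain Ruby of what converting raw xml text into typed values can look like. This is not HappyMapper's actual implementation; the CASTERS table and the typecast helper are hypothetical names made up for illustration, and :boolean stands in for HappyMapper's Boolean because plain Ruby has no Boolean class.

```ruby
require 'time'

# Hypothetical lookup table: target type => lambda that converts raw xml text.
CASTERS = {
  String   => ->(raw) { raw.to_s },
  Integer  => ->(raw) { raw.to_i },
  Float    => ->(raw) { raw.to_f },
  Time     => ->(raw) { Time.parse(raw) },
  :boolean => ->(raw) { %w[true t 1].include?(raw.to_s.downcase) }
}.freeze

# Hypothetical helper: look up the converter for the type and apply it.
def typecast(raw, type)
  CASTERS.fetch(type).call(raw)
end

# Values taken from the Twitter sample above.
id        = typecast('882281424', Integer)
truncated = typecast('false', :boolean)
created   = typecast('Sat Aug 09 05:38:12 +0000 2008', Time)

puts id.class, truncated, created.utc.year
```

The point is only that every value starts life as a string and the declared type decides how it gets converted, which is what the second argument to element expresses.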
That was an easy one, how about something more complex and ugly, like some Amazon xml. Given some Amazon xml such as this:
<?xml version="1.0" encoding="UTF-8"?>
<ItemSearchResponse xmlns="http://webservices.amazon.com/AWSECommerceService/2005-10-05">
  <OperationRequest>
    <HTTPHeaders>
      <Header Name="UserAgent">
      </Header>
    </HTTPHeaders>
    <RequestId>16WRJBVEM155Q026KCV1</RequestId>
    <Arguments>
      <Argument Name="SearchIndex" Value="Books"></Argument>
      <Argument Name="Service" Value="AWSECommerceService"></Argument>
      <Argument Name="Title" Value="Ruby on Rails"></Argument>
      <Argument Name="Operation" Value="ItemSearch"></Argument>
      <Argument Name="AWSAccessKeyId" Value="dontbeaswoosh"></Argument>
    </Arguments>
    <RequestProcessingTime>0.064924955368042</RequestProcessingTime>
  </OperationRequest>
  <Items>
    <Request>
      <IsValid>True</IsValid>
      <ItemSearchRequest>
        <SearchIndex>Books</SearchIndex>
        <Title>Ruby on Rails</Title>
      </ItemSearchRequest>
    </Request>
    <TotalResults>22</TotalResults>
    <TotalPages>3</TotalPages>
    <Item>
      <ASIN>0321480791</ASIN>
      <DetailPageURL>http://www.amazon.com/gp/redirect.html%3FASIN=0321480791%26tag=ws%26lcode=xm2%26cID=2025%26ccmID=165953%26location=/o/ASIN/0321480791%253FSubscriptionId=dontbeaswoosh</DetailPageURL>
      <ItemAttributes>
        <Author>Michael Hartl</Author>
        <Author>Aurelius Prochazka</Author>
        <Manufacturer>Addison-Wesley Professional</Manufacturer>
        <ProductGroup>Book</ProductGroup>
        <Title>RailsSpace: Building a Social Networking Website with Ruby on Rails (Addison-Wesley Professional Ruby Series)</Title>
      </ItemAttributes>
    </Item>
  </Items>
</ItemSearchResponse>
You could create the following objects to obtain Item information:
module PITA
  class Item
    include HappyMapper

    tag 'Item' # if you put class in module you need tag
    element :asin, String, :tag => 'ASIN'
    element :detail_page_url, String, :tag => 'DetailPageURL'
    element :manufacturer, String, :tag => 'Manufacturer', :deep => true
  end

  class Items
    include HappyMapper

    tag 'Items' # if you put class in module you need tag
    element :total_results, Integer, :tag => 'TotalResults'
    element :total_pages, Integer, :tag => 'TotalPages'
    has_many :items, Item
  end
end

item = PITA::Items.parse(xml_string, :single => true, :use_default_namespace => true)
item.items.each do |i|
  puts i.asin, i.detail_page_url, i.manufacturer, ''
end
The previous example showed a few more things.
- You can put your HappyMapper objects in a module and define the tag name (see tag 'Item' inside PITA::Item).
- You can create nice methods for crappy camel cased xml tags (see element :total_pages, Integer, :tag => 'TotalPages' in PITA::Items).
- There is also a has_many association-like method that allows defining a collection of HappyMapper objects (see has_many :items, Item in PITA::Items).
- You do not have to map exact parent child relationships. You can go deep sea diving with the :deep option on any element to pluck out grandchildren and such (see element :manufacturer in PITA::Item).
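One way to picture what :deep buys you is the difference between looking only at direct children and looking at any descendant. The sketch below is an illustration of that idea, not how HappyMapper implements the option. It uses REXML (bundled with Ruby) instead of libxml-ruby so it runs without extra gems, and it uses a trimmed copy of the Item xml with the default namespace removed to keep the XPath simple.

```ruby
require 'rexml/document'

# Trimmed-down Item from the Amazon sample, without the default namespace.
xml = <<~XML
  <Item>
    <ASIN>0321480791</ASIN>
    <ItemAttributes>
      <Manufacturer>Addison-Wesley Professional</Manufacturer>
    </ItemAttributes>
  </Item>
XML

item = REXML::Document.new(xml).root

# Child lookup: Manufacturer is a grandchild of Item, so nothing is found.
shallow = REXML::XPath.first(item, 'Manufacturer')

# Descendant lookup: './/' searches at any depth below Item.
deep = REXML::XPath.first(item, './/Manufacturer')

puts shallow.inspect
puts deep.text
```

The child lookup comes back empty while the descendant lookup finds the manufacturer, which is the kind of shortcut :deep => true gives you without having to map ItemAttributes as its own object.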
Installation
Installation is typical as the gem is on rubyforge and github.
# rubyforge
$ sudo gem install happymapper
# github
$ sudo gem install jnunemaker-happymapper
If you run into problems, feel free to fork and add some specs for the xml it is not working with. From there you can dive in and fix them or let me know and I will take a look. Think this will be handy? Got an idea? Let me know in the comments below. Oh, and yes, in the future, HappyMapper will have killer HTTParty integration.
27 Comments
Nov 17, 2008
This is absolutely perfect, probably just what i needed for RAAWS (see mysite). I’ve been struggling with an Amazon lib, and an easy way to map the response was just what i needed to complete it. I’m also using libxml by the way.
Thanks !
PS: Oops, I just realized you wrote AAWS. I thought of forking your lib at first, but although I liked it, I needed another structure. Basically I see ItemOperation and the like as the Controller, SearchIndex as the DB/Model and the response as the View, and HappyMapper is just the fit for it. I’m still in pre-pre-alpha but have started using it for a project.
Nov 17, 2008
I remember you writing about this a while back, and your solution is awesome. At the same time I was reading your ideas a while back, I was building a solution of my own that would allow the same type of functionality for Flickr, Vimeo, and Upcoming. My solution was a quick one, and lacks the beauty that yours has. I love the easy ability to typecast and set associations. Thanks for releasing it.
Nov 17, 2008
@charly No! Work on AAWS and finish it for me. :)
@Nate You are too kind.
Nov 17, 2008
John, I’m really excited to try this out when the gems get pushed over to Github. I’m working right now on bridging Adobe Flex and a CouchDB database backend, and something like this would be perfect to expressively declare the type of XML that Flex needs. Quick question, is there any way to declare attributes on the xml node?
If not, I’m thinking about forking it and adding that functionality in myself.
Cheers to another amazing gem. I use HTTParty everywhere, and include an apps/api directory in my Projects for the sole use of hooking up Api’s with the partayyyyyy.
Awesome.
Nov 17, 2008
Interesting! I might want to use this for the consuming-end of XTF-Ruby (which is not even half-baked at the moment).
Nov 17, 2008
@maklomalko – Yep, I have an example at github for working with attributes.
@jamieorc – what is XTF-Ruby?
Nov 17, 2008
XTF-Ruby is a library I started to interface with XTF ( http://xtf.wiki.sourceforge.net/ ). I’ve been working on a client project for which we selected XTF because of its wonderful fulltext browsing abilities (otherwise, we would have used Solr). It uses XSLT for everything, including building html pages and parsing queries for searching the index. Except for browsing actual fulltext documents, I didn’t want to work with XTF’s vertical infrastructure, and thus XSLT, so the main Java developer, Martin Haye, graciously helped build a couple of things into XTF so I could query the index directly with XML. I built XTF-Ruby to generate the XML queries out of Ruby code. That part is built out and working.
However, since XTF returns XML, I’m just directly parsing the XML into Ruby objects using libxml-ruby (with a few Hpricotish methods thrown in). I actually haven’t pushed any of this side of XTF-Ruby up to github yet, but eventually, that part of the library will consume the XML and create Ruby objects. That’s why HappyMapper caught my eye.
http://github.com/jamieorc/xtf-ruby/tree/master
Nov 18, 2008
Hi! I’ve tried to use HappyMapper but I came to the realization that it is terribly slow (at least for what I am doing). I am parsing a 500KB XML file with nested objects and HappyMapper gets extremely, terribly slow. My test suite runs in 12-16 seconds if I am using it and in 2 seconds if I parse stuff manually using Hpricot. So a couple of questions:
a) why libxml and not hpricot?
b) did you check how it behaves when elements recurse (essentially has_many :children, self)
This looks like a terribly neat idea but it’s currently too slow to be useful. Another peeve that I have with HM is that it does not preserve parent relationships. If I load children of element A, I have no way to trace back to element A from one of the child objects. There is also no initialization callback that I can use to capture the parent element in any way.
But from the API point of view it seems like a very neat, lovely library. Please keep up the good work!
Nov 18, 2008
@Julik a) as far as I know libxml is faster than hpricot. b) I have not done any performance testing. I simply laid out the API that I wanted and made it work for xml that is typical of what I work with.
You have a couple options. First, finders keepers. You found out where it is slow so you fix it. :) Seriously though, I am open to help on making it faster and adding the ability for nesting multiple levels if you or anyone else is interested.
The second option is to pastie or email me the classes and example xml you are working with and I can take a look at it when I get a chance. I’m big into batching tasks, so I probably won’t look at it for a while, but I’m definitely interested in making HM faster and better.
Glad you like the API. I feel that is my strength. I know how to make things simple to use, but usually don’t worry about heavy lifting. Anyone who wants to work on speed and functionality is welcome to as long as the API stays clean and simple.
Nov 18, 2008
Libxml is definitely faster than older versions of hpricot. When I was first building my client’s app, I used Hpricot to parse the XML into Ruby objects. Our performance wasn’t what we needed so I ran two tests. The first ran an Hpricot parser against XML vs LibXML-Ruby against the same XML. Overall, I saw about a 6x-7x performance boost when using LibXML-Ruby. My second test pitted the web app using Hpricot vs the web app using LibXML-Ruby. I used Httperf for this. In the web app, I saw about 3x increase in page load speed when all was done.
Now, having said that, _why has recently increased the speed of Hpricot after he got a little competition from Nokogiri, which is built on libxml2. Still, if you’re working with XML, I’d stick with LibXML-Ruby, as Hpricot isn’t quite so XML-friendly. It’s very focused on parsing HTML.
Way down in these comments are some benchmarks:
http://hackety.org/2008/11/03/hpricotStrikesBack.html
Nov 20, 2008
I wrote something similar, and it’s also not 100% there, but mostly: http://xml-object.rubyforge.org
It requires even less code to use, but you don’t get type conversion.
Nov 20, 2008
How does happymapper or xml-object compare to xml-mapping (http://xml-mapping.rubyforge.org/) which has been around for quite a while and appears to be fairly complete?
Nov 20, 2008
@Jordi – Yours isn’t so much an object to xml mapper as it is an automatic parser. Cool project though. It seems like a better version of xml-simple.
@Rob – Didn’t know it existed. Have you used it? I would say the most important difference is that HM has a much prettier API than XM. I like that it allows default values. I might have to add that to HM. It hasn’t been updated in a few years which says dead project to me.
Another difference is that it appears to be built on REXML, whereas HM is built on libxml-ruby, so HM should parse the xml quite a bit faster unless XM uses REXML stream parsing, which I doubt.
Those are my off the cuff reactions. :)
Nov 21, 2008
Cool I just finished a tiny sinatra project making ruby objects from XML or HTTP responses, and it was a PITA. I used hpricot. I think using HappyMapper would have made it easier, but I’m glad I did the raw object creation at least once though too.
Nov 21, 2008
John:
Loving the plugin. Quick question::: I have a chunk of XML that looks something like this:
…
…
The question is, what is the best way to access the items? Right now I can get the attribute, or one of the feature items, but not all of them. Thoughts?
Nov 21, 2008
@Kevin – No problem. See the paste here or the code below.
Nov 21, 2008
John:
I copied / pasted your Pastie and, as you say, its output is one of the feature items. While really cool, I am looking to get both of them, not just the one.
Additionally, you are a rock star.
- Kevin
Nov 21, 2008
@Kevin – Congratulations! You found something HappyMapper cannot currently map. I’ll add it to lighthouse as a bug. Haha. Nice work.
Nov 23, 2008
John,
Are you aware that there’s a much changed ROXML 2.0 out?
http://ben.woosley.name/log/?p=45
Nov 23, 2008
@Per Melin – Nope hadn’t seen the updates. I really don’t like the API of ROXML. xml_attribute, xml_text and ROXML::TAG_ARRAY seem too verbose to me.
Nov 24, 2008
John,
The API is different now. If you quickly browse through the post at the link I gave it may look like nothing has changed, but that’s because he is giving before-and-after examples.
Dec 02, 2008
I have this weird xml that happymapper is not able to map!
I tried to set the :tag => "openSearch:startIndex", but I still get an awful error. The problem seems to be directly with libxml, though!
LibXML::XML::Error: Error: Invalid expression at :0.
Dec 02, 2008
@alex – is that all the XML? Does the root element have a namespace?
Dec 02, 2008
Nope, that’s not the whole thing.
This is the complete XML
<?xml version="1.0" ?>
4
1
4
12345
Pride and Prejudice
4321
UA-12345-1
ga:4321
Dec 02, 2008
@Alex – Ok, thanks. I’ll put that on the todo list as a test case. I’m betting that libxml-ruby doesn’t like namespaced xml tags without a namespace.
Dec 02, 2008
I’m looking into it right now, maybe I’ll get a fix for that :)
Laters!
Dec 06, 2008
@John & @Kevin,
I was looking at the mapping issue today and came up with something that works out of the box: pastie
Check it out – oh, I also updated the lighthouse ticket.
Take care.