November 17, 2008

Posted by John

Tagged gems and xml

HappyMapper, Making XML Fun Again

As much as I write about XML, you would swear it is all I do, but I promise it is not. In fact, I do not really use XML that often, but I will admit that I am intrigued by it. A while back, you may remember, I posted about ROXML, a ruby object to xml mapping library. I liked the idea but not the implementation. Soon after, I started playing around with what I have named HappyMapper, a ruby object to xml mapping library.

[Image: HappyMapper logo, created by Peter Cooper]

I wrote nearly 95% of it in a weekend and then let it sit. I let it sit so long that it started to rot. Today it hit me that I do not have to finish something in order to release it. The thing that wasn’t working was xml with a default namespace. For good reasons I am sure, libxml-ruby does not like having default namespaces. I thought to myself, you know, this library is cool even without namespace junk. I mean, who even uses namespaces other than Amazon? I started to package it for release and then I noticed a few nitpicky things. I tweaked them and five hours later I had also fixed the namespace issue and changed the API a bit. So much for releasing unfinished code in hopes that someone smarter than I would finish it up…

Examples

But I digress, you do not care about all that, right? How about some examples? Twitter’s xml seems to be popular on this here blawg, so I will start with that. Given this xml sample from twitter:

<statuses type="array"> 
  <status> 
    <created_at>Sat Aug 09 05:38:12 +0000 2008</created_at> 
    <id>882281424</id> 
    <text>I so just thought the guy lighting the Olympic torch was falling when he began to run on the wall. Wow that would have been catastrophic.</text> 
    <source>web</source> 
    <truncated>false</truncated> 
    <in_reply_to_status_id>1234</in_reply_to_status_id> 
    <in_reply_to_user_id>12345</in_reply_to_user_id> 
    <favorited></favorited> 
    <user> 
      <id>4243</id> 
      <name>John Nunemaker</name> 
      <screen_name>jnunemaker</screen_name> 
      <location>Mishawaka, IN, US</location> 
      <description>Loves his wife, ruby, notre dame football and iu basketball</description> 
      <profile_image_url>http://s3.amazonaws.com/twitter_production/profile_images/53781608/Photo_75_normal.jpg</profile_image_url> 
      <url>http://addictedtonew.com</url> 
      <protected>false</protected> 
      <followers_count>486</followers_count> 
    </user> 
  </status> 
</statuses>

You could set up the following ruby objects:

class User
  include HappyMapper
  
  element :id, Integer
  element :name, String
  element :screen_name, String
  element :location, String
  element :description, String
  element :profile_image_url, String
  element :url, String
  element :protected, Boolean
  element :followers_count, Integer
end

class Status
  include HappyMapper
  
  element :id, Integer
  element :text, String
  element :created_at, Time
  element :source, String
  element :truncated, Boolean
  element :in_reply_to_status_id, Integer
  element :in_reply_to_user_id, Integer
  element :favorited, Boolean
  has_one :user, User
end

statuses = Status.parse(xml_string)
statuses.each do |status|
  puts status.user.name, status.user.screen_name, status.text, status.source, ''
end

You can note a few things about HappyMapper from that example.

  1. Each xml element and attribute can be typecast.
  2. You can define association-like elements that are formed from other HappyMapper objects (see has_one :user, User in Status).
  3. You get a parse method when including HappyMapper that takes a string and does all the magic for you.
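
To make the first point a bit more concrete, here is a minimal sketch of what typecasting amounts to: the text inside an XML node is always a string, and the declared type decides how that string is converted. This is not HappyMapper's actual implementation. The names typecast and CASTERS and the stand-in Boolean class are hypothetical, and returning nil for an empty element is simply a choice made for this sketch.

```ruby
require 'time'

# Ruby has no built-in Boolean class, so this sketch defines an empty
# marker class to use as a type name (hypothetical, for illustration only).
class Boolean; end

# One conversion lambda per supported type (assumed set, not HappyMapper's).
CASTERS = {
  String  => ->(text) { text },
  Integer => ->(text) { Integer(text.strip, 10) },
  Time    => ->(text) { Time.parse(text) },
  Boolean => ->(text) { %w[true 1].include?(text.strip.downcase) }
}.freeze

# Convert the raw text of an XML node into the requested Ruby type.
# Empty or missing text becomes nil in this sketch.
def typecast(text, type)
  return nil if text.nil? || text.strip.empty?

  CASTERS.fetch(type).call(text)
end

puts typecast('486', Integer).inspect   # => 486
puts typecast('false', Boolean).inspect # => false
puts typecast('', Boolean).inspect      # => nil
puts typecast('Sat Aug 09 05:38:12 +0000 2008', Time).year
```

Keeping the conversions in one table means a new type only needs one more entry, which is roughly the feel the element declarations above are going for.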

That was an easy one. How about something more complex and ugly, like some Amazon xml? Given a response such as this:

<?xml version="1.0" encoding="UTF-8"?>
<ItemSearchResponse xmlns="http://webservices.amazon.com/AWSECommerceService/2005-10-05">
	<OperationRequest>
		<HTTPHeaders>
			<Header Name="UserAgent">
			</Header>
		</HTTPHeaders>
		<RequestId>16WRJBVEM155Q026KCV1</RequestId>
		<Arguments>
			<Argument Name="SearchIndex" Value="Books"></Argument>
			<Argument Name="Service" Value="AWSECommerceService"></Argument>
			<Argument Name="Title" Value="Ruby on Rails"></Argument>
			<Argument Name="Operation" Value="ItemSearch"></Argument>
			<Argument Name="AWSAccessKeyId" Value="dontbeaswoosh"></Argument>
		</Arguments>
		<RequestProcessingTime>0.064924955368042</RequestProcessingTime>
	</OperationRequest>
	<Items>
		<Request>
			<IsValid>True</IsValid>
			<ItemSearchRequest>
				<SearchIndex>Books</SearchIndex>
				<Title>Ruby on Rails</Title>
			</ItemSearchRequest>
		</Request>
		<TotalResults>22</TotalResults>
		<TotalPages>3</TotalPages>
		<Item>
			<ASIN>0321480791</ASIN>
			<DetailPageURL>http://www.amazon.com/gp/redirect.html%3FASIN=0321480791%26tag=ws%26lcode=xm2%26cID=2025%26ccmID=165953%26location=/o/ASIN/0321480791%253FSubscriptionId=dontbeaswoosh</DetailPageURL>
			<ItemAttributes>
				<Author>Michael Hartl</Author>
				<Author>Aurelius Prochazka</Author>
				<Manufacturer>Addison-Wesley Professional</Manufacturer>
				<ProductGroup>Book</ProductGroup>
				<Title>RailsSpace: Building a Social Networking Website with Ruby on Rails (Addison-Wesley Professional Ruby Series)</Title>
			</ItemAttributes>
		</Item>
	</Items>
</ItemSearchResponse>

You could create the following objects to obtain Item information:

module PITA
  class Item
    include HappyMapper
    
    tag 'Item' # if you put class in module you need tag
    element :asin, String, :tag => 'ASIN'
    element :detail_page_url, String, :tag => 'DetailPageURL'
    element :manufacturer, String, :tag => 'Manufacturer', :deep => true
  end

  class Items
    include HappyMapper
    
    tag 'Items' # if you put class in module you need tag
    element :total_results, Integer, :tag => 'TotalResults'
    element :total_pages, Integer, :tag => 'TotalPages'
    has_many :items, Item
  end
end

item = PITA::Items.parse(xml_string, :single => true, :use_default_namespace => true)
item.items.each do |i|
  puts i.asin, i.detail_page_url, i.manufacturer, ''
end

The previous example showed a few more things.

  1. You can put your HappyMapper objects in a module and define the tag name (see tag 'Item' inside PITA::Item).
  2. You can create nice methods for crappy camel cased xml tags (see element :total_pages, Integer, :tag => 'TotalPages' in PITA::Items).
  3. There is also a has_many association-like method that allows defining a collection of HappyMapper objects (see has_many :items, Item in PITA::Items).
  4. You do not have to map exact parent child relationships. You can go deep sea diving with the :deep option on any element to pluck out grandchildren and such (see element :manufacturer in PITA::Item).
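
The idea behind :deep in point 4 boils down to searching all descendants of a node instead of only its direct children. Here is a rough sketch of that difference using REXML, which ships with Ruby, rather than libxml-ruby. The sample is a trimmed copy of the Item above with the namespace left out to keep things short. It illustrates the concept only and says nothing about how HappyMapper does the lookup internally.

```ruby
require 'rexml/document'

xml = <<~XML
  <Item>
    <ASIN>0321480791</ASIN>
    <ItemAttributes>
      <Manufacturer>Addison-Wesley Professional</Manufacturer>
    </ItemAttributes>
  </Item>
XML

item = REXML::Document.new(xml).root

# Direct-child lookup: Manufacturer is a grandchild of Item,
# so a plain child path finds nothing.
child = REXML::XPath.first(item, 'Manufacturer')

# Descendant lookup: './/' searches every level below the current node.
deep = REXML::XPath.first(item, './/Manufacturer')

puts child.inspect # => nil
puts deep.text     # => Addison-Wesley Professional
```

The trade-off with a descendant search is that it grabs the first match at any depth, so it is best saved for tags that only show up once under the element being mapped.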

Installation

Installation is typical as the gem is on rubyforge and github.

# rubyforge
$ sudo gem install happymapper

# github
$ sudo gem install jnunemaker-happymapper

If you run into problems, feel free to fork and add some specs for the xml it is not working with. From there you can dive in and fix them or let me know and I will take a look. Think this will be handy? Got an idea? Let me know in the comments below. Oh, and yes, in the future, HappyMapper will have killer HTTParty integration.

27 Comments

  1. This is absolutely perfect, probably just what i needed for RAAWS (see mysite). I’ve been struggling with an Amazon lib, and an easy way to map the response was just what i needed to complete it. I’m also using libxml by the way.
    Thanks !

    PS oups just realized you wrote AAWS. I thought of forking your lib at first but although i liked it i needed another structure. Basically i see ItemOperation and alike as the Controller, SearchIndex as the DB/Model and the response as the View and HappyMapper is just fit for it. I’m still in prepre alpha but started using it for a project.

  2. I remember you writing about this a while back, and your solution is awesome. At the same time I was reading your ideas a while back, I was building a solution of my own that would allow the same type of functionality for Flickr, Vimeo, and Upcoming. My solution was a quick one, and lacks the beauty that yours has. I love the easy ability to typecast and set associations. Thanks for releasing it.

  3. @charly No! Work on AAWS and finish it for me. :)

    @Nate You are too kind.

  4. malkomalko

    Nov 17, 2008

    John, I’m really excited to try this out when the gems get pushed over to Github. I’m working right now on bridging Adobe Flex and a CouchDB database backend, and something like this would be perfect to expressively declare the type of XML that Flex needs. Quick question, is there any way to declare attributes on the xml node?

    Look at me!

    If not, I’m thinking about forking it and adding that functionality in myself.

    Cheers to another amazing gem. I use HTTParty everywhere, and include an apps/api directory in my Projects for the sole use of hooking up Api’s with the partayyyyyy.

    Awesome.

  5. Interesting! I might want to use this for the consuming-end of XTF-Ruby (which is not even half-baked at the moment).

  6. @malkomalko – Yep, I have an example at github for working with attributes.

    @jamieorc – what is XTF-Ruby?

  7. XTF-Ruby is a library I started to interface with XTF ( http://xtf.wiki.sourceforge.net/ ). I’ve been working on a client project for which we selected XTF for its wonderful fulltext browsing abilities (otherwise, we would have used Solr). It uses XSLT for everything, including building html pages and parsing queries for searching the index. Except for browsing actual fulltext documents, I didn’t want to work with XTF’s vertical infrastructure, and thus XSLT, so the main java developer, Martin Haye, graciously helped build a couple of things into XTF so I could query the index directly with XML. I built XTF-Ruby to generate the XML queries out of Ruby code. That part is built out and working.

    However, since XTF returns XML, I’m just directly parsing the XML into Ruby objects using libxml-ruby (with a few Hpricotish methods thrown in). I actually haven’t pushed any of this side of XTF-Ruby up to github yet, but eventually, that part of the library will consume the XML and create Ruby objects. That’s why HappyMapper caught my eye.

    http://github.com/jamieorc/xtf-ruby/tree/master

  8. Hi! I’ve tried to use HappyMapper but I came to the realization that it is terribly slow (at least for what I am doing). I am parsing a 500KB XML file with nested objects and HappyMapper gets extremely, terribly slow. My test suite runs in 12-16 seconds if I am using it and in 2 seconds if I parse stuff manually using Hpricot. So a couple of questions:

    a) why libxml and not hpricot?
    b) did you check how it behaves when elements recurse (essentially has_many :children, self)

    This looks like a terribly neat idea but it’s currently too slow to be useful. Another peeve that I have with HM is that it does not preserve parent relationships. If I load children of element A, I have no way to trace back to element A from one of the child objects. There is also no initialization callback that I can use to capture the parent element in any way.

    But from the API point of view it seems like a very neat, lovely library. Please keep up the good work!

  9. @Julik a) as far as I know libxml is faster than hpricot. b) I have not done any performance testing. I simply laid out the API that I wanted and made it work for xml that is typical of what I work with.

    You have a couple options. First, finders keepers. You found out where it is slow so you fix it. :) Seriously though, I am open to help on making it faster and adding the ability for nesting multiple levels if you or anyone else is interested.

    The second option is to pastie or email me the classes and example xml you are working with and I can take a look at it when I get a chance. I’m big into batching tasks, so I probably won’t look at it for a while, but I’m definitely interested in making HM faster and better.

    Glad you like the API. I feel that is my strength. I know how to make things simple to use, but usually don’t worry about heavy lifting. Anyone who wants to work on speed and functionality is welcome to as long as the API stays clean and simple.

  10. Libxml is definitely faster than older versions of hpricot. When I was first building my client’s app, I used Hpricot to parse the XML into Ruby objects. Our performance wasn’t what we needed so I ran two tests. The first ran an Hpricot parser against XML vs LibXML-Ruby against the same XML. Overall, I saw about a 6x-7x performance boost when using LibXML-Ruby. My second test pitted the web app using Hpricot vs the web app using LibXML-Ruby. I used Httperf for this. In the web app, I saw about 3x increase in page load speed when all was done.

    Now, having said that, _why has recently increased the speed of Hpricot after he got a little competition from Nokogiri, which uses LibXML-Ruby. Still, if you’re working with XML, I’d stick with LibXML-Ruby, as Hpricot isn’t quite so XML-friendly. It’s very focused on parsing HTML.

    Way down in these comments are some benchmarks:

    http://hackety.org/2008/11/03/hpricotStrikesBack.html

  11. I wrote something similar, and it’s also not 100% there, but mostly: http://xml-object.rubyforge.org

    It requires even less code to use, but you don’t get type conversion.

  12. How does happymapper or xml-object compare to xml-mapping (http://xml-mapping.rubyforge.org/) which has been around for quite a while and appears to be fairly complete?

  13. @Jordi – Yours isn’t so much an object to xml mapper as it is an automatic parser. Cool project though. It seems like a better version of xml-simple.

    @Rob – Didn’t know it existed. Have you used it? I would say the most important difference is that HM has a much prettier API than XM. I like that it allows default values. I might have to add that to HM. It hasn’t been updated in a few years which says dead project to me.

    Another difference is that it appears to be built on REXML, whereas HM is built on libxml-ruby, so HM should parse the xml quite a bit faster unless XM uses REXML stream parsing, which I doubt.

    Those are my off the cuff reactions. :)

  14. Cool I just finished a tiny sinatra project making ruby objects from XML or HTTP responses, and it was a PITA. I used hpricot. I think using HappyMapper would have made it easier, but I’m glad I did the raw object creation at least once though too.

  15. John:

    Loving the plugin. Quick question::: I have a chunk of XML that looks something like this:

    A Title This is feature text This is feature text

    The question is, what is the best way to access the items? Right now I can get the attribute, or one of the feature items, but not all of them. Thoughts?

  16. @Kevin – No problem. See the paste here or the code below.

    require 'rubygems'
    gem 'happymapper', '0.1.1'
    require 'happymapper'
    require 'pp'
    
    xml = <<EOF
    <products>
      <product>
        <title> A Title</title> 
        <features_bullets>
          <feature>This is feature text</feature> 
          <feature>This is feature text</feature> 
        </features_bullets>
      </product>
    </products>
    EOF
    
    class FeatureBullet
      include HappyMapper
      
      tag 'features_bullets'
      element :feature, String
    end
    
    class Product
      include HappyMapper
      
      element :title, String
      has_many :features_bullets, FeatureBullet
    end
    
    Product.parse(xml).each do |product|
      puts product.title
      product.features_bullets.each { |fb| puts "  - #{fb.feature}" }
    end
    
    # outputs:
    #  A Title
    #   - This is feature text 

  17. John:

    I copied / pasted your Pastie and as you say, its output is one of the feature items. While really cool, I am looking to get both of them, not just the one.

    Additionally, you are a rock star.

    - Kevin

  18. @Kevin – Congratulations! You found something HappyMapper cannot currently map. I’ll add it to lighthouse as a bug. Haha. Nice work.

  19. Per Melin

    Nov 23, 2008

    John,

    Are you aware that there’s a much changed ROXML 2.0 out?

    http://ben.woosley.name/log/?p=45

  20. @Per Melin – Nope hadn’t seen the updates. I really don’t like the API of ROXML. xml_attribute, xml_text and ROXML::TAG_ARRAY seem too verbose to me.

  21. Per Melin

    Nov 24, 2008

    John,

    The API is different now. If you quickly browse through the post at the link I gave it may look like nothing has changed, but that’s because he is giving before-and-after examples.

  22. I have this weird xml that happymapper is not able to map!

    1239804823UA-4869764-1

    I tried to set the :tag => "openSearch:startIndex", but I still get an awful error, and the problem seems to be directly with libxml!

    LibXML::XML::Error: Error: Invalid expression at :0.

  23. @alex – is that all the XML? Does the root element have a namespace?

  24. Alex Gregianin

    Dec 02, 2008

    Nope, that’s not the whole thing.

    This is the complete XML

    <?xml version="1.0" ?>

    4


    1
    4

    12345
    Pride and Prejudice
    4321
    UA-12345-1
    ga:4321

    12345 Pride and Prejudice 5555 UA-12345-2 ga:5555 54321 Jane Austen 2222 UA-54321-1 ga:2222 54321 Jane Austen 3333 UA-54321-2 ga:3333

  25. @Alex – Ok, thanks. I’ll put that on the todo list as a test case. I’m betting that libxml-ruby doesn’t like namespaced xml tags without a namespace.

  26. Alex Gregianin

    Dec 02, 2008

    I’m looking into it right now, maybe I’ll get a fix for that :)

    Laters!

  27. Don Morrison

    Dec 06, 2008

    @John & @Kevin,

    I was looking at the mapping issue today and came up with something that works out of the box: pastie

    Check it out – oh, I also updated the lighthouse ticket.

    Take care.

About

Authored by John Nunemaker (Noo-neh-maker), a programmer who has fallen deeply in love with Ruby.
