July 11, 2006

Posted by John

Tagged screen scraping

Scrapi: Really Easy Screen Scraping

This isn’t Rails-specific, but wow! Scraping just got a whole lot easier with Labnotes’ new library, Scrapi. Here’s an example from the post showing how to use it:

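# Scraper for a single auction row: each `process` pairs a CSS selector
# with the fields to fill from the matching element.
ebay_auction = Scraper.define do
    # :text takes the element's text, "@href" takes its href attribute.
    process "h3.ens>a", :description=>:text,
                        :url=>"@href"
    process "td.ebcPr>span", :price=>:text
    process "div.ebPicture >a>img", :image=>"@src"

    result :description, :url, :price, :image
end

# Scraper for the whole listing page: collects one auction per matching row.
ebay = Scraper.define do
    array :auctions

    process "table.ebItemlist tr.single",
            :auctions => ebay_auction

    result :auctions
end

# `html` is assumed to already hold the markup of the listing page.
auctions = ebay.scrape(html)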
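
To make it a little more concrete what a scraper like this produces, here is a rough sketch that does the same kind of extraction by hand. It does not use Scrapi at all: it swaps in REXML and XPath from Ruby's bundled libraries, so it only copes with well-formed markup. The sample markup, the `Auction` struct and the `scrape_auctions`, `text_at` and `attr_at` helpers are all made up for illustration; only the selectors come from the example above.

```ruby
require "rexml/document"

# Hypothetical, well-formed sample markup shaped to match the selectors
# used in the Scrapi example. Real eBay pages will differ.
SAMPLE_HTML = <<~'HTML'
  <table class="ebItemlist">
    <tr class="single">
      <td><div class="ebPicture"><a href="/item/1"><img src="/img/1.jpg"/></a></div></td>
      <td><h3 class="ens"><a href="/item/1">Vintage camera</a></h3></td>
      <td class="ebcPr"><span>$25.00</span></td>
    </tr>
    <tr class="single">
      <td><div class="ebPicture"><a href="/item/2"><img src="/img/2.jpg"/></a></div></td>
      <td><h3 class="ens"><a href="/item/2">Old radio</a></h3></td>
      <td class="ebcPr"><span>$40.00</span></td>
    </tr>
  </table>
HTML

# Illustrative container with the same four fields as the `result` line above.
Auction = Struct.new(:description, :url, :price, :image)

# Text of the first element matching xpath under node, or nil if none.
def text_at(node, xpath)
  element = REXML::XPath.first(node, xpath)
  element && element.text
end

# Value of the named attribute on the first matching element, or nil if none.
def attr_at(node, xpath, name)
  element = REXML::XPath.first(node, xpath)
  element && element.attributes[name]
end

# Hand-rolled stand-in for the two scrapers: one Auction per tr.single row.
def scrape_auctions(html)
  doc = REXML::Document.new(html)
  rows = REXML::XPath.match(doc, "//table[@class='ebItemlist']//tr[@class='single']")
  rows.map do |row|
    Auction.new(
      text_at(row, ".//h3[@class='ens']/a"),
      attr_at(row, ".//h3[@class='ens']/a", "href"),
      text_at(row, ".//td[@class='ebcPr']/span"),
      attr_at(row, ".//div[@class='ebPicture']/a/img", "src")
    )
  end
end

scrape_auctions(SAMPLE_HTML).each do |auction|
  puts "#{auction.description} (#{auction.price}) #{auction.url}"
end
```

Compared with this, the Scrapi version says the same thing declaratively: selectors and field names, with the looping and nil handling left to the library.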

Yeah. It’s that easy and that cool. The code is stable and currently used in co.mments, a production app that does a lot of scraping (it keeps track of comments you leave on other people’s sites).

You can grab the code through svn right now, and it will soon be available as a gem.

About

Authored by John Nunemaker (Noo-neh-maker), a programmer who has fallen deeply in love with Ruby.
