Scraping guides: Parsing HTML using CSS selectors

We’ve added a new copy-and-paste scraping guide, so you can quickly get the lines of code you need to parse an HTML file using CSS selectors. You can get to it from the documentation page.

The HTML parsing guide is available in Ruby, Python and PHP. As with all the documentation, you can choose the language at the top right of the page.

While the library used varies (lxml in Python, Nokogiri in Ruby, Simple HTML DOM in PHP), the principle is the same. You pick elements out of the page with the same selectors you would use to style it with CSS, then pull the text out of them.

It’s a popular technique – for example, around 30% of Python scrapers on ScraperWiki use lxml.

2 Responses to “Scraping guides: Parsing HTML using CSS selectors”

  1. Mortimer October 4, 2011 at 12:36 pm #

    To do something with the data, I just added a View guide for google visualization: https://views.scraperwiki.com/run/google_simple_graph_copypaste/

