Comments on: How to scrape and parse Wikipedia
https://blog.scraperwiki.com/2011/12/how-to-scrape-and-parse-wikipedia/

By: Eddie
https://blog.scraperwiki.com/2011/12/how-to-scrape-and-parse-wikipedia/#comment-733
Tue, 27 Nov 2012 15:53:38 +0000

Your wikiscrape script just saved me. I needed to get a list of regions from counties in the UK, and other databases I’d found online kept letting me down. I set up a small script in Python with Flask and ran the results through Google Refine. Very helpful!
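Roughly the sort of glue script Eddie describes, as a minimal sketch rather than his actual code: the /regions route, the page title, and the JSON shape are assumptions for illustration. It fetches the raw wikitext of a Wikipedia article through the MediaWiki API with requests and serves it via Flask, so a tool such as Google Refine can pull it in over HTTP.

    import requests
    from flask import Flask, jsonify

    app = Flask(__name__)
    WIKIPEDIA_API = "https://en.wikipedia.org/w/api.php"

    @app.route("/regions")
    def regions():
        # Ask the MediaWiki API for the raw wikitext of the article
        # (the page title here is an assumed example).
        params = {
            "action": "query",
            "prop": "revisions",
            "rvprop": "content",
            "rvslots": "main",
            "format": "json",
            "titles": "Regions of England",
        }
        data = requests.get(WIKIPEDIA_API, params=params).json()
        page = next(iter(data["query"]["pages"].values()))
        wikitext = page["revisions"][0]["slots"]["main"]["*"]
        # Return the raw wikitext; turning it into county -> region rows is
        # left to the caller (e.g. Google Refine or a parsing step).
        return jsonify({"title": page["title"], "wikitext": wikitext})

    if __name__ == "__main__":
        app.run(port=5000)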

By: The Data Hob | ScraperWiki Data Blog
https://blog.scraperwiki.com/2011/12/how-to-scrape-and-parse-wikipedia/#comment-732
Tue, 06 Mar 2012 04:00:21 +0000

[…] I would have loved to have derived it from the editable source of the Wikipedia article, as I described elsewhere, but it is impossible to do because it is insanely […]

By: Chris Davis
https://blog.scraperwiki.com/2011/12/how-to-scrape-and-parse-wikipedia/#comment-731
Fri, 27 Jan 2012 08:32:30 +0000

Waiting six months for the latest DBpedia update isn’t a concern any more. They now have a version that is synchronized live with Wikipedia – see http://live.dbpedia.org/.

I agree with the criticism that semantic web applications sometimes assume far too clear-cut boundaries between entities. However, this isn’t an inherent problem with DBpedia, since they have essentially outsourced the task of defining entities to the Wikipedia community. The only way these caves would show up as connected in one system is if people on Wikipedia said that they were.

In practical terms, I think that ScraperWiki can still be an awesome tool for scraping Wikipedia since the DBpedia parser does sometimes have problems parsing certain fields, and I don’t think they have very good support yet for parsing tables.

By: Julian
https://blog.scraperwiki.com/2011/12/how-to-scrape-and-parse-wikipedia/#comment-730
Thu, 08 Dec 2011 17:30:22 +0000

Where is this file k2.py of yours?

scraperwiki.swimport() is a function in the library as described at the bottom of this page:
https://scraperwiki.com/docs/python/python_help_documentation/
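For context, a minimal sketch of what the import step in k2.py presumably looks like, based on the traceback below. One likely cause of the AttributeError is running the script outside scraperwiki.com with a scraperwiki module that lacks swimport; the hasattr guard and the error message here are illustrative additions, not part of the original example.

    import scraperwiki

    if hasattr(scraperwiki, "swimport"):
        # Pull in the shared "wikipedia_utils" scraper as an importable module.
        wikipedia_utils = scraperwiki.swimport("wikipedia_utils")
    else:
        raise RuntimeError(
            "scraperwiki.swimport is only available in the ScraperWiki library "
            "described above; run the scraper on scraperwiki.com or copy the "
            "wikipedia_utils code into your local project."
        )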

By: Alice
https://blog.scraperwiki.com/2011/12/how-to-scrape-and-parse-wikipedia/#comment-729
Thu, 08 Dec 2011 16:47:25 +0000

Great article, but I’m running into a problem. Your first example works fine; when I try the second one, though, it fails. Here is the traceback. Thank you.
  File "k2.py", line 2, in <module>
    wikipedia_utils = scraperwiki.swimport("wikipedia_utils")
AttributeError: 'module' object has no attribute 'swimport'
