alerts – ScraperWiki
https://blog.scraperwiki.com
Extract tables from PDFs and scrape the web

How to stop missing the good weekends
https://blog.scraperwiki.com/2012/01/how-to-stop-missing-the-good-weekends/
Fri, 20 Jan 2012

[Image: The BBC's Michael Fish presenting the weather in the 80s, with a ScraperWiki tractor superimposed over Liverpool]

Far too often I get so stuck into the work week that I forget to monitor the weather for the weekend, when I should be going off to play on my dive kayaks, an activity which is somewhat weather dependent.

Luckily, help is at hand in the form of the ScraperWiki email alert system.

As you may have noticed, when you do any work on ScraperWiki, you start to receive daily emails that go:

Dear Julian_Todd,

Welcome to your personal ScraperWiki email update.

Of the 320 scrapers you own, and 157 scrapers you have edited, we
have the following news since 2011-12-01T14:51:34:

Histparl MP list - https://scraperwiki.com/scrapers/histparl_mp_list :
  * ran 1 times producing 0 records from 2 pages
  * with 1 exceptions, (XLRDError: Unsupported format, or corrupt file: Expected BOF record; found '<!DOCTYP')

...Lots more of the same

This concludes your ScraperWiki email update till next time.

Please follow this link to change how often you get these emails,
or to unsubscribe: https://scraperwiki.com/profiles/edit/#alerts

The idea behind this is to attract your attention to matters you may be interested in — such as fixing those poor dear scrapers you have worked on in the past and are now neglecting.

As with all good features, this was implemented as a quick hack.

I thought: why design a whole email alert system, with special options for daily and weekly emails, when we already have a scraper scheduling system which can do just that?

With the addition of a single flag to designate a scraper as an emailer (plus a further 20 lines of code), a new fully fledged extensible feature was born.

Of course, this is not counting the code that is in the Wiki part of ScraperWiki.
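
To make that concrete, here is a minimal sketch of the kind of scheduler hook involved. This is not the real ScraperWiki internals (that code isn't shown here); Scraper, run_scraper and send_email are hypothetical stand-ins for illustration.

# A sketch of the idea only, not the actual ScraperWiki scheduler code;
# Scraper, run_scraper and send_email are made-up stand-ins.

class Scraper:
    def __init__(self, name, owner_email, is_emailer=False):
        self.name = name
        self.owner_email = owner_email
        self.is_emailer = is_emailer   # the single extra flag

def run_scraper(scraper):
    # in reality: execute the scraper's code and capture what it prints
    return "Your daily update\n* scraper X ran with 1 exception"

def send_email(to, subject, body):
    print "To %s: %s" % (to, subject)   # stand-in for a real mailer

def run_scheduled(scrapers):
    for s in scrapers:
        output = run_scraper(s)   # every scraper runs on its usual schedule
        # an emailer scraper's printed output becomes the email:
        # first line the subject, the rest the body
        if s.is_emailer and output.strip():
            lines = output.split("\n")
            send_email(s.owner_email, lines[0], "\n".join(lines[1:]))

run_scheduled([Scraper("my-alerts", "julian@example.org", is_emailer=True)])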

The default code in your emailer looks roughly like so:

import scraperwiki

# import the shared email-building code from the "general-emails-on-scrapers" scraper
emaillibrary = scraperwiki.utils.swimport("general-emails-on-scrapers")
# "onlyexceptions": only report on scrapers that raised exceptions
subjectline, headerlines, bodylines, footerlines = emaillibrary.EmailMessageParts("onlyexceptions")
if bodylines:
    print "\n".join([subjectline] + headerlines + bodylines + footerlines)

As you can see, it imports the 138 lines of Python from general-emails-on-scrapers, which I am not here to talk about right now.

Using ScraperWiki emails to watch the weather

Instead, what I want to explain is how I inserted my Good Weather Weekend Watcher into this emailer, polling the weather forecast for Holyhead.

My extra code goes like this:

import datetime
import urllib
import lxml.html

weatherlines = [ ]
if datetime.date.today().weekday() == 2:  # Wednesday (Monday is 0)
    url = "http://www.metoffice.gov.uk/weather/uk/wl/holyhead_forecast_weather.html"
    html = urllib.urlopen(url).read()
    root = lxml.html.fromstring(html)
    rows = root.cssselect("div.tableWrapper table tr")
    for row in rows:
        #print lxml.html.tostring(row)   # uncomment to debug a row
        metweatherline = row.text_content().strip()
        if metweatherline[:3] == "Sat":  # keep only the Saturday forecast row
            subjectline += " With added weather"
            weatherlines.append("*** Weather warning for the weekend:")
            weatherlines.append("   " + metweatherline)
            weatherlines.append("")

What this does is check whether today is Wednesday (day of the week number 2 in Python land, where Monday is 0), then parse the Met Office weather forecast table for my chosen location and pull out the row for Saturday.

Finally we have to handle producing the combined email message, the one which can contain either a set of broken scraper alerts, or the weather forecast, or both.

if bodylines or weatherlines:
    if not bodylines:
        headerlines, footerlines = [ ], [ ]   # no scraper alerts, so drop the surrounding cruft
    print "\n".join([subjectline] + weatherlines + headerlines + bodylines + footerlines)

The current state of the result is:

*** Weather warning for the weekend:
  Mon 5Dec
  Day

  7 °C
  W
  33 mph
  47 mph
  Very Good

This was a very quick low-level implementation of the idea with no formatting and no filtering yet.

Email alerts can quickly become sophisticated and complex. Maybe I should only send a message out if the wind is below a certain speed (a first sketch of that filter is below). Should I monitor previous days’ weather to predict whether the sea will be calm? Could I check the wave heights on the off-shore buoys? Perhaps my calendar should be consulted for prior engagements, so I don’t get frustrated by being told I am missing out on a good weekend when I had promised to go to a wedding.
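
For instance, here is a minimal sketch of that wind-speed filter, assuming the Saturday row keeps the "NN mph" format shown in the output above; the 20 mph threshold and the sample rows are made up for illustration.

import re

MAX_WIND_MPH = 20   # arbitrary threshold for a kayakable weekend

def too_windy(metweatherline):
    # pull out figures like "33 mph" from the row text; in the output
    # above the mean speed comes before the gust speed
    speeds = [int(s) for s in re.findall(r"(\d+)\s*mph", metweatherline)]
    return bool(speeds) and speeds[0] > MAX_WIND_MPH

print too_windy("Sat Day 7 C W 33 mph 47 mph Very Good")   # True: stay home
print too_windy("Sat Day 9 C SW 12 mph 16 mph Good")       # False: go kayaking

Plugged into the emailer, the weatherlines section would only be appended when too_windy() returns False.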

The possibilities are endless and so much more interesting than if we’d implemented this email alert feature in the traditional way, rather than taking advantage of the utterly unique platform that we happened to already have in ScraperWiki.

Knight Foundation finance ScraperWiki for journalism
https://blog.scraperwiki.com/2011/06/knight-foundation-finance-scraperwiki-for-journalism/
Wed, 22 Jun 2011

ScraperWiki is the place to work together on data, and it is particularly useful for journalism.

We are therefore very pleased to announce that ScraperWiki has won the Knight News Challenge!

The Knight Foundation is funding us with $280,000 over two years to improve ScraperWiki as a platform for journalists, and to run events to bring together journalists and programmers across the United States.

America has trailblazing organisations that do data and journalism well already – for example, both ProPublica and the Chicago Tribune have excellent data centers to support their news content. Our aim is to lower the barrier to entry into data-driven journalism and to create an order of magnitude more of this type of success. So come join our campaign for America: Yes We Can (Scrape). PS: We are politically neutral, but think open source when it comes to campaign strategy!

What are we going to do to the platform?

As well as polishing ScraperWiki to make it easier to use, and creating journalism-focussed tutorials and screencasts, we’re adding four specific services for journalists:

  • Data embargo, so journalists can keep their stories secret until going to print, but publish the data in a structured, reusable, public form with the story.
  • Data on demand service. Journalists often need the right data, sourced quickly; we’re going to create a smooth process so they can get it.
  • News application hosting. We’ll make it scalable and easier.
  • Data alerts. Automatically get leads from changing data. For example, watch bridge repair schedules, and email when one isn’t being maintained (a sketch of the idea follows this list).
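
To sketch how such a data alert could work on the platform as it stands: a scheduled scraper keeps the last-seen data in its SQLite datastore, compares each fresh scrape against it, and prints only what changed (and with the emailer mechanism described in the first post above, anything printed can go out as email). The table contents below are invented for illustration; scraperwiki.sqlite.save and scraperwiki.sqlite.select are the standard datastore calls, though treat the exact query form as an assumption.

import scraperwiki

# invented sample rows; in reality this would be freshly scraped each run
current = [
    {"bridge": "Main St", "last_inspected": "2011-05-01"},
    {"bridge": "River Rd", "last_inspected": "2009-11-12"},
]

# load the state saved on the previous run
try:
    previous = dict((r["bridge"], r["last_inspected"])
        for r in scraperwiki.sqlite.select("bridge, last_inspected from swdata"))
except Exception:
    previous = { }   # first run: no table saved yet

# report only the rows that changed since last time
for row in current:
    if previous.get(row["bridge"]) != row["last_inspected"]:
        print "Lead: %s inspection date is now %s" % (row["bridge"], row["last_inspected"])

# save the current state for the next run's comparison
scraperwiki.sqlite.save(unique_keys=["bridge"], data=current)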

Here are two concrete examples of ScraperWiki being used already in similar ways:

Where in the US are we going to go?

What really matters about ScraperWiki is the people using it. Data is dead if it doesn’t have someone, a journalist or a citizen, analysing it, finding stories in it and making decisions from it.

We’re running Data Journalism Camps in each of a dozen states. These will be similar in format to our Hacks and Hackers Hack Days, which we’ve run across the UK and Ireland over the last year.

The camps will have two parts.

  • Making something. In teams of journalists and coders, using data to dig into a story, or make or prototype a news app, all in one day.
  • Scraping tutorials. For journalists who want to learn how to code, and programmers who want to know more about scraping and ScraperWiki.

This video of our event in Liverpool gives a flavour of what to expect.

Get in touch if you’d like us to stop near you, or are interested in helping or sponsoring the camps.

Finally…

The project is designed to be financially stable in the long term. While the public version of ScraperWiki will remain free, we will charge for extra services such as keeping data private, and data on demand. We’ll be working with B2B media, as well as consumer media.

As with all Knight-financed projects, the code behind ScraperWiki is open source, so newsrooms won’t be building a dependency on something they can’t control.

For more details you can read our original application (note that financial amounts have changed since then).

Finally, and most importantly, I’d like to congratulate and thank everyone who has worked on, used or supported ScraperWiki. The Knight News Challenge had 1,600 excellent applications, so this is a real validation of what we’re doing, both with data and with journalism.

Be alert! Your scrapers need alerts
https://blog.scraperwiki.com/2011/01/be-alert-your-scrapers-need-lerts/
Mon, 31 Jan 2011

It’s important to know when your scrapers have stopped working, so you can fix them.

And if someone else makes a change to one of your scrapers, you need to know, so you can check it’s OK and thank them.

Over the next day or two, if you have made or contributed to a scraper on ScraperWiki, you’ll start to see emails like this.

They happen once a day. If that’s too much, there’s a link at the bottom so you can unsubscribe on your profile page.



We’ve been testing this in the team for a couple of weeks, but I’m sure you’ll have suggestions and ideas for improving it. Let us know!
