Writing on openDemocracy, Nicola Hughes, also known as DataMinerUK, has questioned what the term ‘hack’ and its synonyms mean for journalists following the News of the World phone-hacking scandal.
Hughes explains how journalists scrape data.
The people who are part of this community (I flatter myself to be included) are ‘hackers’ by the best definition of the word. The web allows anyone to publish their code online so these people are citizen hackers. They are the creators of such open civic websites as Schooloscope, Openly Local, Open Corporates, Who’s Lobbying, They Work For You, Fix My Street, Where Does My Money Go? and What Do They Know? This is information in the public interest. This is a new subset of journalism. This is the web enabling civic engagement with public information. This is hacking. But, unlike other fields of citizen journalism, it requires a very particular set of skills.
Hughes goes on to explain how journalists “need to get to grips with data to get the public their answers” and ends with a plea that the News of the World affair should not define ‘hacking’.
In the Shakespearean sense of “That which we call a rose by any other word would smell as sweet”, we should define journalism not by a word but by what it smells like. Something stank about the initial inquiry into the News of the World. Nick Davies smelled it and followed his nose. And that’s the definition of journalism.
The full post is at this link.
The excellent Nicola Hughes, author of the Data Miner UK blog, has a very practical post up about how she scraped and cleaned up some very messy Cabinet Office spending data.
Firstly, I scraped this page to pull out all the CSV files and put all the data in the ScraperWiki datastore. The scraper can be found here.
It has over 1,200 lines of code but don’t worry, I did very little of the work myself! Spending data is very messy with trailing spaces, inconsistent capitals and various phenotypes. So I scraped the raw data which you can find in the “swdata” tab. I downloaded this and plugged it into Google Refine.
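For readers who want a feel for the two steps Hughes describes (finding the CSV files linked from a page, then tidying trailing spaces and inconsistent capitals), here is a minimal Python sketch. It is not her ScraperWiki scraper, and it stands in for the clean-up she did in Google Refine; the function names, the example URL and the sample rows are all made up for illustration, and title-casing every cell is a deliberately crude normalisation.

```python
import csv
import io
from html.parser import HTMLParser
from urllib.parse import urljoin


class CsvLinkParser(HTMLParser):
    """Collects the href of every <a> tag that points at a .csv file."""

    def __init__(self, base_url):
        super().__init__()
        self.base_url = base_url
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag != "a":
            return
        href = dict(attrs).get("href")
        if href and href.lower().endswith(".csv"):
            # Resolve relative links against the page address.
            self.links.append(urljoin(self.base_url, href))


def find_csv_links(html, base_url):
    """Hypothetical helper: list the CSV links found in a page's HTML."""
    parser = CsvLinkParser(base_url)
    parser.feed(html)
    return parser.links


def clean_rows(csv_text):
    """Hypothetical helper: trim stray spaces and even out capitals."""
    reader = csv.reader(io.StringIO(csv_text))
    # " ".join(cell.split()) drops leading/trailing spaces and collapses runs;
    # .title() is a blunt way to make "ACME LTD" and "acme ltd" match.
    return [[" ".join(cell.split()).title() for cell in row] for row in reader]


if __name__ == "__main__":
    # Made-up page fragment and data, not the real Cabinet Office files.
    page = '<a href="/data/may.csv">May</a> <a href="about.html">About</a>'
    print(find_csv_links(page, "http://example.org/spend/"))

    raw = "Supplier ,Amount\nACME LTD  ,100\nacme ltd,200\n"
    for row in clean_rows(raw):
        print(row)
```

Title-casing will mangle some names (acronyms, for instance), which is one reason an interactive tool such as Google Refine, where clusters of near-duplicates can be reviewed by eye, suits this kind of data.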
And so on. Hughes has held off on describing “something interesting” that she has already found, focusing instead on the technical aspects of the process, but she has published her results for others to dig into.
Before I can advocate using, developing and refining the tools needed for data journalism I need journalists (and anyone interested) to actually look at data. So before I say anything of what I’ve found, here are my materials plus the process I used to get them. Just let me know what you find and please publish it!
See the full post on Data Miner UK at this link.
Nicola will be speaking at Journalism.co.uk’s news:rewired conference next week, where data journalism experts will cover sourcing, scraping and cleaning data along with developing it into a story.
Who? Nicola Hughes
Where? Nicola is a data journalist who has been blogging about data journalism at Data Miner UK since 2010. She has worked with the digital team at CNN and joined ScraperWiki earlier this year. She will be speaking during the Sorting the Social Media Chaos session at Journalism.co.uk’s news:rewired – noise to signal event at Thomson Reuters in London, on 27 May.
Just as we like to supply you with fresh and innovative tips every day, we’re also recommending journalists to follow online. They might be from any sector of the industry: please send suggestions (you can nominate yourself) to sarah.booker at journalism.co.uk or to @journalismnews.