Tag Archives: tony hirst

OUseful: New public data Q&A site launches

Open University lecturer, self-proclaimed mashup artist and all-round bright spark Tony Hirst blogs about a new Q&A site designed to help people with open data questions.

GetTheData.org is in “startup/bootstrapping” phase at the moment but already has a fair bit of information up.

The idea behind the site is to field questions and answers relating to the practicalities of working with public open data: from discovering data sets, to combining data from different sources in appropriate ways, getting data into formats you can happily work with, or that will play nicely with visualisation or analysis tools you already have, and so on.

Full post on OUseful.info at this link.

h/t: Online Journalism Blog

Government spending: Who’s doing what with the new data?

Today sees the biggest release of government spending data in history. Government departments have published details of all spending over £25,000 for the past six months and, according to this morning’s announcement, will continue to publish this expenditure data on a monthly basis.

According to minister for the Cabinet Office and paymaster general Francis Maude, it is part of a drive “to make the UK the most transparent and accountable government in the world”.

We’ve already released a revolutionary amount of data over the last six months, from the salaries of the highest earning civil servants to organisation structure charts which give people a real insight into the workings of government and is already being used in new and innovative ways.

A huge amount of public spending data has indeed been published under the current government, and today’s release is a significant addition to that. So who is doing what with the vast amount of new data? And who is making it easier for others to crunch the numbers?

The Guardian is usually streets ahead of other newspapers in processing large datasets and today’s coverage is no exception:

Who else?

There are, of course, different ways of looking at the numbers, as one Guardian commenter, LudwigsLughole, highlights:

There are 90,000 HMRC staff. They spent £164,000 in six months on bottled spring water. That equates to an annual spend per head of only £3.64. So the FT are seriously suggesting that £3.64 per head to give staff fresh bottled water is excessive? Pathetic journalism.
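
The commenter's arithmetic can be checked in a few lines. This is a minimal sketch in Python using only the figures quoted in the comment. The variable names are ours, and annualising by doubling the six-month figure is the commenter's assumption, not something the data release states.

```python
# Figures as quoted in the comment above.
staff = 90_000             # HMRC headcount
six_month_spend = 164_000  # pounds spent on bottled water in six months

# Annualise by doubling the six-month figure (the commenter's assumption).
annual_spend = six_month_spend * 2
annual_spend_per_head = annual_spend / staff

print(f"£{annual_spend_per_head:.2f} per head per year")
```

Run as written, this gives the same £3.64 per head that the commenter reports.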

Exploring the data yourself

“The biggest issue with all these numbers is, how do you use them? If people don’t have the tools to interrogate the spreadsheets, they may as well be written in Latin.” – Simon Rogers, Guardian Data Blog editor.

“Releasing data is all well and good, but to encourage the nation’s ‘armchair auditors’, it must be readily usable.” – Martin Stabe, FT.

Here are some of the places you can go, along with the Guardian, to have a crack at the numbers yourself. Please add your own suggestions in the comments below.

Lots and lots of data. So what? My take on it was to find a quick and dirty way to cobble a query interface around the data, so here’s what I spent an hour or so doing in the early hours of last night, and a couple of hours this morning… tinkering with a Gov spending data spreadsheet explorer:

Guardian/gov datastore explorer

[T]he real power of this data will become clear in the months to come, as developers and researchers – you? – start to link it to other information, like the magisterial OpenlyLocal and the exciting WhosLobbying. Please make use of our API and loading scripts to do so.

Also see the good suggestions on Where Does My Money Go? for how government data publishing might be improved in the future.

So, coming full circle I return to the Guardian, and to the data-minded Simon Rogers, who asks: Will the government spending data really change the world?

A big question. Feel free to add your opinion below and any other data projects you have seen today or that pop up in the future.

#FollowJourn: @psychemedia/lecturer and blogger

#FollowJourn: Tony Hirst

Who? Open University lecturer/blogger/data pioneer.

What? Hirst keeps a regular blog about his work at the Open University and was one of the first contributors to the Guardian’s Open Platform project.

Where? @psychemedia on Twitter/Ouseful.info

Contact? a.j.hirst [at] open.ac.uk

Just as we like to supply you with fresh and innovative tips every day, we also recommend journalists to follow online. They might be from any sector of the industry: please send suggestions (you can nominate yourself) to judith or laura at journalism.co.uk, or to @journalismnews.

Signals intelligence journalism: using public information websites to source stories

Useful information is more widely and easily available than ever and the increasing amount of online data released by the government and others can help improve the originality of journalists’ work.

Look to VentnorBlog – the hyperlocal online effort based in the Isle of Wight which Journalism.co.uk commended during the Vestas protest coverage – for some inspiration.

[For those unfamiliar with the story, locals had been protesting against the closure of the wind turbine factory in front of national, local and hyperlocal media. Despite a long and well-publicised campaign in August 2009, Danish company Vestas has now pulled out of manufacturing on the Isle of Wight but protests and attacks by critics in the press continue. A national day of action to support redundant Vestas workers has been planned for Thursday, September 17.]

Last week, using the Area Ship Traffic Website, AIS, VB was able to report where two barges held by an agent – NEG Micron Rotors – that used to own the Vestas factory were due to head. The barges would be used to move the blades from the factory; the blades are so huge that they can only leave by water on special vessels.

The correspondent who tipped off VentnorBlog knew that the wind turbine blades can only be transferred from the riverside to the barge at high tide, and only across a public footpath, so, using the information on the AIS site, concluded that the barges would be moved in a specific time slot.

As a result Vestas protesters asked supporters to join them at the Marine Gate on the River Medina. Of course VentnorBlog got down there to take some pictures.

Now let’s take that one step further: how can journalists tap into this kind of publicly available data to scoop stories?

Tony Hirst, Open University academic, Isle of Wight resident and prolific data masher, shared some thoughts with Journalism.co.uk. He said that we should look to signals intelligence for further inspiration: the interception and analysis of ‘signals’ emitted by whoever you are surveying. As military historians would be the first to tell you, they can be a very rich source of intelligence about others’ actions and intentions, he explained.

“A major component of SIGINT is COMINT, or Communications Intelligence, which focuses on the communications between parties of interest. Even if communications are encrypted, Traffic Analysis, or the study of who’s talking to whom, how frequently, at what time of day, or – historically – in advance of what sort of action, can be used to learn about the intentions of others.”

And this is relevant to journalists, he added:

“For starters, data is information, or raw intelligence. The job of the analyst, or the data journalist, is to identify signals in that information in order to identify something of meaning – ‘intelligence’ about intentions, or ‘evidence’ for a particular storyline.”

The VentnorBlog story, he said, describes how a ‘sharp-eyed follower of movements at the plant’ knew where two barges were headed and at what time – valuable journalistic information:

“Amid the mess of Solent shipping information was a meaningful signal relating to the Vestas story – the movement of the barge that takes wind turbine blades from the Vestas factory on the Isle of Wight to the mainland.”

Do you have suggestions for sources of ‘signals intelligence’ journalism? Or examples of where it has been done well?

VentnorBlog shows us high-quality hyperlocal reporting with the Vestas story

Remembering a little comparative exercise that Tony Hirst undertook on the OUseful.info blog during the MPs’ expenses revelations, Journalism.co.uk thought it might be illuminating to re-visit Isle of Wight news production on the day of the Vestas case. How did hyperlocal site, the VentnorBlog – not just about the town of Ventnor – treat the Vestas story in comparison to the Isle of Wight County Press Online (in print, it’s weekly) and the national press?

Today’s court adjournment left the Danish owners of wind turbine company Vestas unable to force workers out of its Isle of Wight factory. For the past nine days, about 20 workers have occupied the Vestas Wind Systems plant near Newport, which is due to close on Friday (around 625 workers are set to lose their jobs), but a possession order made at Newport county court today has been delayed until next week, as the company had not properly served papers on the individuals in the building, and the hearing took place prematurely [sources: the Guardian / VentnorBlog].

1. The Guardian

News report and video at this link. Blog post on ‘Vestival’. Other news content from earlier in the day and the week gathered at this link.

2. The Isle of Wight County Press

A story reporting that ‘Judge denies Vestas eviction order’. The other news link takes us to other related stories, the last of which was printed Tuesday.

3. The VentnorBlog:

Rolling news, updated throughout the day. Eleven updates, lending themselves well to re-tweets (like Journalism.co.uk, the blog uses the TweetMeme button on its posts), posted since this morning including:

  • Video news content. E.g. this segment, with the announcement outside court:

Vestas sit-in: Case Postponement Announced To The Crowd from Ventnor Blog on Vimeo.

Previous coverage of the Vestas story on the VentnorBlog can be found at this link. NB: the VentnorBlog published its Vestival story yesterday lunchtime.

A comment left by ‘Eco T’ on the VentnorBlog is just one of the positive reactions to VB’s coverage:

“I would like to say that Ventnor Blogs coverage has been second to none. The most detailed and accurate report of any news service. I would like to thank ever one at Ventnor blog and hope you keep up the great work.”

The coverage is bitty (as you might expect as a story unfolds) and not necessarily completely balanced (most updates focus on the workers’ perspective), but the VentnorBlog has done an excellent job of providing the islanders (and outsiders) with raw and useful material, showing us how high-quality hyperlocal reporting is done.

OUseful: Gripes with Guardian’s DataStore #datajourn

Here are thoughts from Tony Hirst, one of the first adopters and success stories for the Guardian’s Open Platform, on what the OP’s DataStore is and is not doing, in terms of data curation (or gardening). He asks:

“Is the Guardian DataStore adding value to the data in the data store in an accessibility sense: by reducing the need for data mungers to have to process the data, so that it can be used in a plug’n’play way by the statisticians and the data visualisers, whether they’re professionals, amateurs or good old Jo Public?”

Hirst has a number of queries in regards to data quality and ‘misleading’ linking on the Guardian DataBlog. In a later comment, he wonders whether there is a ‘data style guide’ available yet.

If you’re not all that au fait with the data lingo, this post might be a bit indigestible, so we’ll follow with a translation in coming days.

Related on Journalism.co.uk: Q&A with Hirst, April 8, 2009.

#DataJourn part 2: Q&A with ‘data juggler’ Tony Hirst

As explained in part one of today’s #datajourn conversation, Tony Hirst is the ‘data juggler’ (as titled by Guardian tech editor Charles Arthur) behind some of the most interesting uses of the Guardian’s Open Platform (unless swear words are your thing – in which case check out Tom Hume’s work).

Journalism.co.uk sent some questions over to OU academic, mashup artist and Isle of Wight resident Tony Hirst. Here are his very comprehensive answers.

What’s your primary interest in – and motivation for – playing with the Guardian’s Open Platform?
TH: Open Platform is a combination of two things – the Guardian API, and the Guardian Data store. My interest in the API is twofold: first, at the technical level, does it play nicely with ‘mashup tools’ such as Yahoo Pipes, Google Spreadsheets’ =importXML formula, and so on; secondly, what sort of content does it expose that might support a ‘news and learning’ mashup site where we can automatically pull in related open educational resources around a news story to help people learn more about the issues involved with that story?

One of the things I’ve been idling about lately is what a ‘university API’ might look like, so the architecture of the Guardian API, and in particular the way the URIs that call on the API are structured, is of interest in that regard (along with other APIs, such as the New York Times’ APIs, the BBC programmes’ API, and so on).

The data blog resources – which are currently being posted on Google spreadsheets – are a handy source of data in a convenient form that I can use to try out various ‘mashup recipes’. I’m not so interested in the data as is, more in the ways in which it can be combined with other data sets (for example, in Dabble DB) and/or displayed using third party visualisation tools. What inspires me is trying to find ‘mashup patterns’ that other people can use with other data sets. I’ve written several blog posts showing how to pull data from Google spreadsheets into IBM’s Many Eyes Wikified visualisation tool: it’d be great if other people realised they could use a similar approach to visualise sets of data I haven’t looked at.

Playing with the actual data also turns up practical ‘issues’ about how easy it is to create mashups with public data. For example, one silly niggle I had with the MPs’ expenses data was that pound signs appeared in many of the data cells, which meant that Many Eyes Wikified, for example, couldn’t read the amounts as numbers, and so couldn’t chart them. (In fact, I don’t think it likes pound signs at all because of the character encoding!) Which meant I had to clean the data, which introduced another step in the chain where errors could be introduced, and which also raised the barrier to entry for people wanting to use the data directly from the data store spreadsheet. If I can help find some of the obstacles to effective data reuse, then maybe I can help people publish their data in a way that makes it easier for other people to reuse (including myself!).
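
The cleaning step Hirst describes might look something like the sketch below. This is an illustration in Python, not the method he used: the function name `parse_amount` and the sample cell values are hypothetical, and it assumes amounts are written with a leading pound sign and comma thousands separators.

```python
from decimal import Decimal, InvalidOperation


def parse_amount(cell):
    """Convert a cell such as '£1,234.50' to a Decimal, or None if it is not numeric."""
    # Strip whitespace, the pound sign and thousands separators.
    cleaned = cell.strip().replace("£", "").replace(",", "")
    if not cleaned:
        return None
    try:
        return Decimal(cleaned)
    except InvalidOperation:
        # Leave non-numeric cells (e.g. 'n/a') as missing rather than guessing.
        return None


# Hypothetical cell values, not taken from the real expenses spreadsheet.
raw_cells = ["£1,234.50", " £20,000 ", "n/a", ""]
amounts = [parse_amount(cell) for cell in raw_cells]
print(amounts)
```

Returning `None` for cells that cannot be parsed keeps the cleaning step from silently inventing numbers, which is the kind of error Hirst warns an extra step can introduce.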

Do you feel content with the way journalists present data in news stories, or could we learn from developers and designers?
TH: There’s a problem here in that journalists have to present stories that are: a) subject to space and layout considerations beyond their control; and b) suited to their audience. Just publishing tabulated data is good in the sense that it provides the reader with evidence for claims made in a story (as well as potentially allowing other people to interrogate the data and maybe look for other interpretations of it), but I suspect is meaningless, or at least of no real interest, to most people. For large data sets, you wouldn’t want to publish them within a story anyway.

An important thing to remember about data is that it can be used to tell stories, and that it may hide a great many patterns. Some of these patterns are self-evident if the data is visualised appropriately. ‘Geo-data’ is a fine example of this. Its natural home is on a map (as long as the geo-coding works properly, that is: i.e. the mapping from location names, for example, to latitude/longitude co-ordinates that can be plotted on a map).

Finding ways of visualising and interacting with data is getting easier all the time. I try to find mashup patterns that don’t require much, if any, writing of computer program code, and so in theory should be accessible to many non-developers. But it’s a confidence thing: and at the moment, I suspect that it is the developers who are more likely to feel confident taking data from one source, putting it into an application, and then providing the user with a simple user interface that they can ‘just use’.

You mentioned about ‘lowering barriers to entry’ – what do you mean by that, and how is it useful?

TH: Do you write SQL code to query databases? Do you write PHP code to parse RSS feeds and filter out items of interest? Are you happy writing JavaScript to parse a JSON feed, or would you rather use XMLHttpRequest and a server-side proxy to pull an XML feed into a web page and get around the domain security model?

Probably none of the above.

On the other hand, could you copy and paste a URL to a data set into a ‘fetch’ block in a Yahoo pipe, identify which data element related to a place name so that you could geocode the data, and then take the URL of the data coming out from the pipe and paste it into the Google maps search box to get a map based view of your data? Possibly…

Or how about taking a spreadsheet URL, pasting it into Many Eyes Wikified, choosing the chart type you wanted based on icons depicting those chart types, and then selecting the data elements you wanted to plot on each axis from a drop down menu? Probably…
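
For comparison, here is roughly what the fetch-and-geocode pattern described above involves when written as code. It is a minimal sketch in Python and does not show how Yahoo Pipes or Google Maps work internally. The sample rows are invented, the tiny `GAZETTEER` lookup stands in for a real geocoding service, its co-ordinates are approximate, and `geocode_rows` is a hypothetical helper name.

```python
import csv
import io

# Invented stand-in for the data a 'fetch' block would pull from a URL.
SAMPLE_CSV = """place,value
Newport,12
Ventnor,7
Atlantis,3
"""

# Hypothetical gazetteer: place name -> approximate (latitude, longitude).
GAZETTEER = {
    "Newport": (50.70, -1.29),
    "Ventnor": (50.60, -1.21),
}


def geocode_rows(csv_text, place_field):
    """Attach co-ordinates to each row; collect place names that cannot be matched."""
    located, unmatched = [], []
    for row in csv.DictReader(io.StringIO(csv_text)):
        coords = GAZETTEER.get(row[place_field])
        if coords is None:
            unmatched.append(row[place_field])
        else:
            located.append({**row, "lat": coords[0], "lon": coords[1]})
    return located, unmatched


located, unmatched = geocode_rows(SAMPLE_CSV, "place")
print(located)    # rows that can be plotted on a map
print(unmatched)  # rows where the geocoding did not work
```

Keeping the unmatched names in a separate list makes it visible when the geocoding has not worked properly, rather than quietly dropping those rows from the map.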

What kind of recognition/reward would you like for helping a journalist produce a news story?
TH: A mention for my employer, The Open University, and a link to my personal blog, OUseful.info. If I’d written a ‘How To’ explanation describing how a mashup or visualisation was put together, a link to that would be nice too. And if I ever met the journalist concerned, a coffee would be appreciated! I also find it valuable knowing what sorts of things journalists would like to be able to do with the technology that they can’t work out how to do. This can feed into our course development process, identifying the skills requirements that are out there, and then potentially servicing those needs through our course provision. There’s also the potential for us to offer consultancy services to journalists too, producing tools and visualisations as part of a commercial agreement.

One of the things my department is looking at at the moment is a revamped website. It’s a possibility that I’ll start posting stories there about any news related mashups I put together, and if that is the case, then links to that content would be appropriate. This isn’t too unlike the relationship we have with the BBC, where we co-produce television and radio programmes and get links back to supporting content on OU websites from the BBC website, as well as programme credits. For example, I help pull together the website around the BBC World Service programme Digital Planet, which we co-produce every so often, which gets a link from the World Service website (as well as the programme’s Facebook group!), and the OU gets a mention in the closing credits. The rationale behind this approach is getting traffic to OU sites, of course, where we can then start to try to persuade people to sign up for related courses!

#DataJourn part 1: a new conversation (please re-tweet)

Had it not been published at the end of the workday on a Friday, Journalism.co.uk would have made a bit more of a song-and-dance of this story, but instead it got reduced to a quick blog post. In short: OU academic Tony Hirst produced a rather lovely map, on the suggestion (taunt?) of the Guardian’s technology editor, Charles Arthur, and the result? A brand new politics story for the Guardian on MPs’ expenses.

Computer-assisted reporting (CAR) is nothing new, but innovations such as the Guardian’s launch of Open Platform are leading to new relationships and conversations between data/stats experts, programmers and developers (including the rarer breed of information architects), designers, and journalists – bringing with them new opportunities, but also new questions. Some that immediately spring to mind:

  • How do both parties (data and interactive gurus and the journalists) benefit?
  • Who should get credit for new news stories produced, and how should developers be rewarded?
  • Will newsrooms invest in training journalists to understand and present data better?
  • What problems are presented by non-journalists playing with data, if any?
  • What other questions should we be asking?

The hashtag #datajourn seems a good one with which to kickstart this discussion on Twitter (Using #CAR, for example, could lead to confusion…).

So, to get us started, two offerings coming your way in #datajourn part 2 and 3.

Please add your thoughts below the posts, and get in touch with judith@journalism.co.uk (@jtownend on Twitter) with your own ideas and suggestions for ways Journalism.co.uk can report, participate in, and debate the use of CAR and data tools for good quality and ethical journalism.

MPs’ travel expenses disparity highlighted by Guardian Open Platform projects

Tony Hirst, the independent developer who launched some of the first projects using the Guardian’s Open Platform, has again used the Data Store in an innovative way – leading to a new story about MPs’ expenses for the Guardian.

Hirst’s use of Google Maps shows that there are differences of up to £20,000 in neighbouring MPs’ travel expenses.

Hirst describes his work here, which he developed after discovering that the expenses data was being released via the Data Store. Guardian technology editor Charles Arthur wrote about Hirst’s initial efforts and said “what we need now is a dataset which shows constituency distances from Westminster, so that we can compare that against travel…”

Hirst clearly couldn’t resist the challenge.