Category Archives: Data

BetaTales: Can the story of traffic accidents be told in a new way?

BetaTales takes a look at a new project from journalists and programmers at Norwegian media house Bergens Tidende, built on traffic accident data.

Accidents are apparently common fare in western Norway, with frequent news reports of collisions on the region’s narrow, winding roads.

With this in mind, journalists at Bergens Tidende approached the Norwegian Public Roads Administration armed with the Freedom of Information Act, eventually getting access to a database of all road accidents in the country.

The database turned out to be a journalistic goldmine: It contained details about 11,400 traffic accidents all over the country, all neatly arranged in an Excel file. Not only did the database give the exact position of each accident, but it also included numerous details, such as how many were killed and injured, the seriousness of injuries, driving conditions, type of vehicle, type of street, speed limit, time of the day, etc.

Still, most journalists would at this point probably have been happy to look through the database, pull out some of the relevant accidents and write a couple of news stories based on them. At Bergens Tidende, though, the journalists were instead teamed up with programmers. Within a few weeks, all the traffic accidents in the country had been plotted on a large Google map that could be searched in numerous ways.

Full story on BetaTales at this link.

“Killing Roads” project from Bergens Tidende at this link (Norwegian).

Bergens Tidende multimedia journalist Lasse Lambrechts talks about “Killing Roads”:

A look at the Guardian Hacks SXSW event

The Guardian played host to designers, developers and journalists at the weekend for its “Guardian Hacks SXSW” event. (The raw data reveals that there were 82 developers, 12 girls and 12 ‘full beards’, among other things.)

Guardian information architect Martin Belam takes a look at some of the day’s hacks on his blog:

The hack that appeared to draw the most gasps from the assembled journalists in the room, and consequently won, was Articlr, which was presented by Jason Grant. It was a back-end tool for easily monitoring social media and rival coverage of a story in real-time, and then simply dragging-and-dropping elements from external sites into a story package. With a bit of geo-location goodness thrown in. I fully expect the feature request to be on my Guardian desk by about 11am this morning…

Plus, you can see the Guardian’s full coverage at this link and follow related Twitter goings-on via the #gsxsw hashtag.

Jonathan Stray: A computational journalism reading list

Journalist and computer scientist Jonathan Stray has posted an interesting breakdown of what he calls “computational journalism”, a kind of parent term for data journalism, visualisation, computational linguistics, communications technology, filtering, research and more.

I’d like to propose a working definition of computational journalism as the application of computer science to the problems of public information, knowledge, and belief, by practitioners who see their mission as outside of both commerce and government. This includes the journalistic mainstay of “reporting” — because information not published is information not known — but my definition is intentionally much broader than that.

Stray has put together a reading list under each sub-header (including our very own ‘How to: get to grips with data journalism‘).

Worth a read.

Full post on Jonathan Stray’s blog at this link.

Channel 4 News: Benjamin Cohen’s life torn open by Wired

Benjamin Cohen, technology editor at Channel 4 News, has blogged about the experience of being sent the latest, personalised edition of Wired magazine.

Well, personalised for some. “Opinion formers” around the UK have been sent a copy of Wired, titled “Your life torn open”, with personal information about them splashed across the front cover. Cohen was shocked by the information printed about him, and it is shocking at first. But all of it is publicly available through Facebook, Twitter, Companies House and the Land Registry.

What’s shocking though is seeing all of this printed in black and white (or yellow in this case). Everything was available from Facebook, Twitter, Company House and the Land Registry but it shows the information is so readily available. It also shows how powerful these resources can be for private detectives or government agents.

Read his post in full here…

OUseful: New public data Q&A site launches

Open University lecturer, self-proclaimed mashup artist and all-round bright spark Tony Hirst blogs about a new Q&A site designed to help people with open data questions.

GetTheData.org is in “startup/bootstrapping” phase at the moment but already has a fair bit of information up.

The idea behind the site is to field questions and answers relating to the practicalities of working with public open data: from discovering data sets, to combining data from different sources in appropriate ways, getting data into formats you can happily work with, or that will play nicely with visualisation or analysis tools you already have, and so on.

Full post on OUseful.info at this link.

h/t: Online Journalism Blog

Martin Belam: The death of RSS? Not at the Guardian

In this post on his Currybet.net blog, Martin Belam responds to discussions about the future of RSS feeds. While feeds may remain a niche tool, he says, the latest CMS release at the Guardian, where Belam works as an information architect, makes links to RSS feeds much easier to find.

Previously we didn’t automatically link to an RSS feed from an individual article page. This was because articles could ‘belong’ to various different areas of the site, and so it wasn’t always obvious which RSS feed should be chosen as the parent. This blog post of mine, for example, ‘appeared’ on the Open Platform blog, the Datablog, and in the Technology and Politics sections.

We’ve just changed that in release 103 of our CMS, in response to a request on our new Developer Blog. Now in the <HEAD> of our articles you’ll get an auto-discovery link to all of the related keyword feeds.
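Auto-discovery of this kind generally relies on `<link rel="alternate" type="application/rss+xml" href="URL">` elements in a page’s `<head>`, which browsers and feed readers scan for. As a rough sketch (not the Guardian’s own code), the snippet below uses Python’s standard-library `html.parser` to pull such links out of a page; the `find_feeds` helper, the sample markup and the example.com URLs are all hypothetical.

```python
from html.parser import HTMLParser

# MIME types conventionally used in feed auto-discovery links.
FEED_TYPES = {"application/rss+xml", "application/atom+xml"}


class FeedLinkFinder(HTMLParser):
    """Collect (title, href) pairs from feed auto-discovery <link> tags."""

    def __init__(self):
        super().__init__()
        self.feeds = []

    def handle_starttag(self, tag, attrs):
        # Tag and attribute names arrive lowercased; values may be None.
        if tag != "link":
            return
        a = dict(attrs)
        rels = (a.get("rel") or "").lower().split()
        mime = (a.get("type") or "").lower()
        if "alternate" in rels and mime in FEED_TYPES:
            self.feeds.append((a.get("title"), a.get("href")))


def find_feeds(html):
    """Return every feed advertised in the page, in document order."""
    finder = FeedLinkFinder()
    finder.feed(html)
    finder.close()
    return finder.feeds


# Hypothetical article page advertising two keyword feeds.
PAGE = """<html><head>
<link rel="alternate" type="application/rss+xml" title="Technology" href="https://example.com/technology/rss">
<link rel="alternate" type="application/rss+xml" title="Politics" href="https://example.com/politics/rss" />
<link rel="stylesheet" href="/style.css">
</head><body>...</body></html>"""

for title, href in find_feeds(PAGE):
    print(title, href)
# Technology https://example.com/technology/rss
# Politics https://example.com/politics/rss
```

An article filed under several keywords would presumably just carry one such `<link>` element per related feed, which is what lets a single page expose all of them without choosing one “parent”.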

#cablegate: WikiLeaks essential to a strong media, Assange argues in new op-ed

Just hours after the arrest of Julian Assange in London, the Australian has published an op-ed piece by the WikiLeaks founder in which he places the organisation squarely within the media firmament:

“Democratic societies need a strong media and WikiLeaks is part of that media”, argues Assange. “The media helps keep government honest. WikiLeaks has revealed some hard truths about the Iraq and Afghan wars, and broken stories about corporate corruption.”

The piece begins with a quote from a young Rupert Murdoch, who said in 1958: “In the race between secrecy and truth, it seems inevitable that truth will always win.” It is a particularly poignant statement, given that WikiLeaks is now in the fight of its life: trying desperately to stay online amid sustained cyber attacks; facing possible prosecution under any law the US attorney general can find to fit the bill; and seeing press coverage of the leaks diverted by the arrest of its founder and editor-in-chief for alleged sex crimes.

The attacks on WikiLeaks have come thick and fast from many fronts, but, as Assange points out in his op-ed, the newspapers that published secret diplomatic cables by its side are not suffering anything like the same treatment:

WikiLeaks is not the only publisher of the US embassy cables. Other media outlets, including Britain’s the Guardian, the New York Times, El Pais in Spain and Der Spiegel in Germany have published the same redacted cables. Yet it is WikiLeaks, as the co-ordinator of these other groups, that has copped the most vicious attacks and accusations from the US government and its acolytes.

Assange goes on to claim that his organisation has pioneered “a new type of journalism”, which he calls “scientific journalism”.

We work with other media outlets to bring people the news, but also to prove it is true. Scientific journalism allows you to read a news story, then to click online to see the original document it is based on. That way you can judge for yourself: is the story true? Did the journalist report it accurately?

His call for journalism to adopt something more akin to the scientific method is not new. It echoes comments he made back in July, prior to the release of the Afghanistan and Iraq war logs and the US embassy cables:

You can’t publish a paper on physics without the full experimental data and results, that should be the standard in journalism. You can’t do it in newspapers because there isn’t enough space, but now with the internet there is.

As he has done for many years in defence of his own organisation, Assange raises the issue of the Pentagon Papers as he closes his piece:

In its landmark ruling in the Pentagon Papers case, the US Supreme Court said “only a free and unrestrained press can effectively expose deception in government”. The swirling storm around WikiLeaks today reinforces the need to defend the right of all media to reveal the truth.

See the full article on the Australian at this link…

#cablegate: 7,500 cables tagged ‘PR and Correspondence’ could shed light on media relations

According to WikiLeaks, more than 7,500 of the embassy cables due to be released as part of its latest leak of classified documents carry the tag OPRC, or “Public Relations and Correspondence”.

Only two cables with this tag have been published so far, both sent in 2009: one is a round-up of Turkish media reaction and the other a summary of media reaction to news issues in China, the US and Iran.

But it will be worth keeping an eye on future OPRC-tagged cables for information about the media relations and communications of diplomats and national leaders.

Until the text of these cables is made public, we don’t know exactly what they contain or how relevant they might be to media outlets. But using the Guardian’s data store of the cables, it’s easy to find out how many cables each embassy sent during the period covered by the leak.

The US embassy in Ankara, Turkey, is responsible for the largest number of OPRC-tagged cables, 1,551, while the American Institute in Taiwan, in Taipei, is behind 1,026 of them. Seventy-five embassies have sent 10 or fewer OPRC-tagged cables.
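A tally like this is a simple group-and-count. The sketch below shows the idea in Python under stated assumptions: a CSV export with hypothetical `origin` and `tags` columns (the real data store’s layout may well differ), a hypothetical `count_tagged_by_origin` helper, and toy rows that are not real cable data.

```python
import csv
import io
from collections import Counter


def count_tagged_by_origin(rows, tag="OPRC"):
    """Count cables carrying `tag`, grouped by the post that sent them.

    Assumes each row is a dict with an 'origin' field (the sending embassy)
    and a 'tags' field of space-separated tag codes. Both column names are
    hypothetical, not the data store's actual schema.
    """
    counts = Counter()
    for row in rows:
        if tag in row["tags"].split():
            counts[row["origin"]] += 1
    return counts


# Toy rows for illustration only; not real cable data.
SAMPLE_CSV = """origin,tags
Embassy Ankara,OPRC PREL
Embassy Ankara,PREL
AIT Taipei,OPRC
Embassy Ankara,OPRC
"""

counts = count_tagged_by_origin(csv.DictReader(io.StringIO(SAMPLE_CSV)))
print(counts.most_common())
# [('Embassy Ankara', 2), ('AIT Taipei', 1)]

# Posts with 10 or fewer tagged cables, the same cut-off used above.
print(sum(1 for n in counts.values() if n <= 10))
# 2
```

Run against the full set of cables, the same `most_common()` ranking and threshold count are the kind of figures quoted above.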

#cablegate: The Guardian on the importance of the WikiLeaks embassy cables leak

As WikiLeaks begins publishing more than 250,000 diplomatic cables sent by US embassies around the world, the Guardian, one of a group of media organisations publishing a selection (a few hundred) of the cables in partnership with the whistleblowing site, has produced the video below explaining the significance of the leak:

Video: US embassy leaks: ‘The data deluge is coming …’ (guardian.co.uk)

Currybet: What open government data giveth, closed state data taketh away

Guardian information architect Martin Belam has an interesting post about some of the limitations of the recent government data release, particularly the difficulty of – and cost associated with – cross-referencing the data with Companies House records.

Using the Guardian’s data explorer tool, you can get a comprehensive list of suppliers. Wouldn’t it be wonderful if you could instantly cross-reference that with the records at Companies House?

I’d love to be able to get an instant snapshot of how many of these companies are large, medium or small enterprises. Over time you could use that to measure whether the intention to open up Government service tendering to wider competition was on track or not.

Full post at this link…