Tag Archives: Data journalism

#cablegate: 7,500 cables tagged ‘PR and Correspondence’ could shed light on media relations

According to WikiLeaks, more than 7,500 of the embassy cables due to be released as part of its latest classified documents leak carry the tag OPRC, for “Public Relations and Correspondence”.

Only two cables with this tag have been published so far – one is a round-up of Turkish media reaction and the other a summary of media reaction to news stories in China, the US and Iran; both were sent in 2009.

But it’ll be worth keeping an eye on future cables tagged OPRC for information about diplomats’ and country leaders’ media relations and communications.

Until the text of these cables is made public, we don’t know exactly what they contain or how relevant they might be to media outlets. But using the Guardian’s data store of the cables, it’s easy to find out how many OPRC-tagged cables were sent by which embassies during the period covered by the leak.

The US embassy in Ankara, Turkey, is responsible for the largest number of cables tagged OPRC, at 1,551, while the American Institute in Taiwan, in Taipei, is behind 1,026 of them. Seventy-five embassies have sent 10 or fewer OPRC-tagged cables.
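Replicating that count yourself is straightforward once the datastore is exported as a spreadsheet. Here is a minimal sketch in Python; the file name and the “origin” and “tags” column names are assumptions for illustration, not the Guardian’s actual schema.

```python
import csv
from collections import Counter

# Tally OPRC-tagged cables per originating embassy.
# "cables.csv", "origin" and "tags" are assumed names for a
# spreadsheet export; the real datastore columns may differ.
counts = Counter()
with open("cables.csv", newline="", encoding="utf-8") as f:
    for row in csv.DictReader(f):
        if "OPRC" in row["tags"].split():
            counts[row["origin"]] += 1

# The embassies sending the most OPRC-tagged cables.
for origin, n in counts.most_common(10):
    print(f"{origin}: {n}")
```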

David Higgerson: ‘Actionable’ news and what it means for data journalism

David Higgerson blogs about the idea of ‘actionable’ news – a phrase that he first heard at last week’s Society of Editors conference from Reuters’ Jodie Ginsberg:

I see actionable news being right at the heart of the idea of data journalism. Information may well be freely available in a way we’ve never seen before, but that doesn’t mean the role of the storyteller has fallen by the wayside. As long as the writer who gets to grips with a spreadsheet of data is also plugged into the community they serve, and knows what they are interested in, then we’ve got actionable news (…) It shouldn’t be a revelation to journalists – after all, newsroom planners have certain data-rich days marked in every year, such as GCSE league tables day. But rather than be dictated to by a government planning calendar, journalists who can marry data access to issues which impact on people’s lives can make their work, and the titles they work for, more relevant to an audience than ever before.

Full post on David Higgerson’s blog at this link…

Government spending: Who’s doing what with the new data?

Today sees the biggest release of government spending data in history. Government departments have published details of all spending over £25,000 for the past six months and, according to this morning’s announcement, will continue to publish this expenditure data on a monthly basis.

According to minister for the Cabinet Office and paymaster general Francis Maude, it is part of a drive “to make the UK the most transparent and accountable government in the world”.

We’ve already released a revolutionary amount of data over the last six months, from the salaries of the highest earning civil servants to organisation structure charts which give people a real insight into the workings of government, and it is already being used in new and innovative ways.

A huge amount of public spending data has indeed been published under the current government, and today’s release is a significant addition to that. So who is doing what with the vast amount of new data? And who is making it easier for others to crunch the numbers?

The Guardian is usually streets ahead of other newspapers in processing large datasets, and today’s coverage is no exception.

Who else?

There are, of course, different ways of looking at the numbers, as one Guardian commenter, LudwigsLughole, highlights:

There are 90,000 HMRC staff. They spent £164,000 in six months on bottled spring water. That equates to an annual spend per head of only £3.64. So the FT are seriously suggesting that £3.64 per head to give staff fresh bottled water is excessive? Pathetic journalism.
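(The commenter’s arithmetic does stand up: £164,000 over six months annualises to £328,000, and £328,000 divided among 90,000 staff comes to roughly £3.64 per head per year.)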

Exploring the data yourself

“The biggest issue with all these numbers is, how do you use them? If people don’t have the tools to interrogate the spreadsheets, they may as well be written in Latin.” – Simon Rogers, Guardian Data Blog editor.

“Releasing data is all well and good, but to encourage the nation’s ‘armchair auditors’, it must be readily usable.” – Martin Stabe, FT.

Here are some of the places you can go, along with the Guardian, to have a crack at the numbers yourself. Please add your own suggestions in the comments below.

Lots and lots of data. So what? My take on it was to find a quick and dirty way to cobble a query interface around the data, so here’s what I spent an hour or so doing in the early hours of last night, and a couple of hours this morning… tinkering with a Gov spending data spreadsheet explorer:

Guardian/gov datastore explorer
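For a sense of what cobbling together that kind of quick query layer involves, here is a minimal sketch in Python using pandas; the file name and the “Supplier” and “Amount” columns are assumptions for illustration, not the official schema of the releases.

```python
import pandas as pd

# Load one department's spending release; the file and column
# names ("Supplier", "Amount") are illustrative assumptions.
df = pd.read_csv("department_spend.csv")

# Payments over £25,000 to a single (hypothetical) supplier,
# largest first.
supplier = "Example Consulting Ltd"
hits = df[(df["Supplier"] == supplier) & (df["Amount"] > 25000)]
print(hits.sort_values("Amount", ascending=False).head(20))

# The ten suppliers with the highest total spend in the file.
print(df.groupby("Supplier")["Amount"].sum().nlargest(10))
```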

[T]he real power of this data will become clear in the months to come, as developers and researchers – you? – start to link it to other information, like the magisterial OpenlyLocal and the exciting WhosLobbying. Please make use of our API and loading scripts to do so.

Also see the good suggestions on Where Does My Money Go? for how government data publishing might be improved in the future.

So, coming full circle I return to the Guardian, and to the data-minded Simon Rogers, who asks: Will the government spending data really change the world?

A big question. Feel free to add your opinion below and any other data projects you have seen today or that pop up in the future.

#iweu: The web data revolution – a new future for journalism?

David McCandless, excited about data

Rounding off Internet Week Europe on Friday afternoon, the Guardian put on a panel discussion in its Scott Room on journalism and data: ‘The web data revolution – a new future for journalism’.

Taking part were Simon Rogers, David McCandless, Heather Brooke, Simon Jeffery and Richard Pope, with Dr Aleks Krotoski moderating.

McCandless, a leading designer and author of the data visuals book Information is Beautiful, made three concise, important points about data visualisations:

  • They are relatively easy to process;
  • They can have a high and fast cognitive impact;
  • They often circulate widely online.

Large, unwieldy datasets share none of those traits: they are extremely difficult and slow to process and pretty unlikely to go viral. So, as McCandless’ various graphics showed – from a light-hearted graph charting when couples are most likely to break up to a powerful demonstration of the extent to which the US military budget dwarfs health and aid spending – visualisations are an excellent way to make information accessible and understandable. Not a new way, as the Guardian’s data blog editor Simon Rogers demonstrated with a graphically-assisted report by Florence Nightingale, but one that is proving more and more popular as a means to tell a story.

David McCandless: Peak break-up times, according to Facebook status updates

But, as one audience member pointed out, large datasets are vulnerable to very selective interpretation. As McCandless’ own analysis showed, there are several different ways to measure and compare the world’s armies, with dramatically different results. So, Aleks Krotoski asked the panel, how can we guard against confusion, or our own prejudices interfering, or, worse, wilful misrepresentation of the facts?

McCandless’ solution is three-pronged: firstly, he publishes drafts and works-in-progress; secondly, he keeps himself accountable by test-driving his latest visualisations on a 25-strong group he created from his strongest online critics; thirdly, and most importantly, he publishes all the raw data behind his work using Google Docs.

Access to raw data was the driving force behind Heather Brooke’s first foray into FOI requests and data, she told the Scott Room audience. Distressed at the time it took her local police force to respond to 999 calls, she began examining the stats in order to build up a better picture of response times. She said the discrepancy between the facts and the police claims emphasised the importance of access to government data.

Prior to the Afghanistan and Iraq war logs release that catapulted WikiLeaks into the headlines – and undoubtedly saw the Guardian data team come on in leaps and bounds – founder Julian Assange called for the publishing of all raw data alongside stories to be standard journalistic practice.

You can’t publish a paper on physics without the full experimental data and results; that should be the standard in journalism. You can’t do it in newspapers because there isn’t enough space, but now with the internet there is.

As Simon Rogers pointed out, the journalistic process can no longer afford to be about simply “chucking it out there” to “a grateful public”. There will inevitably be people out there able to bring greater expertise to bear on a particular dataset than you.

But opening up access to vast swathes of data is one thing; knowing how to interpret that data is another. In all likelihood, simple, accessible interfaces for organising and analysing data will become more and more commonplace. For the release of the 400,000-document Iraq war logs, OWNI.fr worked with the Bureau of Investigative Journalism to create a program to help people analyse the extraordinary amount of data available.
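Such interfaces can start out very modestly. As a toy illustration only, and not OWNI’s actual tool, a first pass at filtering a CSV dump of the logs might look like the sketch below; the “date”, “category” and “summary” columns are assumed for the sake of the example.

```python
import csv

# Toy filter over a large document dump (not OWNI's actual tool).
# Column names "date", "category" and "summary" are assumptions.
def search(path, term, category=None):
    with open(path, newline="", encoding="utf-8") as f:
        for row in csv.DictReader(f):
            if category and row["category"] != category:
                continue
            if term.lower() in row["summary"].lower():
                yield row["date"], row["summary"][:120]

# Example: pull every entry mentioning detainees.
for date, snippet in search("warlogs.csv", "detainee"):
    print(date, snippet)
```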

Simply knowing where to look and what to trust is perhaps the first problem for amateurs. Looking forward, Brooke suggested aggregating data about data: a resource that could tell people where to look for certain information, which datasets are relevant and up to date, and how to interpret the numbers properly.

So does data – ‘the new oil’ – signal a “revolution” or a “new future” for journalism? I am inclined to agree with Brooke’s remark that data will become simply another tool in the journalist’s armoury, rather than reshape things entirely. As she said, nobody talks about ‘telephone-assisted reporting’ – the telephone was completely new once upon a time, but now it’s just called reporting. Soon enough, the ‘computer-assisted reporting’ course she teaches at City University will just be ‘reporting’ too.

See also:

Guardian information architect Martin Belam has a post up about the event on his blog, currybetdotnet

Digital journalist Sarah Booker liveblogged presentations by Heather Brooke, David McCandless and Simon Rogers.

RBI to host hacks/hackers day in November

Reed Business Information (RBI) is hosting an event for journalists and programmers interested in working together on data visualisation. The one-day “hack day”, which will take place on 29 November, will be run with the help of data scraping project ScraperWiki.

Speaking on the ScraperWiki blog, Karl Schneider, editorial development director at RBI, explains the thinking behind the event:

Data journalism is an important area of development for our editorial teams in RBI.

It’s a hot topic for all journalists, but it’s particularly relevant in the B2B sector. B2B journalism is focused on delivering information that its audience can act on, supporting important business decisions.

Often a well-thought-out visualisation of data can be the most effective way of delivering critical information and helping users to understand key trends.

We’re already having some successes with this kind of journalism, and we think we can do a lot more. So building up the skills of our editorial teams in this area is very important.

You can register for the event at this link.

Making data work for you: one week till media140’s dataconomy event

There’s just one week to go before media140’s event on data and how journalists and media can make better use of it. Featuring the Guardian’s news editor for data Simon Rogers and Information is Beautiful author David McCandless, the event will discuss the commercial, ethical and technological issues of making data work for you.

Rufus Pollock, director of the Open Knowledge Foundation, and Andrew Lyons, commercial director of UltraKnowledge, will also be speaking. Full details are available at this link.

Journalism.co.uk is proud to be a media partner for media140 dataconomy. Readers of Journalism.co.uk can sign up for tickets to the event at this link using the promotional code “journalist”. Tickets are currently available for £25, which includes drinks.

The event on Thursday 21 October will be held at the HUB, King’s Cross, from 6:30-9:30pm.

Can hacks and hackers work together? A new ‘living experiment’ looks to find out

Can hacks and hackers work together in the new online news world? This is the question posed by Open Journalism And The Open Web, a free online course being run by the online educational community site p2pu.org in conjunction with Hacks/Hackers, the Mozilla Foundation, the Medill School Of Journalism and the Media Consortium.

The course’s aim is to bring developers, journalists and those relatively uncommon people with a foot in both camps together to answer that question.

As I posted here back in May, I was involved in the early Ruby In The Pub meetings, which have now evolved into the UK arm of Hacks/Hackers. The last meeting attracted over 50 people, with talks from a representative of Google as well as hacks and hackers from The Times. It’s a testament to the power of collaboration and the seeking spirit of those who find themselves in this digital space. So when I discovered this experimental course I jumped at the chance to apply, and to my delight was accepted along with forty other people.

Like many such initiatives the course is being run freestyle, with input from attendees welcomed and collaboration positively encouraged. There’s even homework. The course is now in its third week and so far the lectures have been excellent – lecture 2 included a talk from Burt Herman, co-creator of Hacks/Hackers and the man behind storify.com. We’ve also had a lecture from Rob Purdie, agile development expert at the Economist, and the subjects and questions that have come up so far have covered the nature of collaboration, how to break down technical projects into smaller components, and story analysis. The discourse has been vibrant and engaging and I’m sure interesting projects will emerge.

More importantly, this is a living experiment, an embodiment of the questions posed by Hacks/Hackers and their ilk in a more structured format. When the six-week time capsule comes to an end, I’m sure I will have learned a lot about journalism and journalists, the problems they face and their perception of data and information systems. I hope they will feel the same about developers.

The first barrier we came up against was, not surprisingly, language. This hit home with the more technical assignments and discussions, where a lot of us hackers went straight into jargon mode. We require a compressed and succinct language because our job is fast-paced and we need to communicate quickly; it serves as shorthand. But, like developers who spend a lot of time talking to the non-technical side of their business, we soon realised that we had some hacks amongst us too and needed to dilute the language a little in order to bridge the gap and freely explore our common interests and problems.

So far that commonality – engagement and curiosity, the desire to stay one step ahead in a fast-changing digital arena, a passion for information – seems to be outweighing the differences. Three weeks to go. I’ll try to drop a post once a week with an update on what’s happening, and hopefully I will be able to interview the P2PU guys at the end. It’s an exciting time to be a hack and a hacker.

Nick Davies: Data, crowdsourcing and the ‘immeasurable confusion’ around Julian Assange

Investigative journalist Nick Davies chipped in with his thoughts on crowdsourcing data analysis by news organisations at this week’s Frontline Club event. (You can listen to a podcast featuring the panellists at this link.)

For Davies, who brokered the Guardian’s involvement in the WikiLeaks Afghanistan War Logs, such stories suggest that asking readers to trawl through data for stories doesn’t work:

I haven’t seen any significant analysis of that raw material (…) There were all sorts of angles that we never got to because there was so much of it. For example, there was a category of material that was recorded by the US military as being likely to create negative publicity. You would think somebody would search all those entries and put them together and compare them with what actually was put out in press releases.

I haven’t seen anyone do anything about the treatment of detainees, which is recorded in there.

We got six or seven good thematic stories out of it. I would think there are dozens of others there. There’s some kind of flaw in the theory that crowdsourcing is a realistic way of converting data into information and stories, because it doesn’t seem to be happening.

And Davies had the following to say about WikiLeaks head Julian Assange:

We warned him that he must not put this material unredacted onto the WikiLeaks website because it was highly likely to get people killed. And he never really got his head around that. But at the last moment he did a kind of word search through these 92,000 documents looking for words like source or human intelligence and withdrew 15,000 docs that had those kinds of words in. It’s a very inefficient way of making those documents safe and I’m worried about what’s been put up on there.

He then kind of presented the withholding of these 15,000 documents as some kind of super-secret, but it’s already been released (…) The amount of confusion around Julian is just immeasurable. In general terms you could say he’s got other kinds of material coming through WikiLeaks and there’s all sorts of possibilities about who might get involved in processing it. Personally I feel much happier pursuing the phone hacking, which is a relatively clean story that Julian’s not involved in.
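For what it’s worth, the word search Davies describes amounts to a crude keyword screen, and a sketch makes plain why it is inefficient: any document that happens to avoid the listed terms sails through untouched, whatever it actually reveals. The risk terms below are illustrative assumptions, not the list Assange used.

```python
# Crude keyword screen of the kind Davies describes: withhold any
# document containing a risk term. The terms are illustrative
# assumptions; real documents evade such lists through synonyms,
# misspellings and context, which is why the method is unsafe.
RISK_TERMS = {"source", "human intelligence", "informant"}

def is_risky(text):
    lowered = text.lower()
    return any(term in lowered for term in RISK_TERMS)

documents = {
    "doc1": "Report cites a local source in Kandahar district.",
    "doc2": "Routine patrol log, no contacts reported.",
}
withheld = {name for name, text in documents.items() if is_risky(text)}
print(withheld)  # -> {'doc1'}
```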

Why the US and UK are leading the way on semantic web

Following his involvement in the first Datajournalism meetup in Berlin earlier this week, Martin Belam, the Guardian’s information architect, looks at why the US and UK may have taken the lead on the semantic web, as one audience member suggested on the day.

In an attempt to try and answer the question, he puts forward four themes on his currybet.net blog that he feels may play a part. In summary, they are:

  • The sharing of a common language which helps both nations access the same resources and be included in comparative datasets.
  • Competition across both sides of the pond driving innovation.
  • Successful business models already being used by the BBC and, even more valuably, being explained on its internet blogs.
  • Open data and a history of freedom of information court cases which makes official information more likely to be made available.

In his full post he also offers tips on how to follow the UK’s lead, such as getting involved in hacks and hackers-type events.


#ddj: Follow the Data Driven Journalism conference

Today in Amsterdam the great and good of data journalism are gathering to discuss the tools, techniques and opportunities for journalists using and visualising data in stories.

Full details are on the event site, which explains:

Developing the know-how to use the available data more effectively, to understand it, communicate and generate stories based on it, could be a huge opportunity to breathe new life into journalism. Journalists can find new roles as “sense-makers” digging deep into data, thus making reporting more socially relevant. If done well, delivering credible information and advice could even generate revenues, opening up new perspectives on business models, aside from subscriptions and advertising.

OWNI.fr’s Nicolas Kayser-Bril will be blogging about the day for Journalism.co.uk. To keep up with what’s being said, you can follow the Twitter hashtag #ddj.