Tag Archives: Charles Arthur

#DataJourn part 3: Useful and recent links looking at use of data in journalism

Perhaps we’ll expand this to a Dipity timeline at some point (other ideas?), but for the meantime, here’s a list of a few recent and relevant links relating to CAR and use of data in journalism to get the conversation on Twitter – via #datajourn – going. NB: These are not necessarily in chronological order. Then, the next logical step would be to start looking at examples of where data has been used for specific journalism projects.

#DataJourn part 2: Q&A with ‘data juggler’ Tony Hirst

As explained in part one of today’s #datajourn conversation, Tony Hirst is the ‘data juggler’ (as titled by Guardian tech editor Charles Arthur) behind some of the most interesting uses of the Guardian’s Open Platform (unless swear words are your thing – in which case check out Tom Hume’s work).

Journalism.co.uk sent OU academic, mashup artist and Isle of Wight resident, Tony Hirst, some questions over. Here are his very comprehensive answers.

What’s your primary interest in – and motivation for – playing with the Guardian’s Open Platform?
TH: Open Platform is a combination of two things – the Guardian API, and the Guardian Data Store. My interest in the API is twofold: first, at the technical level, does it play nicely with ‘mashup tools’ such as Yahoo Pipes, Google spreadsheet’s =importXML formula, and so on; secondly, what sort of content does it expose that might support a ‘news and learning’ mashup site where we can automatically pull in related open educational resources around a news story to help people learn more about the issues involved with that story?

One of the things I’ve been idling about lately is what a ‘university API’ might look like, so the architecture of the Guardian API, and in particular the way the URIs that call on the API are structured, is of interest in that regard (along with other APIs, such as the New York Times’ APIs, the BBC programmes’ API, and so on).

The data blog resources – which are currently being posted on Google spreadsheets – are a handy source of data in a convenient form that I can use to try out various ‘mashup recipes’. I’m not so interested in the data as is, more in the ways in which it can be combined with other data sets (for example, in Dabble DB) and/or displayed using third party visualisation tools. What inspires me is trying to find ‘mashup patterns’ that other people can use with other data sets. I’ve written several blog posts showing how to pull data from Google spreadsheets into IBM’s Many Eyes Wikified visualisation tool: it’d be great if other people realised they could use a similar approach to visualise sets of data I haven’t looked at.

Playing with the actual data also turns up practical ‘issues’ about how easy it is to create mashups with public data. For example, one silly niggle I had with the MPs’ expenses data was that pound signs appeared in many of the data cells, which meant that Many Eyes Wikified, for example, couldn’t read the amounts as numbers, and so couldn’t chart them. (In fact, I don’t think it likes pound signs at all because of the character encoding!) Which meant I had to clean the data, which introduced another step in the chain where errors could be introduced, and which also raised the barrier to entry for people wanting to use the data directly from the data store spreadsheet. If I can help find some of the obstacles to effective data reuse, then maybe I can help people publish their data in a way that makes it easier for other people to reuse (including myself!).
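[Editor’s note: the cleaning step Hirst describes can be illustrated with a minimal Python sketch. This is not his method, which the interview does not detail; the function name `parse_pounds` and the sample rows are hypothetical, and the figures are invented for illustration only.]

```python
import re


def parse_pounds(cell):
    """Turn a spreadsheet cell such as '£1,234.50' into a float.

    Returns None when the cell does not contain a usable number,
    so bad cells can be spotted rather than silently charted.
    """
    # Strip the pound sign, thousands separators and any whitespace.
    cleaned = re.sub(r"[£,\s]", "", cell)
    try:
        return float(cleaned)
    except ValueError:
        return None


# Hypothetical rows, NOT real expenses figures.
rows = [("MP A", "£1,234.50"), ("MP B", "£980"), ("MP C", "n/a")]
cleaned_rows = [(name, parse_pounds(amount)) for name, amount in rows]
```

Returning None for unreadable cells, rather than guessing, keeps the extra cleaning step from quietly introducing the kind of errors the answer warns about.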

Do you feel content with the way journalists present data in news stories, or could we learn from developers and designers?
TH: There’s a problem here in that journalists have to present stories that are: a) subject to space and layout considerations beyond their control; and b) suited to their audience. Just publishing tabulated data is good in the sense that it provides the reader with evidence for claims made in a story (as well as potentially allowing other people to interrogate the data and maybe look for other interpretations of it), but I suspect is meaningless, or at least of no real interest, to most people. For large data sets, you wouldn’t want to publish them within a story anyway.

An important thing to remember about data is that it can be used to tell stories, and that it may hide a great many patterns. Some of these patterns are self-evident if the data is visualised appropriately. ‘Geo-data’ is a fine example of this. Its natural home is on a map (as long as the geo-coding works properly, that is, i.e. the mapping from location names, for example, to latitude/longitude co-ordinates that can be plotted on a map).

Finding ways of visualising and interacting with data is getting easier all the time. I try to find mashup patterns that don’t require much, if any, writing of computer programme code, and so in theory should be accessible to many non-developers. But it’s a confidence thing: and at the moment, I suspect that it is the developers who are more likely to feel confident taking data from one source, putting it into an application, and then providing the user with a simple user interface that they can ‘just use’.

You mentioned about ‘lowering barriers to entry’ – what do you mean by that, and how is it useful?

TH: Do you write SQL code to query databases? Do you write PHP code to parse RSS feeds and filter out items of interest? Are you happy writing Javascript to parse a JSON feed, or would you rather use XMLHttpRequest and a server-side proxy to pull an XML feed into a web page and get around the domain security model?

Probably none of the above.

On the other hand, could you copy and paste a URL to a data set into a ‘fetch’ block in a Yahoo pipe, identify which data element related to a place name so that you could geocode the data, and then take the URL of the data coming out from the pipe and paste it into the Google maps search box to get a map based view of your data? Possibly…

Or how about taking a spreadsheet URL, pasting it into Many Eyes Wikified, choosing the chart type you wanted based on icons depicting those chart types, and then selecting the data elements you wanted to plot on each axis from a drop down menu? Probably…
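[Editor’s note: for readers wondering what the ‘write code’ route looks like next to the copy-and-paste route, here is a rough Python sketch of parsing an RSS feed and filtering out items of interest. It is an illustration only: the feed content, the example.com links and the function name `filter_items` are all made up, and Python stands in for the PHP mentioned in the answer.]

```python
import xml.etree.ElementTree as ET

# A tiny, invented RSS 2.0 feed used purely for illustration.
SAMPLE_FEED = """<?xml version="1.0"?>
<rss version="2.0"><channel>
<title>Example feed</title>
<item><title>MPs' expenses mapped</title><link>http://example.com/1</link></item>
<item><title>Football results</title><link>http://example.com/2</link></item>
</channel></rss>"""


def filter_items(feed_xml, keyword):
    """Return (title, link) pairs for feed items whose title contains keyword."""
    root = ET.fromstring(feed_xml)
    matches = []
    for item in root.iter("item"):
        title = item.findtext("title", default="")
        # Case-insensitive match on the item title only.
        if keyword.lower() in title.lower():
            matches.append((title, item.findtext("link", default="")))
    return matches


expenses_items = filter_items(SAMPLE_FEED, "expenses")
```

Even a sketch this short assumes some comfort with XML and a programming language, which is the barrier the pipe and spreadsheet approaches are meant to lower.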

What kind of recognition/reward would you like for helping a journalist produce a news story?
TH: A mention for my employer, The Open University, and a link to my personal blog, OUseful.info. If I’d written a ‘How To’ explanation describing how a mashup or visualisation was put together, a link to that would be nice too. And if I ever met the journalist concerned, a coffee would be appreciated! I also find it valuable knowing what sorts of things journalists would like to be able to do with the technology that they can’t work out how to do. This can feed into our course development process, identifying the skills requirements that are out there, and then potentially servicing those needs through our course provision. There’s also the potential for us to offer consultancy services to journalists too, producing tools and visualisations as part of a commercial agreement.

One of the things my department is looking at at the moment is a revamped website. It’s a possibility that I’ll start posting stories there about any news related mashups I put together, and if that is the case, then links to that content would be appropriate. This isn’t too unlike the relationship we have with the BBC, where we co-produce television and radio programmes and get links back to supporting content on OU websites from the BBC website, as well as programme credits. For example, I help pull together the website around the BBC World Service programme Digital Planet, which we co-produce every so often, which gets a link from the World Service website (as well as the programme’s Facebook group!), and the OU gets a mention in the closing credits. The rationale behind this approach is getting traffic to OU sites, of course, where we can then start to try to persuade people to sign up for related courses!

#DataJourn part 1: a new conversation (please re-tweet)

Had it not been published at the end of the workday on a Friday, Journalism.co.uk would have made a bit more of a song-and-dance of this story, but instead it got reduced to a quick blog post. In short: OU academic Tony Hirst produced a rather lovely map, on the suggestion (taunt?) of the Guardian’s technology editor, Charles Arthur, and the result? A brand new politics story for the Guardian on MPs’ expenses.

Computer-assisted reporting (CAR) is nothing new, but innovations such as the Guardian’s launch of Open Platform are leading to new relationships and conversations between data/stats experts, programmers and developers (including the rarer breed of information architects), designers, and journalists – bringing with them new opportunities, but also new questions. Some that immediately spring to mind:

  • How do both parties (data and interactive gurus and the journalists) benefit?
  • Who should get credit for new news stories produced, and how should developers be rewarded?
  • Will newsrooms invest in training journalists to understand and present data better?
  • What problems are presented by non-journalists playing with data, if any?
  • What other questions should we be asking?

The hashtag #datajourn seems a good one with which to kickstart this discussion on Twitter (Using #CAR, for example, could lead to confusion…).

So, to get us started, two offerings coming your way in #datajourn part 2 and 3.

Please add your thoughts below the posts, and get in touch with judith@journalism.co.uk (@jtownend on Twitter) with your own ideas and suggestions for ways Journalism.co.uk can report, participate in, and debate the use of CAR and data tools for good quality and ethical journalism.

MPs’ travel expenses disparity highlighted by Guardian Open Platform projects

Tony Hirst, the independent developer who launched some of the first projects using the Guardian’s Open Platform, has again used the Data Store in an innovative way – leading to a new story about MPs’ expenses for the Guardian.

Hirst’s use of Google Maps shows that there are differences of up to £20,000 in neighbouring MPs’ travel expenses.

Hirst describes his work here which he developed after he discovered that the expenses data was being released via Data Store. Guardian technology editor Charles Arthur wrote about Hirst’s initial efforts and said “what we need now is a dataset which shows constituency distances from Westminster, so that we can compare that against travel…”

Hirst clearly couldn’t resist the challenge.


Craig McGill: Pitch by Twitter, says Guardian’s Charles Arthur

Craig McGill discusses the request by the Guardian’s technology editor, Charles Arthur, that PRs pitch only by Twitter, via a public ‘@’ if they are not able to direct message him (you have to be mutually following each other to do that). Arthur has removed his email details from Gorkana in an attempt to reduce the clutter in his inbox.

Full post at this link…

Q&A with an information architect (aka @currybet aka Martin Belam)

Martin Belam, of the CurryBet blog, has recently been appointed as ‘information architect’ for Guardian.co.uk. Journalism.co.uk asked him what he’ll be doing for the site…

For those who don’t know what you do, fill us in on your background and the new gig…
[MB] I was at the Hack Day that the Guardian’s technology department ran back in November 2008, and the talent and enthusiasm that day really shone. I’ve really enjoyed the freedom of working as a consultant over the last three years, much of the time based either in Crete or in Austria, but the opportunity of coming to work more permanently for an organisation as forward-thinking as the Guardian is being with initiatives like the Open Platform was too much to resist.

So, an ‘information architect’: what does that mean and what are you doing?
Information Architecture has been defined as ‘the emerging art and science of organising large-scale websites’.

All websites have an inherent information structure – the navigation, the contextual links on a page, whether there are tags describing content and so forth. It is about how people navigate and way-find their way through the information presented on a site.

What I’ll be doing at the Guardian is influencing that structure and functionality as new digital products are developed. It involves working closely with design and editorial teams to produce ‘wireframes’, the blueprints of web design, and also involves being an advocate for the end user – carrying out lots of usability and prototype testing as ideas are developed.

Is it a full-time role?
I’m working four days a week at The Guardian, as I still have some other commitments – for example as contributing editor for FUMSI magazine – although already it feels a bit like cramming a full-time job into just 80 per cent of the time!

It’s not happy times for mainstream media brands: where are they going wrong?
I don’t think it is only mainstream media brands that are suffering from the disruption caused by digital transition, but we do see a lot of focus on this issue for print businesses at the moment. I think one of the things that strikes me, having worked at several big media companies now, including the BBC and Sony, is that you would never set these organisations up in this way in the digital era if you were doing it from scratch.

One of the things that appealed most about joining the Guardian was that the move to Kings Place has brought together the print, online and technical operations in a way that wasn’t physically possible before in the old offices. I’m still very optimistic that there are real opportunities out there for the big media brands that can get their business structures right for the 21st century.

What kind of things do you think could re-enthuse UK readers for their newspapers?
I think our core and loyal readers are still enthusiastic about their papers, but that as an industry we have to face the fact that there is an over-supply of news in the UK, and a lot of it – whether it is on the radio, TV, web or thrust into your hand as a freebie – is effectively free at the point of delivery. I think the future will see media companies who concentrate on playing to their strengths benefit from better serving a narrower target audience.

Do you see print becoming the by-product rather than the primary product for the Guardian – or has that already happened?
I think there might very well be a ‘sweet spot’ in the future where the display quality on network-enabled mobile devices and the ubiquity of data through-the-air means that the newspaper can be delivered primarily in that way, but I don’t see the Guardian’s presses stopping anytime soon. Paper is still a very portable format, and it never loses connection or runs out of batteries.

Your background is in computer programming rather than journalism, will the two increasingly overlap?
I grew up in the generation that had BBC Micros and ZX Spectrums at home, so I used to program a lot as a child, but my degree was actually in History, which in itself is a very journalistic calling. I specialised in the Crusades and the Byzantine Empire, which is all about piecing together evidence from a range of sources of varying degrees of reliability, and synthesizing a coherent narrative and story from there. And, of course, I’ve spent most of this decade blogging, which utilises ‘some’ of the journalist’s skill-set ‘some’ of the time.

Whilst I’d never suggest that journalists need to learn computer programming much beyond a smattering of HTML, I think there is something to be gained from understanding the software engineering mindset. There are a lot of tools and techniques that can really help journalists plough through data to get at the heart of a story, or to use visualisation tools to help tell that story to their audience.

One of the most interesting things about working at the Guardian is the opportunity to work alongside people like Kevin Anderson, Charles Arthur and Simon Willison, who I think really represent that blending of the technical and journalistic cultures.

You’ve spoken out about press regulation before; why do you feel strongly about it?
In a converged media landscape, it seems odd that Robert Peston’s blog is regulated by the BBC Trust, Jon Snow’s blog is regulated by Ofcom, and Roy Greenslade’s blog is regulated by the PCC.

At the moment, I believe that the system works very well for editors, and very well for the ‘great and the good’ who can afford lawyers, but does absolutely nothing for newspaper consumers. If I see something that offends me on TV, I can complain to Ofcom. If I see an advert that offends me in the street, I can complain to ASA. If I see an article in a newspaper that I think is wrong, inaccurate, in bad taste or offensive, unless I am directly involved in the story myself, the PCC dismisses my complaint out of hand without investigating it.

I don’t think that position is sustainable.

The last thing I want to see is some kind of state-sponsored Ofpress quango, which is why I think it is so important that our industry gets self-regulation right – and why I believe that a review of how the PCC works in the digital era is long overdue.

Charles Arthur: New to journalism? Learn to code

“All sorts of fields of journalism – basically, any where you’re going to have to keep on top of a lot of data that will be updated, regularly or not – will benefit from being able to analyse and dig into that data, and present it in interesting ways,” says the Guardian’s technology editor.

Full story at this link…

Twitter-quette: how do you want J.co.uk to cover events?

There’s been quite a lot of discussion about how to behave on Twitter lately. Last week @charlesarthur said it was all about the links and got a few conflicting comments below his blog post about how to be interesting (or not) on Twitter.

Earlier in the week, one of @journalismnews followers said they didn’t like too many Tweets from an event, without prior warning.

So, over to you, our lovely followers … Do you think we should have one dedicated events Twitter name for all events, or specific ones for each event we attend, whose names we’ll publicise from @journalismnews?

Tweet back, or drop us a comment below.

Grauniad.co.uk v Torygraph.co.uk: Round 374

We’ve been following the various Telegraph/Guardian online interactions this week:

Yesterday, Roy Greenslade published an anonymous email from a Telegraph hack, who wrote that he/she was more than a little bit fed up. The gist of the email was that all this multimedia-ised hub-it-up lark is to the detriment of a good, healthy working life and quality journalism.

Greenslade cautiously said he was printing the letter but that he didn’t necessarily agree with its sentiment.

Over at CounterValues, Telegraph assistant editor Justin Williams was quick to pooh-pooh it. And now Greenslade has put up his response to the letter – a more negative stance this time: ‘the past is another country, think positive,’ he tells his ‘emailing friend’.

Meanwhile, in another post, Williams took a swipe at the Guardian’s system of buying sponsored links and keywords. He reckons their buying is well in excess of the Telegraph’s and the Times’.

In the comments below the post, Charles Arthur, the Guardian’s technology editor, asks how many subsidised paper subscriptions the Telegraph has: ‘Is [buying sponsored links and keywords] a worse or better investment than subsidising paper subscriptions, do you think?’, he writes.

Charles Arthur is a keen Twitterer and I’ve just located Justin Williams on Twitter; all that Tweeting in agreement can be a bit boring: how about getting the discussion going in Twitterland? It’s a shame this didn’t get going earlier, with it being (unofficial?) ‘speak like a pirate day’ – that would make it fun.

Can’t wait for next week’s ABCes…

Online Journalism Scandinavia: Berlingske Tidende – using crime maps for journalism

As the UK government announces plans for crime maps for offences in England and Wales, Kristine Lowe reports for Journalism.co.uk on how Danish paper Berlingske Tidende is using its own map as a source of news and a public service.

“Crime mapping is getting government push behind it, even if police are resisting,” wrote the Guardian’s technology editor Charles Arthur this week, as the government announced plans to publish local interactive crime maps for every area in England and Wales by Christmas.

In Denmark the national daily Berlingske Tidende is already pioneering the use of crime maps as part of the newsgathering process.

With the help of its readers, the paper has created an interactive crime map detailing how well the police respond to calls from the public.

“We have just had a major police reform here in Denmark and decided to investigate how this has worked. The politicians promised more police on the streets and more money to solve crime. We thought the best way to check the reality of these promises was to get our readers to tell us about their experiences,” Christian Jensen, editor-in-chief of Berlingske, told Journalism.co.uk.

The reader reports are placed on a Google map of the country and, since its launch in May, 70 crimes have been reported and plotted.

One of the crimes reported to the map related to the alleged murder of Danish woman Pia Rönnei.

Despite available patrols in the area, the police force did not send officers to investigate calls from neighbours, who reported screams and loud bangs from an apartment that Rönnei was in – something it has been forced to apologise for after the publicity the story received.

“In classic journalism, it is the journalists who find the stories. In our new media reality, it can just as well be the readers who alert us to issues they are concerned about,” said Jensen.

The newspaper has had two full-time reporters devoted to the project, and used an online journalist, photographer and production company (for live pictures) in stories they have devoted additional space to.

“We encourage people to get in touch with stories both in our paper edition and online, as we see a substantial increase in web traffic when we draw attention to the project in the paper edition,” Jensen explained.

Every single crime report on the map generates the same amount of web traffic as breaking news, he added.

The project has been so successful that the newspaper is preparing to launch another project in the same vein. In the next few days Berlingske will unveil a database on immigration politics, where readers can tell their own stories and read and comment on each others’ accounts of their experiences with immigration authorities.

But the biggest challenge for the paper has been verification:

“That is what makes this complicated. Our journalists read through all the reports to check their credibility, but we do not have the resources to verify every single detail. That has made it even more important to clarify from the outset that we are only reporting what the readers have told us.”