Tag Archives: API

Poynter: How journalists can get to grips with APIs

Poynter has a very helpful beginner’s guide for journalists who want to understand API documentation.

It helps journalists understand the terms used by sites with an open API (application programming interface) and follows an earlier article on four reasons your news org should use APIs.

One really useful part of this post is that it takes you step-by-step through hand-building an API request, using an example based on the New York Times API (you will have to register with the NY Times to request an API key).

For example, let’s try getting New York Times reviews for the “Harry Potter” movies as an XML-formatted response. Use your favourite search engine to find the New York Times movie reviews API. This API is not perfect (it’s in beta, after all). The steps below can be compressed with shortcuts once you become more experienced, but since we’re assuming this is your first time, we’re going to take the slow road.
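
To give a sense of what that hand-built request looks like in practice, here is a minimal Python sketch of the same idea. The endpoint, response format and parameter names are assumptions for illustration – check the NY Times developer documentation and substitute your own API key.

```python
# Minimal sketch of the kind of request the Poynter walkthrough builds by hand.
# The endpoint and parameter names below are illustrative, not guaranteed to
# match the current NYT movie reviews API.
import requests

API_KEY = "YOUR_NYT_API_KEY"  # obtained by registering with the NY Times
BASE_URL = "https://api.nytimes.com/svc/movies/v2/reviews/search.xml"  # assumed endpoint

params = {
    "query": "harry potter",  # free-text search for the films
    "api-key": API_KEY,
}

response = requests.get(BASE_URL, params=params)
response.raise_for_status()
print(response.text[:500])  # first few hundred characters of the XML response
```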

Click here for the rest of Poynter’s guide to follow the example.
Twitter’s Local Trends

Last week Twitter announced it was rolling out its new feature, Local Trends: a means of tracking topics trending in your local ‘state or city’.

The big events that come up around the world will always become a global conversation, but what about the big events that only happen in your world and only matter to those around you? Or the slight differences in the way Californians perceive an event, like Obama’s election victory, versus those in São Paulo, Brazil?

Local Trends will allow you to learn more about the nuances in our world and discover even more relevant topics that might matter to you. We’ll be improving this feature over time to provide more locations, languages, and data through our API.

Locations added so far:

Countries: Brazil; Canada; Ireland; Mexico; United Kingdom; United States

Cities: Atlanta; Baltimore; Boston; Chicago; Dallas-Ft. Worth; Houston; London; Los Angeles; New York City; Philadelphia; San Antonio; San Francisco; Seattle; São Paulo; Washington, D.C.

As yet, UK users outside London see only nationally trending topics, but Twitter says it is working to add more locations.

#DataJourn: Royal Mail cracks down on unofficial postcode database

A campaign to release UK postcode data that is currently the commercial preserve of the Royal Mail (prices at this link) has been gathering pace for a while. And not so long ago in July, someone uploaded a set to Wikileaks.

Some wondered how useful this was: the Guardian’s Charles Arthur, for example.

In an era of grassroots, crowd-sourced accountability journalism, this could be a powerful tool for journalists and online developers when creating geo-data based applications and investigations.

But the unofficial release made this a little hard to assess. After all, the data goes out of date very fast, so unless someone kept leaking it, it wouldn’t be all that helpful. Furthermore, it would be in defiance of the Royal Mail’s copyright, so it would be legally risky to use.

At the forefront of the ‘Free Our Postcodes’ campaign is ErnestMarples.com, the site named after the British postmaster general who introduced the postcode. The site is run by Harry Metcalfe and Richard Pope, who – without disclosing their source – opened an API that could power sites such as PlanningAlerts.com and JobcentreProPlus.com.

“We’re doing the same as everyone’s been doing for years, but just being open about it,” they said at the time of launch earlier this year.

But now they have closed the service. Last week they received cease and desist letters from the Royal Mail demanding that they stop publishing information from the database (see letters on their blog).

“We are not in a position to mount an effective legal challenge against the Royal Mail’s demands and therefore have closed the ErnestMarples.com API, effective immediately,” Harry Metcalfe told Journalism.co.uk.

“We’re very disappointed that Royal Mail have chosen to take this course. The service was supporting numerous socially useful applications such as Healthwhere, JobcentreProPlus.com and PlanningAlerts.com. We very much hope that the Royal Mail will work with us to find a solution that allows us to continue to operate.”

A Royal Mail spokesman said: “We have not asked anyone to close down a website. We have simply asked a third party to stop allowing unauthorised access to Royal Mail data, in contravention of our intellectual property rights.”

Business Insider: Chart of the Day – 24% of US newspapers don’t use digital delivery platforms

Courtesy of Silicon Alley Insider’s ‘Business Insider’, a chart showing that 24 per cent of US newspapers do not use any digital delivery platforms to spread their online content.

“The American Press Institute asked 2,400 newspaper executives if their papers ‘provide access to stories or information such as sports scores, headlines, stock quotes, etc.,’ via Twitter, Facebook, Email alerts, Mobile/PDA, YouTube, Kindle, Flickr, e-readers, etc., and told them to ‘check all that apply.'”

24 per cent of all respondents answered ‘None at this time’.

Business Insider post at this link…

Linking data and journalism: what’s the future?

On Wednesday (September 9), Paul Bradshaw, course director of the MA Online Journalism at Birmingham City University and founder of HelpMeInvestigate.com, chaired a discussion on data and the future of journalism at the first London Linked Data Meetup. This post originally appeared on the OnlineJournalismBlog.

The panel included: Martin Belam (information architect, the Guardian; blogger, Currybet); John O’Donovan (chief architect, BBC News Online); Dan Brickley (Friend of a Friend project; VU University, Amsterdam; SpyPixel Ltd; ex-W3C); Leigh Dodds (Talis).

“Linked Data is about using the web to connect related data that wasn’t previously linked, or using the web to lower the barriers to linking data currently linked using other methods.” (http://linkeddata.org)

I talked about how 2009 was, for me, a key year in data and journalism – largely because it has been a year of crisis in both publishing and government. The seminal point in all of this has been the MPs’ expenses story, which both demonstrated the power of data in journalism and underlined the need for transparency from government. For example: the government’s appointment of Sir Tim Berners-Lee, its search for developers to suggest things to do with public data, and the imminent launch of Data.gov.uk around the same issue.

Even before then, the New York Times and the Guardian had both launched APIs at the beginning of the year, MSN Local and the BBC had both been working with Wikipedia, and we had seen the launch of a number of startups and mashups around data, including Timetric, Verifiable, BeVocal, OpenlyLocal, MashTheState, the open source release of EveryBlock, and Mapumental.

Q: What are the implications of paywalls for Linked Data?
The general view was that Linked Data – specifically standards like RDF [Resource Description Framework] – would allow users and organisations to access information about content even if they couldn’t access the content itself. To give a concrete example, rather than linking to a ‘wall’ that simply requires payment, it would be clearer what the content beyond that wall related to (e.g. key people, organisations, author, etc.).
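
To make that concrete, here is a minimal sketch using Python’s rdflib (not something discussed on the panel) of how a publisher might describe a paywalled article in RDF; the article URL and subject URIs are invented for illustration.

```python
# Minimal sketch: publishing metadata about a paywalled article as RDF,
# so aggregators can see what the content relates to without accessing it.
# The article URL and subject URIs here are made up for illustration.
from rdflib import Graph, URIRef, Literal
from rdflib.namespace import DC, FOAF

g = Graph()
article = URIRef("http://example-news.com/2009/09/linked-data-panel")  # hypothetical paywalled URL

g.add((article, DC.title, Literal("Linked data and the future of journalism")))
g.add((article, DC.creator, Literal("Paul Bradshaw")))
g.add((article, DC.subject, URIRef("http://dbpedia.org/resource/Linked_data")))
g.add((article, FOAF.topic, URIRef("http://dbpedia.org/resource/Journalism")))

# Anyone can read this description; only subscribers can read the article itself.
print(g.serialize(format="turtle"))
```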

Leigh Dodds felt that using standards like RDF would allow organisations to more effectively package content in commercially attractive ways, e.g. ‘everything about this organisation’.

Q: What can bloggers do to tap into the potential of Linked Data?
This drew some blank responses, but Leigh Dodds was most forthright, arguing that the onus lay with developers to do things that would make it easier for bloggers to, for example, visualise data. He also pointed out that currently if someone does something with data it is not possible to track that back to the source and that better tools would allow, effectively, an equivalent of pingback for data included in charts (e.g. the person who created the data would know that it had been used, as could others).

Q: Given that the problem for publishing lies in advertising rather than content, how can Linked Data help solve that?
Dan Brickley suggested that OAuth technologies (where you use a single login identity across multiple sites, one that carries information about your social connections, rather than creating a new ‘identity’ for each) would allow users to specify more precisely how they experience content, for instance: ‘I only want to see article comments by users who are also my Facebook and Twitter friends.’

The same technology would allow for more personalised, and therefore more lucrative, advertising. John O’Donovan felt the same could be said about content itself – more accurate data about content would allow for more specific selling of advertising.
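
By way of illustration – with invented data, and no particular identity provider assumed – a minimal Python sketch of the comment filtering Dan Brickley described:

```python
# Illustrative sketch: show only comments from people connected to the reader.
# The friend lists are invented; in practice they would come from whatever
# identity/social APIs the reader has authorised.
facebook_friends = {"alice", "bob"}
twitter_friends = {"bob", "carol"}
my_network = facebook_friends | twitter_friends

comments = [
    {"author": "bob", "text": "Great piece."},
    {"author": "mallory", "text": "First!"},
]

visible = [c for c in comments if c["author"] in my_network]
for comment in visible:
    print(comment["author"], "-", comment["text"])
```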

Martin Belam quoted James Cridland on radio: ‘[The different operators] agree on technology but compete on content.’ The same was true of advertising, but the advertising and news industries needed to be more active in defining common standards.

Leigh Dodds pointed out that semantic data was already being used by companies serving advertising.

Other notes
I asked members of the audience who they felt were the heroes and villains of Linked Data in the news industry. The Guardian and BBC came out well – The Daily Mail were named as repeat offenders who would simply refer to ‘a study’ and not say which, nor link to it.

Martin Belam pointed out that the Guardian is increasingly asking itself ‘how will that look through an API?’ when producing content, representing a key shift in editorial thinking. If users of the platform are swallowing up significant bandwidth or driving significant traffic, that would probably warrant talking to them about a more formal relationship (either customer-provider or partnership).

A number of references were made to the problem of provenance – being able to identify where a statement came from. Dan Brickley specifically spoke of the problem with identifying the source of Twitter retweets.

Dan also felt that the problem of journalists not linking would be solved by technology. In conversation previously, he also talked of ‘subject-based linking’ and the impact of SKOS [Simple Knowledge Organisation System] and linked data style identifiers. He saw a problem in that, while new articles might link to older reports on the same issue, older reports were not updated with links to the new updates. Tagging individual articles was problematic in that you then had the equivalent of an overflowing inbox.

Finally, here’s a bit of video from the very last question addressed in the discussion (filmed with thanks by @countculture):

Linked Data London 090909 from Paul Bradshaw on Vimeo.


Window on the Media: ‘I smell a government rat in my news’

Nicolas Kayser-Bril raises concerns that government- or industry-sponsored news outlets and stories will gain increasing coverage, as media organisations face swingeing cutbacks and foreign bureaux are closed.

He’s so concerned, in fact, that he has built an app based on the Google News API: search for a topic and it will suggest the share of articles (from a selection of 60) paid for in this way.

Full post at this link…

Nieman Journalism Lab: Recommendations from the API for the Chicago meeting

Nieman Journalism Lab links to a copy of the American Press Institute (API) report prepared for the ‘paid content’ Chicago meeting for newspaper executives last week.

“Top newspaper execs (…) heard from several entrepreneurs who are proposing new ways for papers to generate revenue online,” NJL reported.

Full story at this link…

Poynter’s Rick Edmonds comments on it here, at this link.

ReadWriteWeb: CNET signs up for Open Calais

CNET.com will now share data from its technology reviews, news and blog posts using Thomson Reuters’ OpenCalais platform, allowing other publishers to use the information.

According to this report, CNET will publish certain sets of editorial data and some commercial information, for example data on its software download services, using the semantic API.

Signing up to OpenCalais will also enable CNET to generate topic pages.

Full story at this link…

#DataJourn part 2: Q&A with ‘data juggler’ Tony Hirst

As explained in part one of today’s #datajourn conversation, Tony Hirst is the ‘data juggler’ (as dubbed by Guardian tech editor Charles Arthur) behind some of the most interesting uses of the Guardian’s Open Platform (unless swear words are your thing – in which case check out Tom Hume’s work).

Journalism.co.uk sent OU academic, mashup artist and Isle of Wight resident Tony Hirst some questions. Here are his very comprehensive answers.

What’s your primary interest in – and motivation for – playing with the Guardian’s Open Platform?
TH: Open Platform is a combination of two things – the Guardian API and the Guardian Data Store. My interest in the API is twofold: first, at the technical level, does it play nicely with ‘mashup tools’ such as Yahoo Pipes, Google Spreadsheets’ =importXML formula, and so on; secondly, what sort of content does it expose that might support a ‘news and learning’ mashup site, where we can automatically pull in related open educational resources around a news story to help people learn more about the issues involved in that story?
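
By way of illustration (this is not Hirst’s own code), here is a minimal Python sketch of querying the Guardian Content API for stories on a topic – the kind of call a ‘news and learning’ mashup might make. The endpoint, parameters and response fields should be checked against the Open Platform documentation, and the API key is a placeholder.

```python
# Minimal sketch: search the Guardian Content API (part of Open Platform)
# for stories on a topic. Endpoint and field names are assumptions to verify
# against the Open Platform documentation.
import requests

API_KEY = "YOUR_GUARDIAN_API_KEY"
url = "https://content.guardianapis.com/search"  # assumed endpoint

resp = requests.get(url, params={"q": "climate change", "api-key": API_KEY, "format": "json"})
resp.raise_for_status()

for result in resp.json().get("response", {}).get("results", []):
    print(result.get("webTitle"), "->", result.get("webUrl"))
```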

One of the things I’ve been idling about lately is what a ‘university API’ might look like, so the architecture of the Guardian API, and in particular the way the URIs that call on the API are structured, is of interest in that regard (along with other APIs, such as the New York Times’ APIs, the BBC Programmes API, and so on).

The Data Blog resources – which are currently being posted on Google spreadsheets – are a handy source of data in a convenient form that I can use to try out various ‘mashup recipes’. I’m not so interested in the data as is, more in the ways in which it can be combined with other data sets (for example, in Dabble DB) and/or displayed using third-party visualisation tools. What inspires me is trying to find ‘mashup patterns’ that other people can use with other data sets. I’ve written several blog posts showing how to pull data from Google spreadsheets into IBM’s Many Eyes Wikified visualisation tool: it’d be great if other people realised they could use a similar approach to visualise sets of data I haven’t looked at.
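
One such recipe, sketched in Python rather than the tools Hirst mentions: pulling a published Google spreadsheet into code as CSV. The spreadsheet key is a placeholder and the export URL format is an assumption – check how the sheet you want is shared.

```python
# Sketch: fetch a published Google spreadsheet as CSV and load it into Python.
# The key and export URL format are placeholders/assumptions.
import csv
import io
import requests

SPREADSHEET_KEY = "YOUR_SPREADSHEET_KEY"
csv_url = f"https://docs.google.com/spreadsheets/d/{SPREADSHEET_KEY}/export?format=csv"

rows = list(csv.reader(io.StringIO(requests.get(csv_url).text)))
header, data = rows[0], rows[1:]
print(header)
print(f"{len(data)} data rows loaded")
```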

Playing with the actual data also turns up practical ‘issues’ about how easy it is to create mashups with public data. For example, one silly niggle I had with the MPs’ expenses data was that pound signs appeared in many of the data cells, which meant that Many Eyes Wikified, for example, couldn’t read the amounts as numbers, and so couldn’t chart them. (In fact, I don’t think it likes pound signs at all because of the character encoding!) Which meant I had to clean the data, which introduced another step in the chain where errors could be introduced, and which also raised the barrier to entry for people wanting to use the data directly from the Data Store spreadsheet. If I can help find some of the obstacles to effective data reuse, then maybe I can help people publish their data in a way that makes it easier for other people to reuse (including myself!).
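
A minimal sketch of that cleaning step (column names invented for illustration): strip the pound signs and thousands separators so the figures parse as numbers.

```python
# Sketch of the cleaning step described above: strip pound signs and commas
# so visualisation tools read the expenses figures as numbers.
def clean_amount(value: str) -> float:
    """Turn a string like '£1,234.56' into the number 1234.56."""
    return float(value.replace("£", "").replace(",", "").strip())

raw_rows = [
    {"mp": "A Member", "claimed": "£1,234.56"},   # invented example rows
    {"mp": "Another Member", "claimed": "£987.00"},
]

for row in raw_rows:
    row["claimed"] = clean_amount(row["claimed"])

print(raw_rows)
```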

Do you feel content with the way journalists present data in news stories, or could we learn from developers and designers?
TH: There’s a problem here in that journalists have to present stories that are: a) subject to space and layout considerations beyond their control; and b) suited to their audience. Just publishing tabulated data is good in the sense that it provides the reader with evidence for claims made in a story (as well as potentially allowing other people to interrogate the data and maybe look for other interpretations of it), but I suspect it is meaningless, or at least of no real interest, to most people. For large data sets, you wouldn’t want to publish them within a story anyway.

An important thing to remember about data is that it can be used to tell stories, and that it may hide a great many patterns. Some of these patterns are self-evident if the data is visualised appropriately. ‘Geo-data’ is a fine example of this. Its natural home is on a map – as long as the geocoding works properly, that is (i.e. the mapping from location names, for example, to latitude/longitude co-ordinates that can be plotted on a map).
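
As an aside, here is a minimal sketch of that geocoding step in Python; geopy and the Nominatim service are just one option, not something used in the original piece, and results always need checking.

```python
# Sketch of the geocoding step: turn place names into co-ordinates for plotting.
# geopy/Nominatim is one possible service among many; verify the results.
from geopy.geocoders import Nominatim

geolocator = Nominatim(user_agent="datajourn-example")

for place in ["Birmingham, UK", "Isle of Wight"]:
    location = geolocator.geocode(place)
    if location:
        print(place, "->", location.latitude, location.longitude)
    else:
        print(place, "-> not found, needs manual checking")
```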

Finding ways of visualising and interacting with data is getting easier all the time. I try to find mashup patterns that don’t require much, if any, writing of program code, and so in theory should be accessible to many non-developers. But it’s a confidence thing: and at the moment, I suspect that it is the developers who are more likely to feel confident taking data from one source, putting it into an application, and then providing the user with a simple user interface that they can ‘just use’.

You mentioned about ‘lowering barriers to entry’ – what do you mean by that, and how is it useful?

TH: Do you write SQL code to query databases? Do you write PHP code to parse RSS feeds and filter out items of interest? Are you happy writing JavaScript to parse a JSON feed, or would you rather use XMLHttpRequest and a server-side proxy to pull an XML feed into a web page and get around the domain security model?

Probably none of the above.

On the other hand, could you copy and paste the URL of a data set into a ‘fetch’ block in a Yahoo pipe, identify which data element relates to a place name so that you could geocode the data, and then take the URL of the data coming out of the pipe and paste it into the Google Maps search box to get a map-based view of your data? Possibly…

Or how about taking a spreadsheet URL, pasting it into Many Eyes Wikified, choosing the chart type you wanted based on icons depicting those chart types, and then selecting the data elements you wanted to plot on each axis from a drop down menu? Probably…

What kind of recognition/reward would you like for helping a journalist produce a news story?
TH: A mention for my employer, The Open University, and a link to my personal blog, OUseful.info. If I’d written a ‘How To’ explanation describing how a mashup or visualisation was put together, a link to that would be nice too. And if I ever met the journalist concerned, a coffee would be appreciated! I also find it valuable knowing what sorts of things journalists would like to be able to do with the technology that they can’t work out how to do. This can feed into our course development process, identifying the skills requirements that are out there, and then potentially servicing those needs through our course provision. There’s also the potential for us to offer consultancy services to journalists too, producing tools and visualisations as part of a commercial agreement.

One of the things my department is looking at at the moment is a revamped website. It’s possible that I’ll start posting stories there about any news-related mashups I put together, and if that happens, then links to that content would be appropriate. This isn’t too unlike the relationship we have with the BBC, where we co-produce television and radio programmes and get links back to supporting content on OU websites from the BBC website, as well as programme credits. For example, I help pull together the website around the BBC World Service programme Digital Planet, which we co-produce every so often, and which gets a link from the World Service website (as well as the programme’s Facebook group!), while the OU gets a mention in the closing credits. The rationale behind this approach is getting traffic to OU sites, of course, where we can then start to try to persuade people to sign up for related courses!

Beet.TV: Why APIs are essential – CurrentTV’s Robin Sloan

Good explanation of APIs and how they can be used by third-party developers and as the foundations for media partnerships.

Trust your users and realise that they’re smarter than you think, adds CurrentTV’s Robin Sloan.

Full story at this link…