How media sites can make use of linked data

Martin Belam, information architect for the Guardian and CurryBet blogger, reports for Journalism.co.uk from today’s Linked Data meet-up in London.

The morning Linked Data meet-up session at ULU was part of the wider dev8d event for developers, described as ‘four days of 100 per cent pure software developer heaven’. That made it a little intimidating for the less technical members of the audience: the notices showing which workshops were running in each room were labelled with 2D barcodes, there were talks about programming ‘nanoprojectors’, and there was a frightening number of abbreviations like RDF, API, SPARQL, FOAF and OWL.

What is linked data?

‘Linked data’ is all about moving from a web of interconnected documents to a web of interconnected ‘facts’. Think of it as being able to link to, and access, the relevant individual cells across a range of spreadsheets, rather than just having a list of the spreadsheets themselves. It looks like a good candidate for a step-change in the way people access information over the internet.
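
To make that concrete, here is a minimal sketch of what a ‘fact’ looks like in this model, using Python’s rdflib library. The person URI below is illustrative rather than a real dataset; FOAF is a real, widely used vocabulary for describing people.

```python
# A minimal sketch of the 'web of facts' idea, using Python's rdflib.
# Every fact is a triple: a subject, a property and a value, any of
# which can be a URI pointing into someone else's dataset.
from rdflib import Graph, Literal, Namespace, URIRef

FOAF = Namespace("http://xmlns.com/foaf/0.1/")  # real vocabulary for people

g = Graph()
person = URIRef("http://example.org/people/martin-belam")  # illustrative URI

g.add((person, FOAF.name, Literal("Martin Belam")))
# The object of a triple can be a URI elsewhere on the web; this is the
# 'link' in linked data, the jump from one spreadsheet cell to another.
g.add((person, FOAF.workplaceHomepage, URIRef("http://www.guardian.co.uk/")))

print(g.serialize(format="turtle"))  # rdflib 6+ returns a string here
```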

What are the implications for journalism and media companies?

For a start it is important to realise that linked data can be consumed as well as published. Tom Heath from Talis gave the example of trying to find out about ‘pebbledash’ when buying a house.

At the moment, learning about this takes a time-consuming exploration of the web as it stands, probably pogo-sticking between Google search results and individual web pages that may or may not contain useful information about pebbledash. [Image: secretlondon123 on Flickr]

In a linked data web, finding facts about the ‘concept’ of pebbledash would be much easier. Now replace ‘pebbledash’ with the name of a company or a person, and you can see the potential for journalists’ research processes. A live example of this at work is the sig.ma search engine. Type your name in and be amazed, or horrified, at how much information computers can already aggregate about you from the structured data you are scattering around the web.
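
As a rough illustration of that research workflow, the hedged Python sketch below asks DBpedia, the linked data view of Wikipedia, for everything it holds about a single concept. It assumes DBpedia’s content negotiation serves RDF for the resource URI; ‘Roughcast’ is the Wikipedia article covering pebbledash, and any person or company could be substituted.

```python
# A sketch of consuming linked data about one concept via DBpedia.
# Assumes DBpedia's content negotiation returns RDF for the URI below;
# 'Roughcast' is the Wikipedia article that covers pebbledash.
from rdflib import Graph

g = Graph()
g.parse("http://dbpedia.org/resource/Roughcast")

# Each triple in the fetched graph is one machine-readable fact.
for subject, predicate, obj in g:
    print(f"{predicate}  ->  {obj}")
```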

Tom Heath elaborates on this in a paper he wrote in 2008: ‘How Will We Interact with the Web of Data?’. However, as exciting as some people find linked data, he struggled to name a ‘whizz-bang’ application that has been built with it so far.

Linked data at the BBC

The BBC has been the biggest media company so far to use and publish linked data in the UK. Tom Scott talked about its Wildlife Finder, which uses linked data to build a website bringing together natural history clips, the BBC’s news archive, and the concepts that make up our perception of the natural world.

Simply aggregating the data is not enough, and the BBC hand-builds ‘collections’ of curated items. Scott said ‘curation is the process by which aggregate data is imbued with personalised trust’, citing a collection of David Attenborough’s favourite clips as an example.

Tom Scott argued that it didn’t make sense for the BBC to spend money replicating data sources that are already available on the web, and so Wildlife Finder builds pages using existing sources like Wikipedia, WWF, ZSL and the University of Michigan Museum of Zoology. A question from the floor asked him about the issues of trust around the BBC using Wikipedia content. He said that a review of the content before the project went live showed that it was, on the whole, ‘pretty good’.

As long as the BBC was clear on the page where the data was coming from, he didn’t see there being an editorial issue.

Other presentations during the day are due to be given by John Sheridan and Jeni Tennison from data.gov.uk, Georgi Kobilarov of Uberblic Labs and Silver Oliver from the BBC. The afternoon is devoted to a more practical series of workshops allowing developers to get to grips with some of the technologies that underpin the web of data.

20 thoughts on “How media sites can make use of linked data”

  1. Tom Heath

    Hi Martin,

    I have to admit that the “name the most exciting app” question caught me off-guard, which really it shouldn’t have done; it’s a completely reasonable question, but also somewhat misses the point of my talk.

    With the benefit of hindsight, the answer I would like to have given is this:

    Linked Data applications will slot into one of two main categories: 1) new ways of doing old things, and 2) ways of doing totally new things.

    An example of 1) is mashups based on the Linked Data technology stack. These may be radically quicker, easier and cheaper to create than conventional mashups that have to contend with multiple heterogeneous APIs – take Leigh Dodds’ USA Linked Data Overlays as an example, produced in just a couple of hours: http://www.linkeddata-a-thon.com/index.php/UnitedStatesLinkedDataOverlay – but unless you’re a bean counter or a mashup addict then no-one really cares about the effort invested (or not) if the end result is recognisable.

    Examples of 2) will be much more novel, exciting (to me at least), and probably harder for users to understand in the first instance. Can you remember the first time you used a search engine? I’d bet money that the concept needed some explanation, and that’s for a tool that has some analogue in the pre-Web world.

    We haven’t seen a rash of these revolutionary Linked Data applications yet because we don’t understand enough about what’s missing. As I said in my talk, we’re deeply rooted in the document-centric paradigm, and however powerful your imagination it’s hard to conjure up the unimaginable. This is why we need to explore new thing-centric metaphors for interaction based on Linked Data.

    Being completely honest, examples of 2) may never arise. But, and this is a big ‘but’, we’ve only just started searching for them, and anyone that predicts a plateau in innovation on the Web is bold indeed.

    Cheers,

    Tom.

  2. Martin Belam

    I agree with you, Tom, that this is more at the ‘plumbing’ level of the internet than at the ‘visible product’ level. I think that is why it is so hard at the moment to demonstrate business cases around linked data. It strikes me as a clearly important fundamental layer of the web, although we may yet discover we’re not quite on the right path.

  3. Tom Heath

    Thanks for the reply Martin. A couple of additional comments/clarifications…

    You’re right that Linked Data gives us some new tools for ‘Web plumbing’, although I would hesitate to say categorically that it’s not at the ‘visible product’ level. No, people largely won’t be exposed to the underlying RDF triples, but we are seeing ‘visible products’ built on this technology stack begin to emerge. They may not be revolutionary or ‘whizz-bang’, yet, but they do exist.

    On the subject of business cases I disagree with your comments. Adoption of Linked Data is not predicated on the arrival of revolutionary Linked Data applications, and certainly not on the emergence of new business models.

    Yahoo and Google’s consumption of RDF data to enhance their search results (through the SearchMonkey and Rich Snippets programmes respectively) is evidence of this: those who choose to provide richer, more structured descriptions of their content or products get a kick-back from the search engines in the form of more attractive and detailed listings in results pages. It’s the same old content/product ‘discovery’ business case.

    Not having to define your own API through which to publish your structured data is another business case. Why spend developer effort on implementing support for every new API method requested by your users? Expose your data via a SPARQL endpoint and let the data consumers slice and dice the data any which way they want.
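
    To illustrate, here is a minimal sketch in Python using the SPARQLWrapper library against DBpedia’s public endpoint; the query itself is only an example, but the point is that no publisher-defined API method sits between the consumer and the data.

    ```python
    # Querying a public SPARQL endpoint directly: the consumer decides
    # what to ask for; no custom API method stands in the way. Uses the
    # SPARQLWrapper library; the query is only an example.
    from SPARQLWrapper import SPARQLWrapper, JSON

    sparql = SPARQLWrapper("http://dbpedia.org/sparql")
    sparql.setQuery("""
        PREFIX foaf: <http://xmlns.com/foaf/0.1/>
        SELECT ?name WHERE {
            ?person a foaf:Person ;
                    foaf:name ?name .
        } LIMIT 5
    """)
    sparql.setReturnFormat(JSON)

    results = sparql.query().convert()
    for binding in results["results"]["bindings"]:
        print(binding["name"]["value"])
    ```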

    I’m eagerly awaiting the day when The Guardian adds this capability 😉

  4. John Evans

    Hi Martin,
    Nice to meet you yesterday. I too wonder about the business use case for the news industry.
    One area that occurred to me is that it becomes possible to reduce costs when you acknowledge that other organisations are managing data stores for you. Taking an organisation through the process of asking ‘what is the data that we bring to the party and that adds value, and what data is someone else better placed to manage?’ could lead to a leaner, more efficient organisation that can focus on its core strengths. It also allows the organisation to be liable and responsible only for its core data.

    The idea that the BBC will update Wikipedia with the definitions it needs is like saying ‘thanks for remotely looking after one of our databases’. It certainly saves on database server costs!

    regards

    john

  5. Andy Mabbett

    The BBC Wildlife Finder also uses the ‘Species’ microformat:

    http://microformats.org/wiki/species-strawman-01

    to identify the common and scientific names of animals, making them available to search engines and, through simple browser add-ons, letting users look them up on other sites of their choosing.

    It’s in use on Wikipedia, too.

    As the designer of that microformat, I’m happy to answer any questions, or to receive suggestions for its improvement.

  6. Randy Brickhouse Sr.

    I found the statement about Wikipedia content interesting. I’m a sophomore college student in the state of New Jersey. Every professor and instructor I have had has told us never to use Wikipedia for our research work, because it is unreliable.

    I never thought to actually look into it, simply because our professors won’t give credit for using Wikipedia as a resource.

    Thanks and God bless.