Tag Archives: Linked Data

Why the US and UK are leading the way on semantic web

Following his involvement in the first Datajournalism meetup in Berlin earlier this week, Martin Belam, the Guardian’s information architect, looks at why the US and UK may have taken the lead in the semantic web, as one audience member suggested on the day.

In an attempt to answer the question, he puts forward four themes on his currybet.net blog that he feels may play a part. In summary, they are:

  • The sharing of a common language which helps both nations access the same resources and be included in comparative datasets.
  • Competition across both sides of the pond driving innovation.
  • Successful business models already in use at the BBC and, even more valuably, explained on its blogs.
  • Open data and a history of freedom of information court cases, which make official information more likely to be made available.

In his full post he also offers tips on how to follow the UK’s lead, such as getting involved in hacks-and-hackers-type events.

Currybet: BBC News redesign demotes external linking

Following a series of posts looking at external linking on news sites, Guardian information architect Martin Belam uses the BBC as a case study for outlining how the company has experimented with linking over the years.

In an earlier post, he outlined cases in which news publishers should ensure links are included:

There are several clear use cases where additional links on news stories should be added as a matter of course, though – stories that reference medical or scientific reports, stories that reference published consultation papers, stories where quotes and pictures are sourced directly from the web, and stories specifically about websites.

In the latest post on the topic, he discusses how links on the BBC’s news website now appear to be a low priority.

So far, with the recent BBC News redesign, it remains the case that external links are kept away from the body of an article. Arguably they have even been demoted: whereas they used to appear in the side panel of a story, they now appear at the foot of the page.

Earlier this month, director of BBC Future Media and Technology Erik Huggers pledged to double the amount of external linking by BBC News Online.

See his full post here…

MediaShift: Why news organisations should use ‘linked data’

Director of the Media Standards Trust Martin Moore gives 10 reasons why news organisations should use “linked data” – “a way of publishing information so that it can easily – and automatically – be linked to other, similar data on the web”.

[Moore’s recommendations follow the News Linked Data Summit and you can read more about the event at this link.]

It’s worth reading the list in full, but some of the top reasons include:

  • Linked data can boost search engine optimisation;
  • It helps you and other people build services around your content;
  • It helps journalists with their work:

As a news organisation publishes more of its news content in linked data, it can start providing its journalists with more helpful information to inform the articles they’re writing. Existing linked data can also provide suggestions as to what else to link to.

Full post at this link…

A history of linked data at the BBC

Martin Belam, information architect for the Guardian and CurryBet blogger, reports from today’s Linked Data meet-up in London, for Journalism.co.uk.

You can read the first report, ‘How media sites can use linked data’ at this link.

There are many challenges when using linked data to cover news and sport, Silver Oliver, information architect in the BBC’s journalism department, told delegates at today’s Linked Data meet-up session at ULU, part of a wider dev8d event for developers.

Initially, newspapers saw the web as just another linear distribution channel, said Silver. That meant we ended up with lots and lots of individually published news stories online that needed information architects to gather them up into useful piles.

He believes we’ve hit the boundaries of that approach, and something like the data-driven approach of the BBC’s Wildlife Finder is the future for news and sport.

But the challenge is to find models for sport, journalism and news.

A linked data ecosystem is built out of a content repository, a structure for that content, and then the user experience that is laid over that content structure.

But how do you populate these datasets in departments and newsrooms that barely have the resource to manage small taxonomies or collections of external links, let alone populate a huge ‘ontology of news’, asked Silver.

Silver says the BBC has started with sport, because it is simpler. The events and the actors taking part in those events are known in advance. For example, even this far ahead you know the fixture list, venues, teams and probably the majority of the players who are going to take part in the 2010 World Cup.

News is much more complicated, because of the inevitable time lag between a breaking news event taking place and canonical identifiers existing for it. Basic building blocks do exist, like Geonames or DBpedia, but there is no definitive database of ‘news events’.

Silver thinks that if all news organisations were using common IDs for a ‘story’, this would allow the BBC to link out more effectively and efficiently to external coverage of the same story.
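To illustrate the shared-identifier idea in a minimal sketch (the story IDs, outlets and URLs below are invented examples, not any real scheme): if every outlet tagged its coverage with the same canonical story ID, grouping and linking out to other organisations' takes on the same story would become a simple lookup.

```python
from collections import defaultdict

# Hypothetical articles from different outlets, each tagged with a
# shared canonical story ID (all values here are made up).
articles = [
    {"story_id": "event/2010-world-cup-draw", "outlet": "BBC",
     "url": "http://example.com/bbc/world-cup-draw"},
    {"story_id": "event/2010-world-cup-draw", "outlet": "Guardian",
     "url": "http://example.com/guardian/draw-reaction"},
    {"story_id": "event/volcanic-ash-cloud", "outlet": "BBC",
     "url": "http://example.com/bbc/ash-cloud"},
]

def coverage_by_story(articles):
    """Group (outlet, url) pairs under their shared canonical story ID."""
    index = defaultdict(list)
    for a in articles:
        index[a["story_id"]].append((a["outlet"], a["url"]))
    return dict(index)

index = coverage_by_story(articles)
# A site can now link out to every other outlet's coverage of the same story:
for outlet, url in index["event/2010-world-cup-draw"]:
    print(outlet, url)
```

Without the shared ID, the grouping step above would instead require fuzzy matching of headlines and timestamps, which is exactly the inefficiency Silver describes.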

Silver also presented at the recent news metadata summit, and has blogged about the talk he gave that day, which specifically addressed how the news industry might deal with some of these issues.

How media sites can make use of linked data

Martin Belam, information architect for the Guardian and CurryBet blogger, reports from today’s Linked Data meet-up in London, for Journalism.co.uk.

The morning Linked Data meet-up session at ULU was part of a wider dev8d event for developers, described as ‘four days of 100 per cent pure software developer heaven’. That made it a little bit intimidating for the less technical in the audience – the notices on the rooms to show which workshops were going on were labelled with 3D barcodes, there were talks about programming ‘nanoprojectors’, and a frightening number of abbreviations like RDF, API, SPARQL, FOAF and OWL.

What is linked data?

‘Linked data’ is all about moving from a web of interconnected documents, to a web of interconnected ‘facts’. Think of it like being able to link to and access the relevant individual cells across a range of spreadsheets, rather than just having a list of spreadsheets. It looks a good candidate for being a step-change in the way that people access information over the internet.
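The ‘web of facts’ idea can be sketched in a few lines of code. Linked data models information as subject-predicate-object ‘triples’, so individual facts can be addressed and joined, much like the spreadsheet cells in the analogy above. This is a toy illustration only – the identifiers are loosely modelled on real vocabularies but invented for the example, and no real system’s API is shown:

```python
# A tiny in-memory 'web of facts': each entry is one (subject, predicate,
# object) triple. The prefixed identifiers are hypothetical examples.
triples = {
    ("bbc:wildlife/lion", "rdf:type",       "wo:Species"),
    ("bbc:wildlife/lion", "wo:livesIn",     "dbpedia:Savanna"),
    ("dbpedia:Savanna",   "dbpedia:partOf", "dbpedia:Africa"),
}

def query(s=None, p=None, o=None):
    """Return every triple matching the pattern; None acts as a wildcard."""
    return [(ts, tp, to) for ts, tp, to in triples
            if s in (None, ts) and p in (None, tp) and o in (None, to)]

# 'Which facts do we hold about the lion?' - addressing facts, not documents.
facts = query(s="bbc:wildlife/lion")
```

Because facts from different sources share identifiers, a query can hop across them – here, from the lion to the savanna to Africa – which is the ‘interconnected facts’ step-change described above.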

What are the implications for journalism and media companies?

For a start it is important to realise that linked data can be consumed as well as published. Tom Heath from Talis gave the example of trying to find out about ‘pebbledash’ when buying a house.

At the moment, to learn about this takes a time-consuming exploration of the web as it stands, probably pogo-sticking between Google search results and individual web pages that may or may not contain useful information about pebbledash. [Image below: secretlondon123 on Flickr]

In a linked data web, finding facts about the ‘concept’ of pebbledash would be much easier. Now, replace ‘pebbledash’ as the example with the name of a company or a person, and you can see the potential for journalists in their research processes. A live example of this at work is the sig.ma search engine. Type your name in and be amazed / horrified at how much information computers can already aggregate about you from the structured data you are scattering around the web.
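The kind of aggregation a tool like sig.ma performs can be sketched very roughly (this is not sig.ma’s actual method; the sources and statements below are invented): merge the structured statements about one subject scattered across several sites, keeping track of where each fact came from.

```python
# Hypothetical structured statements about one person, found on two sites.
sources = {
    "homepage":  {"name": "A. Example", "workplace": "Example Corp"},
    "foaf_file": {"name": "A. Example", "knows": ["B. Sample"]},
}

def aggregate(sources):
    """Union all key-value statements, recording the site each came from."""
    profile = {}
    for site, statements in sources.items():
        for key, value in statements.items():
            profile.setdefault(key, []).append((value, site))
    return profile

profile = aggregate(sources)
# profile now holds every statement with its provenance, e.g. the
# 'workplace' fact is attributed to the homepage that published it.
```

The provenance trail is the journalistically useful part: each aggregated fact can be traced back to the page that asserted it.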

Tom Heath elaborates on this in a paper he wrote in 2008: ‘How Will We Interact with the Web of Data?‘. However, as exciting as some people find linked data, he struggled to name a ‘whizz-bang’ application that has been built yet.

Linked data at the BBC

The BBC have been the biggest media company so far involved in using and publishing linked data in the UK. Tom Scott talked about their Wildlife Finder, which uses data to build a website that brings together natural history clips, the BBC’s news archive, and the concepts that make up our perception of the natural world.

Simply aggregating the data is not enough, and the BBC hand-builds ‘collections’ of curated items. Scott said ‘curation is the process by which aggregate data is imbued with personalised trust’, citing a collection of David Attenborough’s favourite clips as an example.

Tom Scott argued that it didn’t make sense for the BBC to spend money replicating data sources that are already available on the web, and so Wildlife Finder builds pages using existing sources like Wikipedia, WWF, ZSL and the University of Michigan Museum of Zoology. A question from the floor asked him about the issues of trust around the BBC using Wikipedia content. He said that a review of the content before the project went live showed that it was, on the whole, ‘pretty good’.

As long as the BBC was clear on the page where the data was coming from, he didn’t see there being an editorial issue.

Other presentations during the day are due to be given by John Sheridan and Jeni Tennison from data.gov.uk, Georgi Kobilarov of Uberblic Labs and Silver Oliver from the BBC. The afternoon is devoted to a more practical series of workshops allowing developers to get to grips with some of the technologies that underpin the web of data.