Category Archives: Handy tools and technology

Knight News Challenge winner DocumentCloud releases ‘CloudCrowd’ system

DocumentCloud, the New York Times and ProPublica-backed project, has released its first open-source code since its launch.

The project, which won funding from the 2009 Knight News Challenge, was created to make documents and data useable for anyone. It will include software, a website and a set of open standards to make original source documents easy to find, share, read and collaborate on. From its site:

“Users will be able to search for documents by date, topic, person, location, etc. and will be able to do ‘document dives’, collaboratively examining large sets of documents. Organisations will be able to do all this while keeping the documents – and readers – on their own sites. Think of it as a card catalogue for primary source documents.”

DocumentCloud is not a collection of documents but rather software to support documents hosted elsewhere, two of the team – Eric Umansky, senior editor at ProPublica, and Aron Pilhofer, the New York Times newsroom interactive technologies editor – explained to Journalism.co.uk in June.

The new system announced this week – CloudCrowd – will work as ‘a heavy-duty system for document processing’, in particular for importing large documents for use with DocumentCloud, the project’s lead programmer Jeremy Ashkenas said.

“Our PDFs need to have their text extracted, their images scaled and converted, and their entities extracted for later cataloguing,” he explained, adding more detail about the process, which is called ‘parallel processing’ on its site.

“All of these things are computationally expensive, keeping your laptop hot and busy for minutes, especially when the documents run into the hundreds or thousands of pages.”

CloudCrowd will power DocumentCloud’s document import, a process Ashkenas describes in detail on the project’s site.
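To give a rough idea of what ‘parallel processing’ means here, the sketch below splits a document into pages, processes the pages concurrently and merges the results in order. It is not CloudCrowd’s code or API: the function names and the trivial `extract_text` step are assumptions made for illustration, and threads on a single machine stand in for the multiple workers a real system would use.

```python
from concurrent.futures import ThreadPoolExecutor


def extract_text(page: str) -> str:
    """Hypothetical stand-in for an expensive per-page step such as text extraction."""
    return page.strip().lower()


def process_document(pages: list[str], workers: int = 4) -> str:
    """Process every page concurrently, then merge the results in page order."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        # map() returns results in input order, so pages stay in sequence.
        results = list(pool.map(extract_text, pages))
    return "\n".join(results)


if __name__ == "__main__":
    document = ["  PAGE One ", "Page TWO", " page THREE  "]
    print(process_document(document))
```

For genuinely CPU-heavy work, the same split, process and merge pattern would more likely be spread across separate processes or machines rather than threads.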

Ashkenas encouraged other users with ‘batch-processing needs’ who need to process a large number of documents to try the system. It fits into the project’s community ethos; the aim is to invite participation and feedback ‘from scaffold to deploy’.

Linking data and journalism: what’s the future?

On Wednesday (September 9), Paul Bradshaw, course director of the MA Online Journalism at Birmingham City University and founder of HelpMeInvestigate.com, chaired a discussion on data and the future of journalism at the first London Linked Data Meetup. This post originally appeared on the OnlineJournalismBlog.

The panel included: Martin Belam (information architect, the Guardian; blogger, Currybet); John O’Donovan (chief architect, BBC News Online); Dan Brickley (Friend of a Friend project; VU University, Amsterdam; SpyPixel Ltd; ex-W3C); Leigh Dodds (Talis).

“Linked Data is about using the web to connect related data that wasn’t previously linked, or using the web to lower the barriers to linking data currently linked using other methods.” (http://linkeddata.org)

I talked about how 2009 was, for me, a key year in data and journalism – largely because it has been a year of crisis in both publishing and government. The seminal point in all of this has been the MPs’ expenses story, which both demonstrated the power of data in journalism, and the need for transparency from government. For example: the government appointment of Sir Tim Berners-Lee, the search for developers to suggest things to do with public data, and the imminent launch of Data.gov.uk around the same issue.

Even before then, the New York Times and the Guardian both launched APIs at the beginning of the year, MSN Local and the BBC have both been working with Wikipedia, and we’ve seen the launch of a number of startups and mashups around data, including Timetric, Verifiable, BeVocal, OpenlyLocal, MashTheState, the open source release of Everyblock, and Mapumental.

Q: What are the implications of paywalls for Linked Data?
The general view was that Linked Data – specifically standards like RDF [Resource Description Framework] – would allow users and organisations to access information about content even if they couldn’t access the content itself. To give a concrete example, rather than linking to a ‘wall’ that simply requires payment, it would be clearer what the content beyond that wall related to (e.g. key people, organisations, author, etc.).

Leigh Dodds felt that using standards like RDF would allow organisations to more effectively package content in commercially attractive ways, e.g. ‘everything about this organisation’.
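As a rough illustration of describing content without exposing it, the sketch below writes out a few RDF statements about an article in the N-Triples format, using Dublin Core terms for title, creator and subject. The article URL, names and the `describe_article` helper are made up for the example; it is not something the panel showed.

```python
DCTERMS = "http://purl.org/dc/terms/"  # Dublin Core terms namespace


def describe_article(url: str, title: str, creator: str, subjects: list[str]) -> list[str]:
    """Return N-Triples lines describing an article, without including its body."""

    def literal(value: str) -> str:
        # Escape backslashes and double quotes so the literal stays valid.
        escaped = value.replace("\\", "\\\\").replace('"', '\\"')
        return f'"{escaped}"'

    triples = [
        f"<{url}> <{DCTERMS}title> {literal(title)} .",
        f"<{url}> <{DCTERMS}creator> {literal(creator)} .",
    ]
    for subject in subjects:
        triples.append(f"<{url}> <{DCTERMS}subject> {literal(subject)} .")
    return triples


if __name__ == "__main__":
    # Hypothetical paywalled article: only the metadata is published.
    for line in describe_article(
        "http://example.org/news/2009/paywalled-story",
        "An example story",
        "A. Reporter",
        ["Example Organisation", "Example Person"],
    ):
        print(line)
```

A reader or aggregator stopped by the paywall could still see from these statements who wrote the piece and which people and organisations it concerns.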

Q: What can bloggers do to tap into the potential of Linked Data?
This drew some blank responses, but Leigh Dodds was most forthright, arguing that the onus lay with developers to do things that would make it easier for bloggers to, for example, visualise data. He also pointed out that currently if someone does something with data it is not possible to track that back to the source and that better tools would allow, effectively, an equivalent of pingback for data included in charts (e.g. the person who created the data would know that it had been used, as could others).

Q: Given that the problem for publishing lies in advertising rather than content, how can Linked Data help solve that?
Dan Brickley suggested that OAuth technologies (where you use a single login identity for multiple sites that contains information about your social connections, rather than creating a new ‘identity’ for each) would allow users to specify more precisely how they experience content, for instance: ‘I only want to see article comments by users who are also my Facebook and Twitter friends.’

The same technology would allow for more personalised, and therefore more lucrative, advertising. John O’Donovan felt the same could be said about content itself – more accurate data about content would allow for more specific selling of advertising.

Martin Belam quoted James Cridland on radio: ‘[The different operators] agree on technology but compete on content’. The same was true of advertising but the advertising and news industries needed to be more active in defining common standards.

Leigh Dodds pointed out that semantic data was already being used by companies serving advertising.

Other notes
I asked members of the audience who they felt were the heroes and villains of Linked Data in the news industry. The Guardian and BBC came out well – The Daily Mail were named as repeat offenders who would simply refer to ‘a study’ and not say which, nor link to it.

Martin Belam pointed out that the Guardian is increasingly asking itself ‘how will that look through an API?’ when producing content, representing a key shift in editorial thinking. If users of the platform are swallowing up significant bandwidth or driving significant traffic then that would probably warrant talking to them about more formal relationships (either customer-provider or partners).

A number of references were made to the problem of provenance – being able to identify where a statement came from. Dan Brickley specifically spoke of the problem with identifying the source of Twitter retweets.

Dan also felt that the problem of journalists not linking would be solved by technology. In conversation previously, he also talked of ‘subject-based linking’ and the impact of SKOS [Simple Knowledge Organisation System] and linked data style identifiers. He saw a problem in that, while new articles might link to older reports on the same issue, older reports were not updated with links to the new updates. Tagging individual articles was problematic in that you then had the equivalent of an overflowing inbox.

Finally, here’s a bit of video from the very last question addressed in the discussion (filmed with thanks by @countculture):

Linked Data London 090909 from Paul Bradshaw on Vimeo.

You must not embed the Telegraph’s embeddable video

It might look like you can embed this Telegraph video on your blog:

[Screenshot: telegraphembed]

But no: please take note of the last part.

As both Journalism.co.uk and Fred Hatman, a journalist in South Africa, found out, embed codes are only for ‘personal use’. That didn’t include Hatman (@fredhatman) even though he is a lone blogger.

Instead, we had to feature the story of the Telegraph journalist who was attacked by a lion after willingly entering its enclosure (mauling received surprisingly cheerfully) without the accompanying video. We got permission to link though!

Syndication@telegraph.co.uk informs us:

“I’m afraid at this time we can’t grant permission for you to host the video, but you are welcome to link to it.”

So we asked them why they supplied the code? And how could we fulfil the requirements for a licence? They replied:

“My understanding is that this function is for personal use only, not for commercial use, as per our terms and conditions.  Often we are able to issue a licence for the content, but on this occasion Telegraph.co.uk are not offering this video for web syndication.”

Journalism.co.uk wonders how Telegraph.co.uk would monitor and police misuse of the videos if abuse were extensive, and how it decides who is commercial and who is not. If, as the Syndication people tell us, ‘on this occasion Telegraph.co.uk are not offering this video for web syndication’, why bother supplying it at all? Isn’t that just asking for trouble?

The Apple Blog: Protect your laptop – disguise it as a newspaper

A review from last week of several laptop covers that disguise your device (in this case an Apple MacBook) as a newspaper.

There’s a choice of five different newspaper sleeves – from the Herald Tribune to La Vanguardia.

“Potential flaws in the strategy might be greater risk of misplacing the sleeve with MacBook in situ, or the more horrific possibility of an over-zealous cleaner-upper including the faux newspaper with a pile of real newspapers headed for recycling or the landfill,” writes reviewer Charles Moore.

Full post at this link…

[Old media enveloping new media? The fate of the newspaper as an anti-theft device? There are almost too many allusions here…]

EnvironmentGuardian.co.uk’s makeover

A new look for Guardian.co.uk’s environment pages was unveiled today, with the promise of more editorial content from its six correspondents.

“The Guardian has built this unrivalled team in the belief that environmental issues, and in particular global warming, is the defining issue of our age, combining politics, economics and social justice,” said James Randerson, editor of EnvironmentGuardian.co.uk, in a release from Guardian News & Media.

“We hope that all of the new features on the site – together with the enthusiastic participation of our visitors – will serve as an invaluable resource for anyone wanting to understand the context behind the headlines.”

Expert correspondents now include one in Washington DC, one in China and one dedicated to green technology, the release said.

Also announced:

  • A new video series featuring the Observer columnist Lucy Siegle
  • To mark the UN climate talks in Copenhagen in December, the foreign secretary David Miliband will answer users’ questions in a live online Q&A at lunchtime on Tuesday (September 8, 2009) – time to be confirmed.

Randerson is asking for user feedback at this link.

Signals intelligence journalism: using public information websites to source stories

Useful information is more widely and easily available than ever and the increasing amount of online data released by the government and others can help improve the originality of journalists’ work.

Look to VentnorBlog – the hyperlocal online effort based in the Isle of Wight which Journalism.co.uk commended during the Vestas protest coverage – for some inspiration.

[For those unfamiliar with the story, locals had been protesting against the closure of the wind turbine factory in front of national, local and hyperlocal media. Despite a long and well-publicised campaign in August 2009, Danish company Vestas has now pulled out of manufacturing on the Isle of Wight but protests and attacks by critics in the press continue. A national day of action to support redundant Vestas workers has been planned for Thursday, September 17.]

Last week, using the Area Ship Traffic Website, AIS, VB was able to report where two barges held by an agent – NEG Micron Rotors – who used to own the Vestas factory were due to head. They would be used to move the blades from the factory, which are so huge that they can only travel away on the water on special vessels.

The correspondent who tipped off VentnorBlog knew that the wind turbine blades can only be transferred from the riverside to the barge at high tide, and across a public footpath, so, using the information on the AIS site, concluded that the barges would be moved in a specific time slot.

As a result Vestas protesters asked supporters to join them at the Marine Gate on the River Medina. Of course VentnorBlog got down there to take some pictures.

Now let’s take that one step further: how can journalists tap into this kind of publicly available data to scoop stories?

Tony Hirst, Open University academic, Isle of Wight resident and prolific data masher, shared some thoughts with Journalism.co.uk. He said that we should look to signals intelligence for further inspiration: the interception and analysis of ‘signals’ emitted by whoever you are surveying. As military historians would be the first to tell you, they can be a very rich source of intelligence about others’ actions and intentions, he explained.

“A major component of SIGINT is COMINT, or Communications Intelligence, which focuses on the communications between parties of interest. Even if communications are encrypted, Traffic Analysis, or the study of who’s talking to whom, how frequently, at what time of day, or – historically – in advance of what sort of action, can be used to learn about the intentions of others.”

And this is relevant to journalists, he added:

“For starters, data is information, or raw intelligence. The job of the analyst, or the data journalist, is to identify signals in that information in order to identify something of meaning – ‘intelligence’ about intentions, or ‘evidence’ for a particular storyline.”

The VentnorBlog story, he said, describes how a ‘sharp-eyed follower of movements at the plant’ knew where two barges were headed and at what time – valuable journalistic information:

“Amid the mess of Solent shipping information was a meaningful signal relating to the Vestas story – the movement of the barge that takes wind turbine blades from the Vestas factory on the Isle of Wight to the mainland.”
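Picking a signal like that out of a mass of shipping data could, in principle, be automated. The sketch below is only an illustration under stated assumptions: the record layout, vessel names and times are invented, and it is not how VentnorBlog’s correspondent or the AIS site works. It keeps only movements by vessels of interest that are scheduled close to high tide.

```python
from datetime import datetime, timedelta


def movements_near_high_tide(records, watched_vessels, high_tide, window_hours=2):
    """Keep records for watched vessels scheduled within window_hours of high tide."""
    window = timedelta(hours=window_hours)
    return [
        record
        for record in records
        if record["vessel"] in watched_vessels
        and abs(record["scheduled"] - high_tide) <= window
    ]


if __name__ == "__main__":
    # Hypothetical shipping log; real data would come from a public tracking site.
    log = [
        {"vessel": "Example Barge A", "scheduled": datetime(2009, 9, 10, 11, 30)},
        {"vessel": "Example Ferry", "scheduled": datetime(2009, 9, 10, 11, 45)},
        {"vessel": "Example Barge A", "scheduled": datetime(2009, 9, 10, 20, 0)},
    ]
    tide = datetime(2009, 9, 10, 12, 0)
    for hit in movements_near_high_tide(log, {"Example Barge A"}, tide):
        print(hit["vessel"], hit["scheduled"].isoformat())
```

The local knowledge still does the real work: someone has to know which vessels matter and why high tide is the time to watch.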

Do you have suggestions for sources of ‘signals intelligence’ journalism? Or examples of where it has been done well?

HSJ: A Yahoo pipe for health-related news

Health Service Journal’s acute care correspondent, Dave West, has created a tool for searching through the BBC’s Today programme for health-related content.

The tool was built using Yahoo Pipes, and West is encouraging others to open up the pipe and help make it more efficient.
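For readers who have not used Yahoo Pipes, the underlying idea is a filter over a feed. The sketch below shows that idea in Python rather than in Pipes itself; the sample feed, the keyword list and the `health_items` function are assumptions for illustration and are not taken from West’s pipe.

```python
import xml.etree.ElementTree as ET

# Hypothetical RSS feed; a real filter would fetch the programme's feed instead.
SAMPLE_FEED = """<rss version="2.0"><channel>
<title>Example programme feed</title>
<item><title>Hospital waiting times rise</title><description>NHS figures published today.</description></item>
<item><title>Budget debate continues</title><description>MPs discuss spending.</description></item>
</channel></rss>"""

HEALTH_KEYWORDS = ("nhs", "hospital", "health")


def health_items(feed_xml, keywords=HEALTH_KEYWORDS):
    """Return the titles of feed items whose title or description mentions a keyword."""
    root = ET.fromstring(feed_xml)
    matches = []
    for item in root.iter("item"):
        title = item.findtext("title", default="")
        description = item.findtext("description", default="")
        text = f"{title} {description}".lower()
        if any(word in text for word in keywords):
            matches.append(title)
    return matches


if __name__ == "__main__":
    print(health_items(SAMPLE_FEED))
```

Simple substring matching like this will miss stories and catch false positives, which is presumably where help making the pipe ‘more efficient’ comes in.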

Full post at this link…

(Found via Martin Stabe’s blog)

Advice from Guardian.co.uk’s online journalism Q&A

On Friday Journalism.co.uk took part in a live Q&A hosted by the Guardian’s careers section, allowing new and experienced journalists the opportunity to ask industry professionals for advice on conquering the world of online journalism.

The multimedia panel on hand to answer questions were:

Paul Gallagher, head of online editorial, Manchester Evening News
Laura-Jane Filotrani, site editor, Guardian Careers
Sarah Hartley, digital editor, The Guardian
Alison Gow, executive editor, digital, Liverpool Echo and Liverpool Daily Post
Laura Oliver, senior reporter, Journalism.co.uk
Madeline Bennett, editor of technology news sites V3.co.uk and The Inquirer
Paul Bradshaw, senior lecturer in online journalism, Birmingham City University
John Hand, duty editor, UK desk BBC News website
Alison White, community moderator, The Guardian

Here’s our round-up of the best advice from Friday’s event on how to make it as a successful online journalist in the digital age. You can also read the panel’s responses in full on the online journalism Q&A page on Guardian.co.uk.

What is the best subject to study to help me break into journalism?

[asked by Matt, who is studying English literature and language at college and asked if going on to study an English degree would help him prepare for a career in journalism]

John Hand: “I’m often asked which is the best subject to study at university and the answer is really that there is no particularly bad choice. The best newsroom has a good mix of people with different knowledge areas – for example, I think every editor in the country would love to have someone with the in-depth health knowledge of a medical degree on their team. Of course, any degree course that allows you to develop your writing and analytical skills (I always think history is a clever choice) would be better than most.

“The most important thing is to get some vocational training. Many editors themselves initially came through NCTJ courses (http://www.nctj.com/) so would respect those, but there are also many media organisations that offer their own in-house (or even external) training. If you want to get into news journalism, the key question to ask of any training scheme is how good their law course is.”

Sarah Hartley: “Grab as much work experience as you can throughout your uni years. Who knows what the economic climate will be like when you graduate but it may well be that you can find an employer who will put you through a block release course or similar. New schemes for apprenticeships, internships and such are bound to come through in that time.”

Madeline Bennett: “Has your college got a student newspaper or website? If so, volunteering to write for that would be a good starting point and showcase for your work. If not, why not start one? This is also the case for when you go to uni, student papers can be a great place to launch your journalism career.”

But what if I can’t afford to go to university?

[Forum user Dan Holloway asked: how does someone who has no choice but to carry on a full-time job to make ends meet go about switching careers to online journalism?]

Alison White: “My advice would be to perhaps take some evening classes in journalism if possible – while I was at uni I did a 10-week course, one evening a week, about freelancing and a two-day course about getting into journalism. Or how about some work experience? Newspapers and other organisations are less well-staffed at weekends, I’m sure they’d appreciate some help with uploading content or other duties. Once you’ve got to know some people you can always keep in touch in the hope they might point you towards job opportunities or further work experience.”

Madeline Bennett: “Look for courses that focus on online journalism or multimedia skills, there might be some weekend or evening classes available that you can do to support your NCTJ. Also these courses are a good place to meet people who can help you get your first job in journalism, as they’ll often be run by current working journalists.”

Laura Oliver: “Start experimenting – if you can find the time outside of work to run a blog, contribute to other websites, you’ll learn a great deal about the basics of online publishing. Contact sites and other blogs that interest you and offer postings. Look at successful bloggers and think about what they are doing that makes them influential/profitable. Here are a couple of posts that might help too regarding building an online brand as a journalist:

“http://blogs.journalism.co.uk/editors/2009/08/17/adam-westbrook-6×6-branding-for-freelance-journalists/

“http://www.journalism.co.uk/5/articles/534896.php”

What skills do I need to be an online journalist?

[Forum user Dean Best asked: what are the top online-specific skills I should attain to improve my online skills and better my chances of moving up the ladder?]

Laura-Jane Filotrani: “To be able to demonstrate a passion for digital – by this I mean that you are active online; you use the net; you have a profile online; you use and understand community; you are excited by being able to reach people using the internet; you want to find out the latest developments.”

Alison White: “A good knowledge of SEO and the importance of linking to others and providing ‘added value’ to the reader; i.e. give them the story but perhaps with a link to a video, an online petition, a Facebook page etc. News to me seems more of a package now rather than a traditional delivery.”

Paul Bradshaw:

“1. Understand how RSS works and how that can improve your newsgathering, production and distribution. I cover a little of that in this post:

“http://onlinejournalismblog.com/2008/04/21/rss-social-media-passive-aggressive-newsgathering-a-model-for-the-21st-century-newsroom-part-2-addendum/

“2. Engage with online communities around your specialist area, help them, provide valuable information and contacts, and then when you need help on something, they’ll be there for you in return. It will also build a distribution network for your content.

“3. Possibly hardest, but force yourself to experiment and make mistakes with all sorts of media. If you can make yourself entertaining as well as informative then that can really work very well.”

How can I make the transition to online journalism?

[‘Malini’ asked: how do I go about breaking into the field of online journalism? And why would anyone pay and retain a writer when they can easily get so much content for free?]

Paul Bradshaw: “Use free writing to build a reputation and contacts; and sell the valuable stuff that you generate from that. Ultimately you should aim to become reliable enough for them to want to hire you when they are hiring.”

Sarah Hartley: “Writers have always provided free content – be it letters to the editor, local band reviews, poetry or whatever, so being online will only further the opportunity for that sort of exposure and that can only be a good thing for diversity and choice.”

Paul Gallagher: “I have taught myself some coding skills like HTML and I believe it does help a lot to have some technical knowledge, not necessarily because you will need them in the job but because it really helps to be able to communicate well with the programmers and developers in your company.”

Editor&Publisher: DailyMe’s Newstogram follows readers’ ‘tastes’

News aggregation site DailyMe has launched ‘Newstogram’ – a new piece of tech that analyses the reading behaviour of users.

The idea is that publishers will be able to use this information to serve up personalised news recommendations based on a user’s individual interests.
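How DailyMe actually does this is not described, but the general approach of turning reading behaviour into recommendations can be sketched very simply. In the example below, the topic labels, article data and the `recommend` function are all hypothetical: candidate articles are ranked by how often the reader has previously read each of their topics.

```python
from collections import Counter


def recommend(read_topics, candidates, limit=3):
    """Rank candidate articles by how often the reader has read each of their topics."""
    interest = Counter(read_topics)
    scored = [
        (sum(interest[topic] for topic in article["topics"]), article["title"])
        for article in candidates
    ]
    # sorted() is stable, so articles with equal scores keep their original order.
    ranked = sorted(scored, key=lambda pair: pair[0], reverse=True)
    return [title for score, title in ranked if score > 0][:limit]


if __name__ == "__main__":
    # Hypothetical reading history: one topic label per article read.
    history = ["health", "health", "sport"]
    articles = [
        {"title": "Match report", "topics": ["sport"]},
        {"title": "Fitness and the NHS", "topics": ["health", "sport"]},
        {"title": "Budget latest", "topics": ["politics"]},
    ]
    print(recommend(history, articles))
```

A production system would presumably weigh recency, time spent reading and much else, but the principle of scoring new content against observed interests is the same.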

This basic function will be free to publishers – more complex use of the data will require signing up to DailyMe’s applications.

Full story at this link…