Last week we reported on DocumentCloud’s new partner, Thomson Reuters, and its long list of ‘beta-testers’, including one from the UK – the Centre for Investigative Journalism (CIJ), based at City University, London.
To recap, DocumentCloud is an open-source platform that makes data more easily accessible by pointing users to documents hosted elsewhere, much like a card catalogue or search engine. Only in rare circumstances will DocumentCloud serve the documents itself.
We asked one of its founders, Scott Klein, about the next steps for the project, a winner of the Knight News Challenge 2009.
So why use Thomson Reuters’ OpenCalais?
[SK] “OpenCalais will, as documents are entering our system, find ‘entities’ (people, places, organisations) in them and hand them back to our servers as a machine-readable swath of information, which we’ll store and index, and make available for people to query. The process will happen in real-time, and will be a big part of how we relate documents to each other.”
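To make the idea concrete, here is a minimal sketch of how extracted entities could be stored, queried and used to relate documents to each other. It is not DocumentCloud’s code and does not call OpenCalais: the `extract_entities` function, the `KNOWN_ENTITIES` list and the `EntityIndex` class are all hypothetical stand-ins invented for illustration.

```python
from collections import defaultdict

# Hypothetical stand-in for an entity-extraction service such as OpenCalais.
# A real service would analyse the text; this stub only matches a fixed list.
KNOWN_ENTITIES = {
    "Thomson Reuters": "organisation",
    "ProPublica": "organisation",
    "London": "place",
    "Scott Klein": "person",
}


def extract_entities(text):
    """Return the set of (name, type) pairs from KNOWN_ENTITIES found in text."""
    return {(name, kind) for name, kind in KNOWN_ENTITIES.items() if name in text}


class EntityIndex:
    """Assumed design: an inverted index from entities to document IDs."""

    def __init__(self):
        self.docs_by_entity = defaultdict(set)
        self.entities_by_doc = {}

    def add_document(self, doc_id, text):
        # Extract entities as the document enters the system, then index them.
        entities = extract_entities(text)
        self.entities_by_doc[doc_id] = entities
        for entity in entities:
            self.docs_by_entity[entity].add(doc_id)

    def query(self, name):
        """Return the sorted IDs of documents that mention the named entity."""
        found = set()
        for (entity_name, _kind), doc_ids in self.docs_by_entity.items():
            if entity_name == name:
                found |= doc_ids
        return sorted(found)

    def related(self, doc_id):
        """Rank other documents by how many entities they share with doc_id."""
        counts = defaultdict(int)
        for entity in self.entities_by_doc.get(doc_id, set()):
            for other in self.docs_by_entity[entity]:
                if other != doc_id:
                    counts[other] += 1
        # Most shared entities first; ties broken alphabetically by ID.
        return sorted(counts.items(), key=lambda item: (-item[1], item[0]))


index = EntityIndex()
index.add_document("memo", "Scott Klein of ProPublica spoke in London.")
index.add_document("filing", "Thomson Reuters opened a London office.")
index.add_document("report", "ProPublica and Scott Klein published a report.")

print(index.query("London"))
print(index.related("memo"))
```

In this toy example the memo is linked more strongly to the report (two shared entities) than to the filing (one), which is the kind of relationship between documents that Klein describes.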
Will you look to partner with other large organisations like Thomson Reuters?
“Yes, definitely. We intend to rely heavily on Amazon’s Web Services infrastructure – namely, their Elastic Compute Cloud and Elastic Block Store services, and Amazon has been very enthusiastic about working with us.
“As for other partners, we have a wish list of companies and technologies we think would work well with DocumentCloud. But we’re also happy to talk to anybody who is interested in contributing technology. We don’t imagine that we have all the answers or that we have to invent everything that goes into this.”
What’s next in the development / collaboration pipeline?
“[As reported by Journalism.co.uk] A few weeks ago, we released under an open-source license a major component of our document processing system, an easy-to-use parallel-processing framework for Ruby on Rails called CloudCrowd. Next we’ll start tackling other big components, such as the hosting infrastructure and user interface.”
Will you be hiring any more staff – we see you’ve appointed your lead programmer?
“Yes, we’re on the hunt for some contract staff to work on building out our infrastructure, and on our visual design/user experience.”
- DocumentCloud aims to release a public beta in March 2010
- Tool of the week for journalists – DocumentCloud, to analyse documents as data
- Knight News Challenge winner DocumentCloud releases ‘CloudCrowd’ system
- New York Times/ProPublica’s DocumentCloud makes newspaper debut
- Thomson Reuters acquires US banking analytics site Highline Financial