Tag Archives: robots.txt protocol

paidContent:UK: Times Online blocks news aggregator Meltwater

Following its decision in January to block the NewsNow news monitoring site, Times Online has blocked fellow news aggregator Meltwater.

Meltwater is the only aggregation service that has not complied with a new licensing system introduced by the Newspaper Licensing Agency (NLA) at the start of 2010, which, among other things, charges sites that crawl newspaper websites and use that content as part of a commercial service to clients.

Meltwater is taking the NLA to a copyright tribunal and on Monday was told its challenge would go ahead with a procedural hearing in June 2010 and a trial in February 2011.

Full story at this link…

ACAP answers its critics

The ACAP project launched in November with the hope of providing a technological solution to end clashes between news publishers and search engines over content use.

In addition to the back-slapping and the pomp, the launch brought with it hefty criticism of the new system.

The team behind the project has now attempted to address some of the criticism thrown its way by responding to what it considers the main thrust of the argument against it.

Here is a summary of the main criticisms ACAP has singled out and its responses (full list):

Criticism: “Publishers should not be allowed to control their content”

Response: Well, you would hardly expect us to agree with this…

Criticism: “This is simply a way for publishers to ‘lock up’ their content”

Response: …Publishers who implement ACAP will have the confidence to make content available much more widely than is currently the case. Few would condone stealing a pile of newspapers from a newsstand and giving them away to passers-by for free, yet there are those who think that this behaviour is completely acceptable – indeed normal – in the online environment…

Criticism: “Robots.txt works perfectly well”

Response: …We recognise that robots.txt is a well-established method for communication between content owners and crawler operators. This is why, at the request of the search engines, we worked to extend the Robots Exclusion Protocol, not to replace it (although this posed us substantial problems)… ACAP provides a standard mechanism for expressing conditional access, which is what is now required. At the beginning of the project, search engines made it clear that ACAP should be based on robots.txt. ACAP therefore works smoothly with the existing robots.txt protocol…
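To make that compatibility claim concrete, here is a minimal sketch of how an ACAP-extended robots.txt degrades gracefully for a crawler that only understands the classic Robots Exclusion Protocol. The “ACAP-” field names below are illustrative rather than quoted from the ACAP specification, and Python’s standard-library parser stands in for any pre-ACAP crawler: it skips field names it does not recognise, so the classic Allow/Disallow behaviour is untouched.

```python
from urllib.robotparser import RobotFileParser

# A hypothetical robots.txt mixing classic Robots Exclusion Protocol
# fields with ACAP-style extension fields. The "ACAP-" directives are
# illustrative, not taken verbatim from the ACAP specification.
ROBOTS_TXT = """\
User-agent: *
Disallow: /archive/

ACAP-crawler: *
ACAP-disallow-crawl: /archive/
ACAP-allow-crawl: /news/
"""

# Python's standard-library parser predates ACAP and simply skips
# field names it does not recognise, so the classic rules still apply.
parser = RobotFileParser()
parser.parse(ROBOTS_TXT.splitlines())

print(parser.can_fetch("ExampleBot", "/news/story.html"))    # True
print(parser.can_fetch("ExampleBot", "/archive/2009.html"))  # False
```

On this model, an ACAP-aware crawler would read the richer conditional terms, while an older crawler simply falls back to the plain Disallow lines – which is the coexistence the ACAP team describes above.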

Criticism: “This is just about money for publishers”

Response: No: but no one would deny that it is partly about money.

Publishers are not ashamed of making money out of publishing – that is their business… Business models are changing, and publishers need a tool that is flexible and extensible as new business models arise. ACAP will be entirely agnostic with respect to business models, but will ensure that content owners can adopt the business model of their choice…

Criticism: “The big search engines aren’t involved, so don’t waste your time”

Response: Major search engines are involved. Exalead, the world’s fourth-largest search engine, has been a full participant in the project.

Any lack of public endorsement by the major search engines has not meant a lack of involvement – indeed, quite the opposite…