The Register reports that, over a 30-day period in October and November 2009, more than 75,000 websites ‘reused’ at least one newspaper article without sharing revenue with the publisher, according to a new study by the Fair Syndication Consortium. Google, one of the sites assessed, accounted for 53 per cent of the ad revenue attracted by such ‘unlicensed content’. Yahoo accounted for 19 per cent.
“On the 75,195 sites fingered, the Consortium found 112,000 ‘near-exact copies’ of unlicensed articles (meaning reproductions that lifted more than 80 percent of the original article and more than 125 words) and 163,173 ‘excerpts’ (less than 80 per cent of original article and more than 125 words). But in most cases, sites are merely reusing the headline (125 words or less).”
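The three categories in the quote come down to two thresholds, which can be sketched as a small function. This is only an illustration of the definitions as quoted, not the Consortium's actual method: the function name and parameters are hypothetical, and because the quote does not say how a reuse of exactly 80 per cent is counted, the sketch assumes it falls under ‘excerpt’.

```python
def classify_reuse(copied_words: int, original_words: int) -> str:
    """Label a reused article using the thresholds quoted above (illustrative only)."""
    if original_words <= 0:
        raise ValueError("original_words must be positive")
    # 125 words or fewer: the quote treats this as headline-level reuse.
    if copied_words <= 125:
        return "headline"
    # Share of the original article that was lifted.
    share = copied_words / original_words
    # Assumption: exactly 80 per cent is counted as an excerpt.
    return "near-exact copy" if share > 0.80 else "excerpt"
```

For example, under these assumptions a site that lifts 500 words of a 600-word article would be counted as a near-exact copy, while one that lifts 200 words of the same article would be counted as an excerpt.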
A judge ruled that the ‘snippet’, which occurred in the results of a search for ‘Zwartepoorte’ and ‘bankrupt’, could give the false impression that the car dealer has gone bankrupt.
“To create the snippet, Google algorithms pulled both the ‘Zwartepoorte’ bit and the ‘bankrupt’ bit from the Miljoenhuizen.nl page. But they weren’t side-by-side on the page – as the ellipses indicate. That’s often how Google does things. If you Google two separate words, it shows you that each search result contains both of them,” explains the Register.
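The behaviour the Register describes, in which separate fragments of a page are stitched together with ellipses, can be illustrated with a rough sketch. This is not Google's algorithm, which is not public; the function `build_snippet`, its `window` parameter and the sample text are all hypothetical, and the sketch simply takes a few words around the first match for each search term.

```python
def build_snippet(page_text: str, terms: list[str], window: int = 3) -> str:
    """Join the words around each search term with ellipses (illustrative only)."""
    words = page_text.split()
    spans = []
    for term in terms:
        for i, word in enumerate(words):
            # Case-insensitive match on the first word containing the term.
            if term.lower() in word.lower():
                spans.append((max(0, i - window), min(len(words), i + window + 1)))
                break
    spans.sort()
    # Merge fragments that overlap or touch, so nearby terms share one fragment.
    merged = []
    for start, end in spans:
        if merged and start <= merged[-1][1]:
            merged[-1] = (merged[-1][0], max(merged[-1][1], end))
        else:
            merged.append((start, end))
    # Fragments from different parts of the page are separated by an ellipsis.
    return " ... ".join(" ".join(words[s:e]) for s, e in merged)


page = "alpha one two three four five six seven eight nine ten omega eleven"
snippet = build_snippet(page, ["alpha", "omega"], window=1)
# snippet == "alpha one ... ten omega eleven"
```

In this toy example the two terms sit eleven words apart on the page, yet in the snippet only an ellipsis separates them, which is how unrelated words can end up looking connected.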
The site has reportedly removed the page.
Crucially, as the Register points out, the car dealer chose to sue the website, not Google. (Interestingly, if you now search for the same terms on Google, blogs and news sites reporting the case appear in the results with the same snippet.)
It’s a worrying precedent for online publishers – are there ways to prevent Google from summarising pages in this form?
The Register reports that civil rights group Liberty has ‘rubbished’ a story in Sunday’s Observer, which said that the group was approached by several UK mobile operators in an attempt to win its public support for their data protection policies. Liberty told The Register that if such an approach had been made it would have been rejected on principle.