Social news app Flipboard was yesterday's hot new app, despite—or perhaps because of—technical problems that prevented some features from working. But there might be a bigger snag: Is Flipboard scraping content it doesn't have the rights to?
Flipboard, the new iPad app that renders links from your Twitter feed and favorite sites in a beautiful, magazine-style layout, has a problem: it scrapes websites directly rather than using public RSS feeds, opening it to claims of copyright infringement.
Unlike similar news apps such as Pulse, Flipboard appears to eschew the older syndication standby RSS and instead grabs URLs from Twitter and Facebook feeds. While news sources that maintain their own automatic Twitter feeds tend to link the same stories as they do in their RSS feeds, there's one critical difference: RSS also allows content to be included in the feed, whereas Twitter provides only the URLs that link back to the full website. (Unless, of course, the site only writes 140-character news stories.)
Back in the ancient days of the mid-aughts, there was a healthy debate online about whether or not news outlets should provide full content feeds or simply headlines and excerpts. Rather than rehash that debate—one that's still ongoing—just remember this: whether a company chose to publish "full feeds" or excerpts, the choice remained theirs.
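To make the "full feed" versus excerpt distinction concrete, here is a minimal sketch in Python. The feed below is invented for illustration (the titles and example.com links are assumptions, not any real publisher's feed). It relies on the common convention that a short summary goes in `description` while a full article body, when the publisher chooses to include one, goes in `content:encoded`.

```python
import xml.etree.ElementTree as ET

# Namespace of the RSS "content" module, which defines <content:encoded>.
CONTENT_NS = "{http://purl.org/rss/1.0/modules/content/}"

# A made-up two-item feed: one excerpt-only item, one full-content item.
FEED = """<rss version="2.0" xmlns:content="http://purl.org/rss/1.0/modules/content/">
<channel>
<title>Example Feed</title>
<item>
  <title>Excerpt story</title>
  <link>http://example.com/1</link>
  <description>One-sentence summary.</description>
</item>
<item>
  <title>Full story</title>
  <link>http://example.com/2</link>
  <description>Summary.</description>
  <content:encoded><![CDATA[<p>Whole article body.</p>]]></content:encoded>
</item>
</channel>
</rss>"""


def describe(feed_xml):
    """Return (title, 'full' or 'excerpt') for each item in the feed."""
    root = ET.fromstring(feed_xml)
    results = []
    for item in root.iter("item"):
        # The publisher decides whether to ship the body at all.
        body = item.findtext(CONTENT_NS + "encoded")
        results.append((item.findtext("title"), "full" if body else "excerpt"))
    return results


print(describe(FEED))
```

Either way, what the reader app receives is whatever the publisher put in the feed. A tweet or Facebook post, by contrast, carries only the link.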
A new class of "feed readers" has ditched RSS and built its own content scrapers. The ever-popular Instapaper (the adblocker it's okay to like!) is a scraper: a reader views a story in their web browser (along with ads and other web chrome) and clicks "Read Later"; Instapaper then uses some sorting magic to figure out which part of the already-downloaded HTML is content and which is cruft.
From a licensing and copyright perspective it's a little bit iffy, but since content providers get at least one pageview every time someone uses Instapaper, there has been a sort of truce. (One made steadier by the fact that many of those working in the media who might get frustrated by scrapers are also fans of long-form content, exactly the sort of reader to which Instapaper caters.)
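What might that "sorting magic" look like? The sketch below is a toy heuristic, not Instapaper's or anyone else's actual code: it assumes the story is whichever container holds the most paragraph text, and it ignores anything inside tags that usually hold page chrome. The tag lists and the names `ContentGuesser` and `extract_main_text` are made up for this example.

```python
from html.parser import HTMLParser

# Containers scored as candidate "content" blocks (an assumption of this sketch).
CONTAINERS = {"div", "article", "section", "td"}
# Tags whose contents are treated as cruft and ignored.
SKIP = {"script", "style", "nav", "header", "footer", "aside"}


class ContentGuesser(HTMLParser):
    """Groups <p> text by its nearest enclosing container."""

    def __init__(self):
        super().__init__()
        self.stack = []      # ids of currently open containers
        self.blocks = {}     # container id -> list of paragraph strings
        self.counter = 0     # source of fresh container ids
        self.skip_depth = 0  # how many cruft tags we are inside
        self.in_p = False
        self.buf = []

    def handle_starttag(self, tag, attrs):
        if tag in SKIP:
            self.skip_depth += 1
        elif tag in CONTAINERS:
            self.counter += 1
            self.stack.append(self.counter)
        elif tag == "p" and self.skip_depth == 0:
            self.in_p = True
            self.buf = []

    def handle_endtag(self, tag):
        if tag in SKIP and self.skip_depth > 0:
            self.skip_depth -= 1
        elif tag in CONTAINERS and self.stack:
            self.stack.pop()
        elif tag == "p" and self.in_p:
            self.in_p = False
            text = " ".join("".join(self.buf).split())  # collapse whitespace
            if text:
                key = self.stack[-1] if self.stack else 0
                self.blocks.setdefault(key, []).append(text)

    def handle_data(self, data):
        if self.in_p and self.skip_depth == 0:
            self.buf.append(data)


def extract_main_text(html):
    """Return the paragraphs of the container with the most paragraph text."""
    parser = ContentGuesser()
    parser.feed(html)
    parser.close()
    if not parser.blocks:
        return []
    return max(parser.blocks.values(), key=lambda ps: sum(len(p) for p in ps))


PAGE = """<html><body>
<nav><p>Home | About | Subscribe to our newsletter today</p></nav>
<div id="sidebar"><p>Ad</p></div>
<div id="story"><p>First paragraph of the story.</p><p>Second   paragraph.</p></div>
<footer><p>Copyright notice</p></footer>
</body></html>"""

print(extract_main_text(PAGE))
```

Real parsers of this kind weigh many more signals (class names, link density, punctuation), but the principle is the same: the page is fetched in full and the program, not the publisher, decides what counts as the article.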
It appears that Flipboard uses a very similar technique to scrape content from the sites that are indexed within the app. A story from CNN.com about Kosovo appeared in my copy of Flipboard under the "FlipNews" section with a headline, picture, and about three-and-a-half paragraphs of content, even though CNN's feed gives only a headline and a dek (a summary sentence).
I posed these very questions to Flipboard's co-founder Evan Doll:
• Is Flipboard using a custom scraper that pulls content from publishers' websites?
Doll: Flipboard is using a parser that is very similar technically to Readability by Arc90 and Safari 5's Reader feature.
• Does Flipboard use RSS? Or is it all Twitter and Facebook feeds?
Doll: Flipboard does not currently use RSS.
• Is the content passing through Flipboard's servers before being sent to users' iPads?
Doll: The content parser runs on Flipboard's servers. It simply wouldn't be possible to run on the client for reasons of speed and complexity.
• Is the scraper universal/generic, or are there custom scrapers for certain sites?
Doll: It is mostly universal/generic. However, we can limit the amount of content displayed on a site-specific basis. We already try to do this with some sites that publish extremely abbreviated RSS feeds; even though we aren't using RSS directly, we attempt to achieve display parity with their feed.
We're happy to accommodate the requests of any content publisher who wants to choose how much content to show in Flipboard, or to hide their site's content altogether.
• What is the rationale for scraping content? I presume you guys are claiming a reasonable system under fair use, but what is Flipboard's policy?
Doll: Flipboard shows short content previews. We do not offer a "full article" view in Flipboard for articles of arbitrary length. If the user wants to read the full article, they tap "Read on Web" and are taken to the full site in an embedded browser.
We see Flipboard as a great way to discover content, particularly recommendations from your friends of sources that you may not already subscribe to. As such, we believe that we're providing value to content creators by helping to drive new readers in their direction. As mentioned before, though, we are happy to limit or hide content as requested.
In the past 48 hours, we've received an incredibly positive response from content creators who are happy about being featured in Flipboard, and who want to work with us on doing a better job displaying their content. Hopefully we can do more on this front soon.
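The site-specific limiting Doll describes could be as simple as a lookup table applied after parsing. The following is a guess at the general shape, not Flipboard's implementation: the hostnames, the limits, and the default of three paragraphs are all hypothetical.

```python
from urllib.parse import urlparse

# Hypothetical per-site caps on preview paragraphs; 0 hides a site entirely.
SITE_LIMITS = {
    "example-news.com": 1,    # imagined publisher with a headline-and-dek feed
    "example-optout.com": 0,  # imagined publisher that asked to be hidden
}
DEFAULT_LIMIT = 3  # assumed default preview length for everyone else


def preview(url, paragraphs):
    """Trim already-extracted paragraphs to the cap for the story's site."""
    host = urlparse(url).hostname or ""
    if host.startswith("www."):
        host = host[4:]
    limit = SITE_LIMITS.get(host, DEFAULT_LIMIT)
    return paragraphs[:limit]


story = ["Para one.", "Para two.", "Para three.", "Para four."]
print(preview("http://www.example-news.com/a", story))
print(preview("http://example-optout.com/b", story))
print(preview("http://somewhere-else.example/c", story))
```

Note what a table like this implies: every site not listed gets the default treatment until its publisher asks for something different.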
Personally, I'm a fan. Flipboard and its ilk are taking ideas that have been percolating for years and turning them into software tools that are very useful for readers.
Unfortunately, those tools may not be, strictly speaking, on the right side of copyright.
Take Boston.com's The Big Picture, for instance: Each of those large images comes from wire services. It's content paid for by the Boston Globe, licensed from the photo wires with strict limits on how it can be presented and distributed. The Big Picture's RSS feed presents a single image and a description for each collection.
View The Big Picture from within Flipboard, however, and you'll find all of the images available, scraped (or "parsed," to use Flipboard's phrasing) to Flipboard's own servers, resized into a grid, and then distributed to every single Flipboard app. Fair use is a murky thing, decided on a case-by-case basis, but it's hard for me to suss out a scenario where scraping all of the images from a picture blog and redisplaying them without ads or context would be considered fair use.
Moreover, Flipboard is saying they're willing to shut off or rein in how much content they are slurping up if content providers simply ask. That's good, but it's still a possible violation of copyright. Content creators do not have to specifically ask that their copyright not be violated.
And without unfettered access to every single site that could be linked by your friends, Flipboard becomes less and less useful. (It's the same argument used by many—myself included—against pay walls for news sites. Pay walls make it difficult for news to go viral.)
I also spoke to Flipboard's Marci McCue briefly on the phone about the issue. She explained that Flipboard never provides a full version of content, but instead caches the content's web page behind the scenes where it can be read at the touch of a button—ads and all.
In addition, McCue reiterated Doll's claim that Flipboard tries to use RSS as a guideline. "We try to map to RSS rules," said McCue. That's a good notion, but as The Big Picture example proves, it isn't foolproof. And more importantly, while it's a good guideline, RSS isn't a license. Content is licensed, as a rule, with very specific limitations on how it can be used.
But Flipboard is free. So if they're not making money off of others' content, what's the problem?
The problem is that Flipboard doesn't plan to have zero revenue forever. (Of course.) The company hopes that content companies like Condé Nast will be so pleased at the way Flipboard presents their content that they'll work with the company—subscriptions and revenue sharing were mentioned specifically—instead of suing the pants off of them.
Considering that media companies have each been creating their own magazine apps for iPad to have complete control over layout (and revenue and everything else that comes with selling your own app), that seems like a terribly ambitious plan.
Good luck to them, all the same. Flipboard is another great example of the power of the internet to transform the way we consume media. It just may not be, you know, legal.