That may have been the cry until earlier this month, when the latest appeals court decision held that web scraping doesn’t break anti-hacking laws. LinkedIn lost its two-year legal battle with a private company that it had blocked from its site for allegedly stealing publicly available data from its website.
How Does It Affect My Organization?
The most susceptible organizations appear to be those with memberships and sites that allow access to proprietary information. Web scrapers can sign up, pay a fee, if necessary, and harvest any information available.
In addition, any sites that request personal information, from Facebook to Amazon, Craigslist to YouTube, and Wikipedia, are apparently vulnerable to web scrapers, who can purchase the software on the web.
Some reports suggest that the bots used in web scraping can even extract non-public data from sites, especially those that compare prices, goods and other things. Sometimes bots used to web scrape make too many requests without pausing. This usually results in a shutdown and a denial-of-service response, shutting off access to all.
If It’s Legal, What’s the Harm?
Most of the bots used in web scraping gather data that can be used to benefit their users. This data includes price comparisons, research, product data, web content, and customer/sales leads, to name a few.
What Can I Do?
Stopping bots and web scrapers may not be 100% possible, but there are some things you can do to decrease the odds or minimize damage.
Using CAPTCHA (Completely Automated Public Turing test to tell Computers and Humans Apart) can help. Those are the signs or pictures you sometimes encounter on sites asking you to identify certain objects or things before allowing you to complete your query.
Using video, PDFs and images is not only useful to visitors but can stump web scrapers. Most bots are looking for text and miss this content. Consider requiring visitors to log into your site. It may not stop a web scraper, but the requirement will force the bot to enter information that will likely enable you to track down the source.
Talk with your IT experts about possibly blocking requests that come in much faster than an individual could make them. However, also be aware that some VPNs and web servers may show all traffic coming from the same address and could also be blocked. If so, are you willing to risk that?
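For readers who want to see what "blocking requests that come in too fast" might look like in practice, here is a minimal sketch in Python. It is an illustration only, not a recommendation of any particular product or setting: the class name `SlidingWindowLimiter`, the limit of three requests per second, and the sample address are all assumptions made up for this example. It simply counts recent requests per address and turns away an address once it exceeds the limit within the time window.

```python
import time
from collections import defaultdict, deque


class SlidingWindowLimiter:
    """Hypothetical helper: refuse clients that exceed max_requests per window."""

    def __init__(self, max_requests=20, window_seconds=10.0):
        self.max_requests = max_requests
        self.window_seconds = window_seconds
        # Maps each client address to the times of its recent accepted requests.
        self._hits = defaultdict(deque)

    def allow(self, client_ip, now=None):
        """Return True to serve the request, False if it arrives too fast."""
        now = time.monotonic() if now is None else now
        hits = self._hits[client_ip]
        # Forget requests that are older than the window.
        while hits and now - hits[0] >= self.window_seconds:
            hits.popleft()
        if len(hits) >= self.max_requests:
            return False  # too many recent requests from this address
        hits.append(now)
        return True


# Example with made-up numbers: at most 3 requests per second per address.
limiter = SlidingWindowLimiter(max_requests=3, window_seconds=1.0)
results = [limiter.allow("203.0.113.7", now=t) for t in (0.0, 0.1, 0.2, 0.3, 1.5)]
print(results)  # [True, True, True, False, True]
```

The fourth request is refused because it is the fourth within one second; the fifth is served because the earlier ones have aged out. Because the count is kept per address, everyone behind a shared VPN address would be counted together, which is exactly the risk described above. Your IT team may want a higher limit or an exception list for addresses known to be shared.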
As obvious as it is, don’t post anything on your website that you don’t want to be copied or repurposed. If you’re with an organization where information about members is available to other paid members, this is a fact of life you have to deal with.
The only real preventative maintenance you can do is implement deeper vetting and screening of membership applications and require those joining to adhere to privacy rules that prohibit the commercial collection and use of member information. Of course, there isn’t much you can do after the fact if someone violates the restriction, but you’ll at least know the source and be able to take appropriate action.
With different rulings involving similar cases in two other courts, many observers believe this issue may be taken to the U.S. Supreme Court. Stay tuned!