Web Scraping & Data Harvesting... What it is and why you should care!

Posted By: Meg Palumbo, TAA News & Updates

Have you ever received an email (like the one below) offering to sell you an event attendee or membership list? This article will give you a little more insight into what these sellers are actually offering and how they get those lists!

What is web scraping? 

Web scraping is a technique for extracting data (such as email addresses) from websites and saving that data to a local file, a spreadsheet, or a database.
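To make the idea concrete, here is a minimal Python sketch of what "extract and save" can look like. It is an illustration only, not a description of any particular scraper: the sample page, the made-up addresses, the simplified email pattern, and the function names extract_emails and to_csv are all assumptions for the example.

```python
import csv
import io
import re

# Simplified email pattern for illustration; real addresses can be more varied.
EMAIL_PATTERN = re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}")

# Hypothetical directory page; the companies and addresses are made up.
SAMPLE_PAGE = """
<ul>
  <li>Acme Plumbing - <a href="mailto:sales@acme.example">sales@acme.example</a></li>
  <li>Best Paint Co. - info@bestpaint.example</li>
</ul>
"""


def extract_emails(html: str) -> list[str]:
    """Return the unique email addresses in the page, in order of appearance."""
    seen = set()
    emails = []
    for match in EMAIL_PATTERN.findall(html):
        address = match.lower()
        if address not in seen:
            seen.add(address)
            emails.append(address)
    return emails


def to_csv(emails: list[str]) -> str:
    """Format the addresses as spreadsheet-style CSV text with a header row."""
    buffer = io.StringIO()
    writer = csv.writer(buffer, lineterminator="\n")
    writer.writerow(["email"])
    for address in emails:
        writer.writerow([address])
    return buffer.getvalue()


if __name__ == "__main__":
    print(to_csv(extract_emails(SAMPLE_PAGE)))
```

The point of the sketch is how little effort this takes once an address is visible in a page: a few lines of pattern matching turn a public directory listing into a spreadsheet column.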

Who is web scraping?

For the purposes of this article, web scrapers are individuals looking to make money by selling lists that may or may not be accurate, and that they don't have the right to sell anyway! There are two types of web scrapers (again, for the purposes of this article): humans and bots. Humans actively copy and paste email addresses from websites into a file or spreadsheet. It can take a human 4 to 5 hours to "harvest" data from a single website. "Bots" are programs created to harvest that same data, except it only takes them minutes.

Why is TAA sharing this information with me?

For years, we have locked down Owner/Operator contact info so that it is visible only to current members who are logged into their accounts on our website. Supplier members, however, have always been visible publicly. For a Supplier member, it's a valuable benefit to be listed in our directory, searchable by members and non-members alike. So, while the majority of our website visitors do not have nefarious intent, Supplier members' email addresses are available to the public and are therefore subject to harvesting by web scrapers. This is the case for any email address available publicly on any website, not just TAA's, but we're doing something about it.

What has TAA already done to help protect my email address?

In collaboration with our web host, we have implemented an extra step to prevent bots from harvesting our Supplier members' email addresses. Now, when you visit our website and are not a logged-in member, you'll be prompted to prove you aren't a bot by completing a reCAPTCHA. (See screenshots below.) Keep in mind that this "fix" only prevents bots from harvesting information; it does not prevent humans from completing the reCAPTCHA and then copying and pasting contact information.
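For readers curious about the logic behind this extra step, the rule can be sketched in a few lines of Python. This is a simplified illustration under stated assumptions, not our web host's actual implementation: the Visitor class, the function names, and the placeholder message are hypothetical, and the real reCAPTCHA check is reduced here to a simple yes/no flag.

```python
from dataclasses import dataclass


@dataclass
class Visitor:
    """Hypothetical description of a website visitor."""
    is_logged_in_member: bool = False
    passed_captcha: bool = False  # stands in for a real reCAPTCHA verification


def can_view_supplier_email(visitor: Visitor) -> bool:
    """Logged-in members skip the challenge; guests must pass it first."""
    return visitor.is_logged_in_member or visitor.passed_captcha


def render_supplier_email(visitor: Visitor, email: str) -> str:
    """Return the address, or a prompt when the visitor has not been verified."""
    if can_view_supplier_email(visitor):
        return email
    return "[Complete the reCAPTCHA to view this email address]"


if __name__ == "__main__":
    guest = Visitor()
    member = Visitor(is_logged_in_member=True)
    print(render_supplier_email(guest, "sales@acme.example"))
    print(render_supplier_email(member, "sales@acme.example"))
```

Note that a guest who completes the challenge is treated the same as a member for viewing purposes, which is why a determined human can still copy and paste addresses by hand.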


Visitor Experience - Guest, Not Logged In

 

Visitor Experience - Logged In Member


So what's next?

TAA, in conjunction with the Products & Services Committee and with assistance from our web host, will continue to monitor internet security issues and make updates and adjustments as needed. Please do not hesitate to reach out to us for more information or if you have any questions.

Please know that TAA will never use a third party to attempt to sell you a Member List. We do hire third-party companies to manage contracted services like our online Career Center (Web Scribble) and to publish our Annual Directory (Innovative Publishing). These companies may reach out on our behalf to promote the purchase of job postings or ad space, but never to sell you a Member List.

As always, please feel free to reach out to anyone on TAA Staff to confirm the validity of anyone who says they are contacting you on our behalf.