AI Scrapes Your Data For Training: Take Steps To Protect Your Data

Published January 09, 2024, in our Security Fraud News & Alerts Newsletter.

Large language models like ChatGPT have added complexity to the evolving online threat landscape. Cybercriminals are increasingly using these models to carry out fraud and other attacks without needing advanced coding skills. The threat is compounded by readily available tools such as bots-as-a-service, residential proxies, and CAPTCHA farms. As a result, it's crucial for individuals and businesses to take proactive measures to protect their online presence.

Three key risks associated with large language models (LLMs) like ChatGPT and ChatGPT plugins are content theft, reduced web traffic, and data breaches. LLM crawlers gather valuable information by scraping websites like yours. That information is used to train the models, but collecting it also exposes you to cyber threats. Limiting these crawlers' access to your data can reduce your organization's exposure to these risks.

While most scrapers are likely not collecting your data to cause harm, they could still inadvertently distribute sensitive information. Some scrapers do not distinguish between data you intended to share and data you did not, and if sensitive data leaks this way, it can damage a business's reputation and competitive advantage. Then there are the costs of recovering from a breach.

According to IBM’s latest Cost of a Data Breach report, the average cost of a breach to an organization was $4.45 million in 2023, a record high. This figure accounts for legal and regulatory costs, the cost of technical activities, damage to brand equity, customer loss, and effects on employee productivity. Breaches also affect staff: according to research by the stack management company Encore, a data breach causes employees to lose trust in management, and up to 60% leave afterward.

Businesses should consider ways to opt out of having their data used to train LLMs and to block LLM scrapers, especially in industries where data privacy is critical.
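One common way to opt out is a robots.txt file at the root of your website that disallows AI crawlers by user-agent. The Python sketch below generates such a file and then checks it with the standard library's `urllib.robotparser`. The crawler names in `AI_CRAWLERS` (OpenAI's GPTBot and ChatGPT-User, Common Crawl's CCBot, and Google's Google-Extended token) are an illustrative list we chose, not an exhaustive one, so confirm each vendor's current documentation before relying on it. `build_robots_txt`, `is_blocked`, and the example.com URL are hypothetical names used only for this sketch. Keep in mind that robots.txt is a voluntary convention: well-behaved crawlers honor it, but it does not stop scrapers that choose to ignore it.

```python
from urllib.robotparser import RobotFileParser

# Illustrative (assumed) list of AI-crawler user-agent tokens.
# Check each vendor's current documentation before relying on it.
AI_CRAWLERS = ["GPTBot", "ChatGPT-User", "CCBot", "Google-Extended"]


def build_robots_txt(agents):
    """Return robots.txt text that disallows the whole site for each agent."""
    lines = []
    for agent in agents:
        lines.append(f"User-agent: {agent}")
        lines.append("Disallow: /")
        lines.append("")  # blank line ends this agent's group
    # All other crawlers (e.g. regular search engines) keep normal access.
    lines.append("User-agent: *")
    lines.append("Allow: /")
    return "\n".join(lines) + "\n"


def is_blocked(robots_txt, agent, url="https://example.com/"):
    """Use the stdlib parser to check whether `agent` is denied `url`."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return not parser.can_fetch(agent, url)


if __name__ == "__main__":
    robots = build_robots_txt(AI_CRAWLERS)
    print(robots)
    # Each AI crawler should be blocked; an ordinary browser agent should not.
    for agent in AI_CRAWLERS + ["Mozilla/5.0"]:
        print(agent, "blocked" if is_blocked(robots, agent) else "allowed")
```

The printed text would be saved as `robots.txt` at your site's root. Scrapers that ignore robots.txt would need server-side controls, such as user-agent or IP filtering, which this sketch does not cover.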
