By Jackie Glade — March 28, 2026

The bot situation on the internet is actually worse than you could imagine. Here's why:

As you may know, here at Glade Art we take anti-bot measures very seriously; protecting our fellow users from having their art trained on is one of our top priorities. We also like to troll bots by trapping them in endless labyrinths of useless data, commonly referred to as "honeypots" or "digital tar pits." And so, after 6.8 million requests in the last 55 days at the time of writing, we have some substantial data, so stand by and let us share it with you. : )

> 1. Quick clarification.

For starters, these bots do not obey robots.txt. This is expected from unethical companies, but it doesn't make it any better. (A robots.txt file is a plain text file placed on a website that contains rules about where bots are and are not allowed to go. Good bots, such as search engine crawlers, obey these rules, while bad bots do not.) To avoid trapping good bots, our robots.txt disallows all bots from entering this site's tar pits.

> 2. Pages and Contents.

The 2 traps on this site with the most bot activity are these:

gladeart(DOT)com(SLASH)data-export (over 6.8 million requests in the past 55 days).
gladeart(DOT)com(SLASH)gro (over 84k requests in the past 35 days).

(NOTE: Use a VPN on these pages if you don't want your IP shown in the logs, though it won't be significant among the millions of others anyway.)

As you can see when visiting the pages, GRO generates more book-like text, while Data Export's text is, well... whatever it's supposed to be. Data Export is by far more successful than GRO. It would be safe to assume that these companies are scraping for more number-rich data for better facts and such. Fake personal information such as emails or phone numbers also seems to attract scrapers very well.

> 3. Characteristics of these bots.

The IPs of these bots mostly do not come from datacenters or VPNs; the overwhelming majority come from residential and mobile networks.
Nearly all of them reside in Asian countries, Indonesia in particular. By leveraging cheap compute from such countries while using residential IPs, they can appear as completely human traffic to many websites and scrape at massive scale.

However, there is some good news: these bots do not execute JavaScript, at least not when scraping random sites across the entire web. Just imagine the compute costs if they had to run headless browsers while scraping millions of sites every hour! This makes PoW (proof-of-work) challenges extremely effective against them.

Bot traffic at this scale that looks like normal human traffic begs the question: "How much of the internet's traffic comes from bots?"

> 4. How much of the traffic on the internet comes from bots?

Reports in 2024 say that approximately 51% of all traffic on the internet comes from bots. Now this sounds like a lot, and it is, but the reality is much worse. These estimates rely heavily on where the IP addresses originate: whether or not they come from datacenters. As our data shows, an extremely high number of bots don't come from datacenters at all. They can certainly be rigged to execute JavaScript on high-quality sites, and many sites don't even require JS, such as Wikipedia and Old Reddit. With this in mind, it wouldn't be unreasonable to assume that the share of bot traffic on the internet is much higher, perhaps even over 70%.

> 5. Some experiments on these bots.

Of course we ran some experiments on these bots. Quick fact: Anubis is a program that adds a proof-of-work challenge to websites before users can access them. Anubis was enabled on the tar pit at difficulty 1 (the lowest setting) while requests were pouring in 24/7. Before it was enabled, the tar pit was getting several hundred thousand requests each day. As soon as Anubis became active, traffic dropped to about 11 requests over the following 24 hours, most of them from curious humans. Was it a coincidence?
No, it was not. The test was repeated on several other occasions, yielding very similar results. This confirms that bots do not like PoW challenges, even ultra-easy ones. Even among the few bots that do execute JS, extremely few will solve challenges; take the search engine crawler GoogleBot, for example.

> 6. Who are these bots from?

These bots are almost certainly scraping data for AI training; ordinary bad actors don't have the funding to throw millions of unique IPs at a page. They probably belong to several different companies. Perhaps they sell their scraped data to AI companies, or they are AI companies themselves. We can't tell, but we can guess, since there aren't all that many large AI corporations out there.

> 7. How can you protect your sites from these bots?

If your site has a vast number of pages, these bots can drive up resource usage on your server as they crawl through everything. The best options in this case are Cloudflare or Anubis. Alternatively, you could add a simple JS requirement in your web server (Nginx, for example); this won't be as effective, but it is often sufficient for most sites. We would also recommend adding an hCaptcha to forms such as sign-ups. Overall, a correctly configured Anubis eliminates nearly all bot traffic on your site.

> 8. Server resource usage.

Our server usage for the tar pit endpoints is quite low. For example, when a global 1000-requests-per-minute rate limit was being reached on Data Export, the server's CPU usage (an i5-4460) was not noticeably higher than when idle. RAM usage was also very low, well under 500 MB. And since it's just text data being sent out, uploads were no more than 700 KiB/s.

> 9. Fun fact.

On average, the Data Export tar pit generates 9000 characters per request. Doing the math, the 6.8 million loads come out to roughly 61 billion characters, or over 120,000 novels' worth of text generated and sent in total since Jan 29th, 2026.

> 10. Download a log file.
Here is a massive log file of activity in the Data Export tar pit: https://mega.nz/file/69Rh3IpS#ThlagHz8e58jLvU-vWn9U9m9T_WegL4SE0H2mhZRcZY

Caution: this file decompresses to about 1.1 GB. Standard text editors will struggle to open it.

Note: this file contains logs from Jan 29th to March 22nd, 2026. [This is for educational purposes only.]

<> Outro.

With all this information, we can see just how bad the bot situation on the internet is right now. Look on the bright side, though: trolling bots is fun! We recommend adding your own tar pits to your site as well; the more volume, the better. Just be sure to disallow those pages in your robots.txt so that good bots don't get trapped. Bad bots often go into a page precisely because you disallowed it for them. Thank you for reading! : ) <>
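P.S. — to make the robots.txt advice concrete, here is a sketch of what such a file could look like, using this site's two tar-pit paths from earlier as examples (your own paths will differ, and this is not necessarily our exact file):

```text
# robots.txt — placed at the site root.
# Well-behaved crawlers (search engines, etc.) read this and stay out,
# so only rule-ignoring bots ever reach the tar pits.
User-agent: *
Disallow: /data-export
Disallow: /gro
```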
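P.P.S. — a tar pit itself can be surprisingly small. Here is a minimal, hypothetical sketch in Python of a page generator in the spirit of Data Export (deterministic, number-rich junk with fake contact data); function and field names are invented for illustration, and this is not our actual implementation:

```python
import random

def tar_pit_page(seed: int, lines: int = 50) -> str:
    """Generate deterministic junk that looks like a 'data export':
    number-rich rows with fake emails, the kind of content scrapers
    seem to favor. Seeding per-URL keeps pages stable and cheap to
    regenerate, so nothing needs to be stored."""
    rng = random.Random(seed)
    rows = []
    for i in range(lines):
        user = f"user{rng.randint(1000, 9999)}"
        rows.append(
            f"{i},{user}@example.com,{rng.randint(10**9, 10**10 - 1)},"
            f"{rng.uniform(0, 1):.6f}"
        )
    return "\n".join(rows)

# In a real setup, each page would also link to more generated pages,
# forming the endless labyrinth; a web framework route would call
# something like tar_pit_page(hash(path) % 2**32).
```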
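One more addendum, since Anubis and proof-of-work kept coming up: the core idea is just "find a nonce such that hashing the challenge plus the nonce yields some number of leading zero bits." Here is a toy illustration in Python; it is not Anubis's actual scheme or parameters, just the general technique:

```python
import hashlib

def solve_pow(challenge: str, difficulty_bits: int) -> int:
    """Brute-force the smallest nonce so that
    sha256(challenge + nonce) starts with `difficulty_bits` zero bits."""
    nonce = 0
    while True:
        digest = hashlib.sha256(f"{challenge}{nonce}".encode()).digest()
        # Treat the digest as a 256-bit integer; a valid solution has
        # all of its top `difficulty_bits` bits equal to zero.
        if int.from_bytes(digest, "big") >> (256 - difficulty_bits) == 0:
            return nonce
        nonce += 1

def verify_pow(challenge: str, difficulty_bits: int, nonce: int) -> bool:
    """Checking a solution costs one hash, solving costs ~2**difficulty_bits."""
    digest = hashlib.sha256(f"{challenge}{nonce}".encode()).digest()
    return int.from_bytes(digest, "big") >> (256 - difficulty_bits) == 0
```

The asymmetry is the point: a human's browser pays this cost once per visit, while a scraper hitting millions of pages per hour would pay it millions of times — and, as the data above shows, they simply leave instead.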

