- > The LLM companies are not picking on me in particular, they are pounding every site on the net.
Why isn't this a criminal offense? They are hurting businesses for profit (or for a higher valuation, as they probably have no profit at all).
Why are corporations allowed to do with impunity what could land even a teenager years in prison? Is there no rule of law anymore?
The five-year and ten-year penalties kick in only when the government can show the offense caused at least $5,000 in losses across all victims during a one-year period.
https://legalclarity.org/what-are-the-punishments-for-a-ddos...
by davidsojevic
2 subcomments
- I suspect part of the issue is that people are still using things like `acme.com` and `demo.com` as an example domain in their documentation and tests instead of relying on `example.com` which is reserved exactly for this purpose [0]
[0]: https://www.iana.org/domains/reserved
- Bot traffic is crazy even for smaller sites, but still manageable. I was getting 2,000 visitors a day on my infrequently updated website, but after I blocked all the bots via Cloudflare it went back to the normal double-digit visitor count.
- > I closed port 443
> Now closing https service is obviously just a temporary fix
Probably the best starting point would be to edit the robots.txt file and disallow LLM bots there.
Currently the file allows all bots: http://acme.com/robots.txt
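As a sketch, a robots.txt that disallows some of the widely seen LLM crawlers by user-agent might look like the following (the list is illustrative, not exhaustive):

```
User-agent: GPTBot
Disallow: /

User-agent: ClaudeBot
Disallow: /

User-agent: CCBot
Disallow: /

User-agent: Google-Extended
Disallow: /
```

Keep in mind robots.txt is purely advisory: well-behaved crawlers honor it, but the aggressive scrapers described in this thread often ignore it entirely, so it's a first step rather than a fix.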
by kristianp
3 subcomments
- > Nearly all of them were for non-existent pages.
Do any webservers have a feature where they keep a list in memory of files/paths that exist?
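For what it's worth, nginx's `open_file_cache` can cache the results of file lookups in memory, including negative results, so repeated requests for non-existent paths don't hit the filesystem every time. A minimal sketch (the numbers are illustrative, not tuned recommendations):

```nginx
# Cache descriptors/metadata for up to 10,000 entries,
# evicting entries unused for 60 seconds.
open_file_cache          max=10000 inactive=60s;
open_file_cache_valid    120s;
# Also cache "file not found" lookups, which is the case
# that matters for floods of requests to non-existent pages.
open_file_cache_errors   on;
```

This doesn't reduce the request flood itself, but it does make each bogus 404 much cheaper to serve.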
- The only real solution is to put Anubis in front. For me, I just use Cloudflare in front and that suffices. But it's only a few thousand per hour by default. My homeserver can handle that quite well on its own.
- For those who have deployed Cloudflare in front, what are pros and cons? How's the user experience? Do they offer free bot protection?
- This reminds me of a problem we hit at work. Ended up going a different direction but same root issue.
- You can block CN, RU, SG, KR, and the level 3 from "ipsum" and the numbers go down a lot.
People might not know about ipset: don't use individual rules in iptables.
Nginx can also easily reject requests based on country.
```nginx
geoip2 /etc/GeoLite2-Country.mmdb {
    $geoip2_metadata_country_build metadata build_epoch;
    $geoip2_data_country_code default=Unknown source=$remote_addr country iso_code;
}

map $geoip2_data_country_code $allowed_country {
    default yes;
    KR no;
    SG no;
    CN no;
    RU no;
}

server {
    ....
    if ($allowed_country = no) {
        return 444;
    }
}
```
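For the iptables side, a minimal ipset sketch might look like this (the set name `blocklist` and the input file `ipsum-l3.txt` are hypothetical; the ipsum level-3 list would be downloaded separately, and the commands require root):

```shell
# Create a hash set for blocked networks; one set replaces
# thousands of individual iptables rules.
ipset create blocklist hash:net

# Load entries from a previously fetched ipsum level-3 list.
while read -r ip; do
    ipset add blocklist "$ip"
done < ipsum-l3.txt

# A single iptables rule then matches the entire set.
iptables -I INPUT -m set --match-set blocklist src -j DROP
```

For large lists, `ipset restore` with a prepared file is considerably faster than adding entries one at a time in a loop.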
- I had to block all traffic except that of my country. Since I offer a service that is exclusive to my country, it worked like a charm.
by JohnTHaller
1 subcomment
- A series of Chinese LLM scrapers kept PortableApps.com running slowly and occasionally unresponsive for two weeks.
- There are plenty of local LLMs out there run by humans that play nice. It's not the LLMs that are the problem. It's the corporations. That's the commonality. Human people aren't doing this. These corporate legal persons are a much more dangerous and capable form of non-human intelligence with non-human motives than LLMs (which are not doing the scraping or even calling the tools which are sending the HTTP requests). And they have lobbied their way to legal immunity to most of their crimes.
- > Someone really ought to do something about it.
What is bro proposing here?