Forum:Announcements/Hyper-aggressive AI LLM crawlers

From OpenGeofiction

The main OGF server and the wiki server have both been under increasing load from hyper-aggressive AI LLM crawlers. This degrades site performance and, on the wiki server in particular, has caused severe outages. Adding server resources offers only partial respite, as the request volume then rises yet again.

These disruptive requests are extremely hard to mitigate. The crawlers ignore robots.txt instructions, operate from an immense range of IP addresses and send User-Agent strings that are randomised across the valid agents used by real users.
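
For contrast, a well-behaved crawler reads robots.txt and checks each URL before fetching it. The following is a minimal Python sketch using the standard library's urllib.robotparser. The robots.txt rules in it are hypothetical, for illustration only, and are not OGF's actual file.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt content, for illustration only (not OGF's real rules).
SAMPLE_ROBOTS_TXT = """\
User-agent: *
Disallow: /api/
Crawl-delay: 10
"""


def make_parser(robots_txt: str) -> RobotFileParser:
    """Parse robots.txt text without fetching anything over the network."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser


def may_fetch(parser: RobotFileParser, user_agent: str, url: str) -> bool:
    """A compliant crawler calls this before every request and skips disallowed URLs."""
    return parser.can_fetch(user_agent, url)


if __name__ == "__main__":
    rules = make_parser(SAMPLE_ROBOTS_TXT)
    for url in ("https://example.invalid/wiki/Main_Page",
                "https://example.invalid/api/some/object"):
        print(url, may_fetch(rules, "ExampleBot", url))
    # A compliant crawler also waits this many seconds between requests.
    print("crawl delay:", rules.crawl_delay("ExampleBot"))
```

The crawlers described above skip exactly this check, which is why robots.txt alone does not stop them.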

As a mitigation we are experimenting with HTTP Authentication. If you are prompted to log in, enter ogf as the username and opengeofiction as the password. This may be enabled only intermittently.
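
Scripts or tools that fetch wiki pages directly may need to send these credentials themselves. Below is a minimal Python sketch. It assumes the server uses the HTTP Basic scheme (RFC 7617), although the announcement says only "HTTP Authentication". The helper names basic_auth_header and build_request are hypothetical.

```python
import base64
import urllib.request

# Credentials from the announcement; assumes the HTTP Basic scheme is in use.
USERNAME = "ogf"
PASSWORD = "opengeofiction"


def basic_auth_header(username: str, password: str) -> str:
    """Build a Basic Authorization header value: base64 of 'user:password'."""
    token = base64.b64encode(f"{username}:{password}".encode("utf-8")).decode("ascii")
    return f"Basic {token}"


def build_request(url: str) -> urllib.request.Request:
    """Hypothetical helper: a request carrying the credentials (not sent here)."""
    return urllib.request.Request(
        url, headers={"Authorization": basic_auth_header(USERNAME, PASSWORD)}
    )


if __name__ == "__main__":
    print(basic_auth_header(USERNAME, PASSWORD))
```

Pass the returned Request to urllib.request.urlopen to send it. Sending the header up front avoids an extra round trip through the 401 challenge.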

Thanks/wangi (talk) 11:27, 7 June 2025 (UTC)

Mitigations have been put in place on the wiki server to block most of the unwanted requests without impacting performance for real users. However, if you do see a "403 Forbidden" response while browsing the site, please share details here of what you were doing so it can be investigated. Thanks/wangi (talk) 11:16, 9 June 2025 (UTC)

The LLM crawlers are continuing to impact OGF, and this is currently worst on the API server. This is the reason for API connections from JOSM timing out, for sluggish uploads and for the red time-out errors you will see on some wiki pages. /wangi (talk) 15:24, 17 June 2025 (UTC)

Further mitigations are now in place on the API server, and these have brought the situation under control. /wangi (talk) 11:17, 18 June 2025 (UTC)
It's frustrating that these crawlers are attempting to download every single node, way, relation and changeset one by one... We have a very efficient way to download them all at once: the backups, which can then be reused under our terms (see OpenGeofiction:Contributor Terms#Copyright terms). /wangi (talk) 12:12, 18 June 2025 (UTC)