Forum:Announcements/Hyper-aggressive AI LLM crawlers

From OpenGeofiction

The main OGF server and wiki server have both been under increasing load due to hyper-aggressive AI LLM crawlers. This impacts site performance and, on the wiki server in particular, has resulted in severe outages. Adding server resources offers only a partial respite, as the volume of requests then increases yet again.

These disruptive requests are extremely hard to mitigate. They ignore robots.txt instructions, operate from an immense range of IP addresses and use User-Agent strings randomised across valid agents used by real users.

As a mitigation we are experimenting with HTTP Authentication. If your browser shows a login prompt, enter ogf as the username and opengeofiction as the password. This may be implemented sporadically.

Thanks/wangi (talk) 11:27, 7 June 2025 (UTC)
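
For anyone curious what happens after entering those credentials: the browser repeats the request with an Authorization header. The sketch below shows how that header is built, assuming the prompt uses the standard HTTP Basic scheme (the announcement does not say which scheme is in use). The helper name basic_auth_header is hypothetical.

```python
import base64


def basic_auth_header(username: str, password: str) -> dict:
    """Build an HTTP Basic Authorization header (scheme assumed, see above)."""
    # Basic auth joins the credentials as "username:password",
    # encodes them as UTF-8 and then base64-encodes the result.
    raw = f"{username}:{password}".encode("utf-8")
    token = base64.b64encode(raw).decode("ascii")
    return {"Authorization": f"Basic {token}"}


# Credentials exactly as given in the announcement.
headers = basic_auth_header("ogf", "opengeofiction")
```

Note that base64 is an encoding, not encryption. These credentials are public in any case and only serve to separate people from automated crawlers.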

Mitigations have been put in place - on the wiki server - to block most of the unwanted requests without impacting performance for real users. However, if you do see a "403 Forbidden" response while browsing the site, then please share information here on what you were doing so it can be investigated. Thanks/wangi (talk) 11:16, 9 June 2025 (UTC)

The LLM crawlers are continuing to impact OGF, and the impact is currently worst on the API server. This is the reason for API connections from JOSM timing out, sluggish uploads and also why you will see red time-out errors on some wiki pages. /wangi (talk) 15:24, 17 June 2025 (UTC)

Further mitigations are now in place on the API server, and these have brought the situation under control. /wangi (talk) 11:17, 18 June 2025 (UTC)
It's frustrating that these crawlers are attempting to download every single node, way, relation and changeset one by one... We have a very efficient way to download them all at once - backups - which can then be reused per our terms (OpenGeofiction:Contributor Terms#Copyright terms). /wangi (talk) 12:12, 18 June 2025 (UTC)

The wiki site has been under extreme load for the last few days due to bot activity. A number of network ranges have been blocked in an attempt to mitigate this. /wangi (talk) 15:51, 18 October 2025 (UTC)

Yet more sporadic heavy bot activity hitting the wiki server - not able to put any mitigation in place right now, but it is behaving better following the server update. /wangi (talk) 14:13, 16 February 2026 (UTC)
Targeted blocking, along with hopefully better resource exhaustion fallback, has now been implemented. On to monitoring. /wangi (talk) 23:37, 16 February 2026 (UTC)
I've further tweaked the filters, so links within the Atom feeds (e.g. as also used by Discord) now work again. /wangi (talk) 21:45, 18 February 2026 (UTC)

Hello wangi! I don't know if I'm allowed to write here. If not, feel free to delete this comment. However, blocking activity from pretty much every external server is also impacting users who sometimes need to open links to certain relations, nodes, and ways (e.g., https://opengeofiction.net/relation/471241) from external sites such as Google Sheets or Discord, where those links are now blocked. I know it's probably unreasonable to whitelist certain external sites as it opens up a whole can of worms. Is it possible to host some kind of node/way/relation lookup system on the opengeofiction.net domain by ID (471241), or better yet, allow users to look IDs up by using the search feature on the opengeofiction site? Cheers, ParrotMan (talk) 13:03, 23 March 2026 (UTC)

Hi, the problem is that these other sites - let's say Discord and Google Drive specifically - do not play nice: they do not send referrer headers. If they did, then it would be simple to tweak behaviour, however without them it's impossible. The use of the various map link templates (e.g. {{relation}}, {{way}}, {{node}}) on the wiki will always give working links, as should linking through from the map, Overpass or Spyglass servers. Likewise your choice of browser is important - the bots masquerade as the most popular, so using something less common (Waterfox, Firefox etc.) also works. /wangi (talk) 14:50, 23 March 2026 (UTC)
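
To illustrate why a missing referrer leaves nothing to work with, here is a minimal sketch of a referrer allow-list check. It is not the actual OGF web server rule set, which is not published in this thread. The function name and the host list are assumptions for illustration: opengeofiction.net appears in the thread, while wiki.opengeofiction.net is an assumed hostname.

```python
from typing import Optional
from urllib.parse import urlsplit

# Hypothetical allow-list; the real server configuration is not shown here.
TRUSTED_HOSTS = frozenset({"opengeofiction.net", "wiki.opengeofiction.net"})


def referer_is_trusted(referer: Optional[str],
                       trusted_hosts: frozenset = TRUSTED_HOSTS) -> bool:
    """Return True if the Referer header names a host on the allow-list."""
    if not referer:
        # A request with no Referer header (as described for Discord and
        # Google Drive) gives this check nothing to distinguish it from a bot.
        return False
    # urlsplit lower-cases the hostname and returns None if there is none.
    host = urlsplit(referer).hostname
    return host in trusted_hosts
```

A link followed from a wiki page would carry a referrer and pass this check. The same link pasted into a chat client that strips the header would not, whatever the allow-list contains.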
Hi again, thanks for the reply! The issue that we're facing is that there isn't a shorthand way to quickly share a certain node/relation/way with other users, or to keep it for personal reference. Most of the time we wouldn't want to be constantly editing a wiki page in order to share information. If I provide a simple HTML file that allows a user to input an ID for a node/way/relation to look it up on OGF, would you guys be willing to host it? Or, perhaps I make my own solution elsewhere but provide a referer header? Best, ParrotMan (talk) 15:47, 23 March 2026 (UTC)
The web server rules have been reworked, which should reduce false positives. Let me know how it goes. /wangi (talk) 21:53, 23 March 2026 (UTC)
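
The core of such a lookup is just mapping an element type and ID to a URL. Below is a minimal sketch, assuming that node and way links follow the same pattern as the relation link quoted earlier in the thread (https://opengeofiction.net/relation/471241). The function and constant names are hypothetical.

```python
BASE_URL = "https://opengeofiction.net"
# Assumed: node and way URLs mirror the relation URL pattern from the thread.
ELEMENT_TYPES = ("node", "way", "relation")


def element_url(element_type: str, element_id: int) -> str:
    """Build the map URL for a node, way or relation from its numeric ID."""
    if element_type not in ELEMENT_TYPES:
        raise ValueError(f"unknown element type: {element_type!r}")
    if element_id <= 0:
        raise ValueError("element ID must be a positive integer")
    return f"{BASE_URL}/{element_type}/{element_id}"


# The relation from the example link in the thread.
url = element_url("relation", 471241)
```

Whether a link built this way opens without a "403 Forbidden" still depends on the server-side filtering discussed above.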
Hello! Links work now, thanks for working on the issue! Cheers 🥂, ParrotMan (talk) 21:59, 23 March 2026 (UTC)