 nostr:npub1rysx3lwfv2d7x9c43l4gh0skvg4m70eekd2v47zvx89vafulem0qav5m9t 

> I'm getting my Nitter instance scraped by a botnet that appears to be 100k IPs large, they get fed in as fast as I ban them, but I don't believe they assign more than 100 IPs to scraping at a time so as to not DDoS the site,

WEIRD

> one IP never doing a scrape under 7 seconds so rate limiting won't nab them.

Out of curiosity, what UAs are they using?  Have you tried SSL fingerprinting?  Do you know why they'd be hitting your server?  Like, did you check whether DiscordBot or something is in your referrers, or whether someone linked to it from somewhere, or...?

> something I read about 10+ years ago, a sticky trap. I want to ensnare the bot into a perpetually open http request so that it never completes its loop,

Ah, okay, so you can do this pretty easily with nginx:  you can forward to different backends conditionally with one of the (really badly documented) `if` directives.  Set up a little script that listens/accepts on one end and makes a connection on the other to the actual upstream.  So if your Nitter instance is running on localhost:4444, you have this script listen on localhost:4445 and send the suspected bots there.  Have it relay the request upstream and read the entire response right away (to avoid jamming up the real server), but trickle that response back to the client a few bytes at a time.  Some clients time out if the headers take too long to arrive, so don't drag those out:  delay a couple of seconds, send the headers in one go, then trickle the rest at a few bytes per second.
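
A rough sketch of that relay in Python, in case it helps.  The ports 4444/4445 come from the example above; the delays and chunk size are numbers I made up, and the helper names (`split_response`, `chunks`, `handle`) are just for illustration.  It assumes nginx is proxying with its defaults (HTTP/1.0 and `Connection: close`), so the upstream closes the socket once the response is complete, and it only handles requests without a body (plain GETs), which is probably all a scraper sends.

```python
import asyncio

UPSTREAM_HOST = "127.0.0.1"
UPSTREAM_PORT = 4444   # the real Nitter instance (from the example above)
LISTEN_PORT = 4445     # where nginx sends the suspected bots
HEADER_DELAY = 2.0     # assumption: seconds to stall before sending headers
CHUNK_SIZE = 4         # assumption: bytes per trickle
CHUNK_DELAY = 1.0      # assumption: seconds between trickles


def split_response(raw):
    """Split a raw HTTP response into (headers incl. blank line, body)."""
    head, sep, body = raw.partition(b"\r\n\r\n")
    return head + sep, body


def chunks(data, size):
    """Yield successive pieces of `data`, each at most `size` bytes."""
    for i in range(0, len(data), size):
        yield data[i:i + size]


async def handle(client_r, client_w):
    try:
        # Read the request line and headers only (no request body support).
        request = await client_r.readuntil(b"\r\n\r\n")

        # Fetch the whole response at once so the real server is not held up.
        up_r, up_w = await asyncio.open_connection(UPSTREAM_HOST, UPSTREAM_PORT)
        up_w.write(request)
        await up_w.drain()
        response = await up_r.read()   # reads until upstream closes
        up_w.close()

        head, body = split_response(response)

        # Stall a little, then send the headers in one go.
        await asyncio.sleep(HEADER_DELAY)
        client_w.write(head)
        await client_w.drain()

        # Trickle the body a few bytes at a time.
        for piece in chunks(body, CHUNK_SIZE):
            await asyncio.sleep(CHUNK_DELAY)
            client_w.write(piece)
            await client_w.drain()
    except (asyncio.IncompleteReadError, asyncio.LimitOverrunError, OSError):
        pass   # client or upstream went away; nothing to clean up but the socket
    finally:
        client_w.close()


async def main():
    server = await asyncio.start_server(handle, "127.0.0.1", LISTEN_PORT)
    async with server:
        await server.serve_forever()
```

Start it with `asyncio.run(main())` and point the nginx branch for suspected bots at 127.0.0.1:4445.  You would probably also want `proxy_buffering off;` on that location, otherwise nginx may batch up the trickled bytes instead of passing them along as they arrive.  Each trapped connection is just a sleeping coroutine, so holding a hundred of them open costs very little.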

Another way to do this is to use iptables.  True story:  `-m statistic --mode random --probability 0.5 -j DROP` does more or less what you would expect, which is to drop about half of the matching packets at random.  This is what I did when Pawoo was flooding FSE with massive numbers of deletes, as a kind of dopey rate limiter.  (Unintentional on their part:  a few accounts with really long post histories deleted themselves, which causes Mastodon to send one delete per activity since the beginning of time to every server it has heard of...except the ones that it has blocked.)

> I figure that if the botnet notices when it's banned and starts getting 403'd,

Basically zero of the scrapers that hit FSE do this.  Boardreader.com didn't even notice that I was actively poisoning their data until about a week after I started including the phone number of the guy that was ignoring my emails in it.