 Amazon bots DOSing me by downloading all of the content in all of my public repositories. At least my rate limiting is keeping that under control. I guess I should probably blacklist their IPs 
 maybe you want to require auth on your repos 
 Turns out it's ClaudeBot 
 block the cancerbot 
 Also kind of a bummer, it should be easy for people to download my stuff.  It's completely manageable right now, but AI is clearly scraping my repos 
 AI is such cancer... i'd be blacklisting cloud service IP ranges altogether personally. Anyone who is running stuff from that in containers or whatever probably isn't nice people. actual users who like nostr and stuff will not be using google cancer or amazon cancer services 
 You're right, but I use my own private VPN from a VPS for my outgoing traffic and I get blocked from so many things now. I couldn't use YouTube, Spotify, Facebook etc. anymore, even if I wanted to.  
 yeah, i get small problems, mostly just captcha from cloudflare but my VPS seems to not be in the shitlist 
 probably because it's kinda expensive but it's quality infra, Sofia, Bulgaria, which had a huge influx of remote support service companies and an existing extensive high speed ethernet network (often strung across between buildings by gangsters back in the day, selling access to pirate movie caches), but it is on the high side of expensive

still, i have 500mbit down now and i get all of that via the tunnel so i no complain 
 "is it a DoS or a CI runner" 😅 

often projects don't bother setting up caching and just re-download and check out the dependencies constantly

but yes blacklisting or just setting their rate limit to very very slow will send the right message, either way 
 If they weren't so clear about their crawling. I also just made the repo URLs public on my website Friday, so that's likely why crawlers picked them up. It's just been a steady stream of outgoing data XD

User Agent says "Amazonbot" and "ClaudeBot/1.0; +claudebot@anthropic.com"

It's just this recursive path spam. I turned logging on briefly. Mind you, there are no "icons" directories in these repos. 

Example: 
/cgit/vnuge/vnlib-core.git/tree/lib/Utils/icons/icons/icons/icons/icons/icons/icons/icons/icons/icons/icons/icons/icons/icons/icons/icons/icons/icons/icons/icons/icons/icons/icons/icons/icons/icons/icons/icons/icons/icons/icons/icons/icons/icons/icons/icons/icons/icons/icons/icons/icons/icons/icons/icons/icons/icons/icons/icons/icons/icons/icons/icons/icons/icons/icons/icons/icons/icons/icons/icons/icons/icons/icons/icons/icons/icons/icons/icons/icons/icons/icons/icons/icons/icons/icons/icons/icons/icons/icons/icons/icons/icons/icons/icons/icons/icons/icons/icons/icons/icons/icons/icons/icons/icons/icons/icons/icons/icons/icons/icons/icons/icons/icons/icons/icons/icons/icons/icons/icons/icons/icons/icons/icons/icons/icons/icons/icons/icons/icons/icons/icons/icons/icons/icons/icons/icons/icons/icons/icons/icons/icons/icons/icons/icons/icons/icons/icons/icons/icons/icons/icons/icons/icons/icons/icons/icons/icons/icons/icons/icons/icons/icons/icons/icons/icons/icons/icons/icons/icons/icons/icons/icons/icons/icons/icons/icons/icons/icons/icons/icons/icons/icons/icons/icons/icons/icons/icons/icons/icons/icons/icons/icons/icons/icons/icons/icons/icons/icons/icons/icons/icons/icons/icons/icons/icons/icons/icons/icons/icons/icons/icons/icons/icons/icons/icons/icons/icons/icons/icons/icons/icons/icons/icons/icons/icons/icons/icons/icons/icons/icons/icons/icons/icons/icons/icons/icons/icons/icons/icons/icons/icons/icons/icons/icons/icons/icons/icons/icons/icons/icons/icons/icons/icons/icons/icons/icons/icons/icons/icons/icons/icons/icons/icons/icons/icons/icons/icons/icons/icons/icons/icons/icons/icons/icons/icons/icons/icons/icons/icons/ubuntu-logo.png HTTP/1.1" 200 3121 "-" "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_10_1) AppleWebKit/600.2.5 (KHTML, like Gecko) Version/8.0.2 Safari/600.2.5 (Amazonbot/0.1; +https://developer.amazon.com/support/amazonbot)" 
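A rough sketch of how that kind of looping path could be flagged before it reaches the repo browser. This is not what the server in this thread does; the function names and the threshold of 3 are made up for illustration.

```python
from itertools import groupby

def max_segment_run(path: str) -> int:
    """Return the longest run of identical consecutive path segments."""
    segments = [s for s in path.split("/") if s]
    # groupby() clusters adjacent equal segments; count the largest cluster.
    return max((sum(1 for _ in group) for _, group in groupby(segments)), default=0)

def looks_like_crawler_loop(path: str, threshold: int = 3) -> bool:
    """Hypothetical heuristic: the same directory name repeated back to back
    `threshold` or more times is unlikely to be a real path."""
    return max_segment_run(path) >= threshold
```

A request for `.../Utils/icons/icons/icons/ubuntu-logo.png` would trip the check, while a normal tree path would not. A legitimate repo with deeply repeated directory names would be a false positive, so the threshold is a judgment call.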
 This is stuff you host from home?

I am getting fiber tomorrow and trying to figure out the pitfalls of self-hosting anything. I am guessing if I make any DNS records that point to my home IP I am going to get hammered. 
 Yessir from my main wan traffic. 

I've been hosting stuff (like my website) publicly since 2010, and maybe 1 or 2 times have I had any actually major DOS issues. This is far from major, I have many resource exhaustion protections in place. 

Also, I do not recommend pointing DNS directly to your home public IP. I pay for a public VPS and use nginx stream proxying to tunnel IP traffic back home. 1 for a layer of privacy, 2 for isolation, 3 so I don't have to terminate SSL until it hits my network, so my certs are only stored locally. Also in the case of DOS events I can just log into the VPS to disable routing, and I get my internet back. If I ever lose my VPS I can possibly purchase from another company and copy/paste my nginx config and be back up hopefully within a few hours if I need it.  
 i use wireguard tunnels and my own bespoke reverse proxy... and it lets me test my stuff live on the internet from my dev box 
 Yeah, this is what I was thinking of doing. Probably with WireGuard. I haven't ever used nginx though. How much VPS do you need to route a gigabit? Do you do filtering at your VPS? Packet inspection? 
 Nginx is a fantastic tool! I have 2TB/month of traffic for my VPS and I don't come anywhere near hitting that. No, my VPS is a dumb TCP forwarder, that's all it does. I just have some IP based limits, that's all.  
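To show what "dumb TCP forwarder" means in practice, here is a minimal Python stand-in. It is not nginx and not the poster's config, just a sketch of the same idea: accept a connection on the VPS, open one to the home server, and copy bytes both ways without looking at them, so TLS is still terminated at home. All names here are hypothetical, and the IP based limits are left out.

```python
import asyncio

async def pipe(reader: asyncio.StreamReader, writer: asyncio.StreamWriter) -> None:
    """Copy bytes from reader to writer until EOF, then close the writer."""
    try:
        while True:
            data = await reader.read(65536)
            if not data:
                break
            writer.write(data)
            await writer.drain()
    except ConnectionError:
        pass  # peer went away; nothing to forward anymore
    finally:
        writer.close()

async def handle_client(client_reader, client_writer, upstream_host, upstream_port):
    """Open a connection to the upstream and relay traffic in both directions."""
    try:
        up_reader, up_writer = await asyncio.open_connection(upstream_host, upstream_port)
    except OSError:
        client_writer.close()  # upstream unreachable: drop the client
        return
    await asyncio.gather(
        pipe(client_reader, up_writer),
        pipe(up_reader, client_writer),
    )

async def start_forwarder(listen_host, listen_port, upstream_host, upstream_port):
    """Start listening and return the asyncio server object."""
    async def on_connect(reader, writer):
        await handle_client(reader, writer, upstream_host, upstream_port)
    return await asyncio.start_server(on_connect, listen_host, listen_port)
```

Something like `await start_forwarder("0.0.0.0", 443, home_address, 443)` followed by `serve_forever()` on the returned server would be the usage, where `home_address` is whatever the tunnel endpoint is. Stopping the process is the equivalent of "log into the VPS to disable routing".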
 Very likely could be. I'm seeing the same requests from both the Amazonbot and ClaudeBot. 

I only allow 8 KB header lines; if the request line is over 8 KB, the connection is terminated. 

https://git.vaughnnugent.com/cgit/vnuge/vnlib-core.git/tree/lib/Net.Http/src/Core/RequestParse/Http11ParseExtensions.cs#n102 
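The linked parser is C#; the snippet below is only a small Python illustration of the same kind of limit, not a port of it. It assumes the limit counts the line terminator too, and `LineTooLong` and `read_limited_line` are invented names.

```python
import io

MAX_LINE_BYTES = 8 * 1024  # assumption: mirrors the 8 KB limit mentioned above

class LineTooLong(Exception):
    """Raised when a request or header line exceeds the configured limit."""

def read_limited_line(stream, limit: int = MAX_LINE_BYTES) -> bytes:
    """Read one line from a binary stream, refusing anything over `limit` bytes."""
    # readline(n) returns at most n bytes, so asking for limit + 1 is enough
    # to tell "fits" from "too long" without buffering the whole line.
    line = stream.readline(limit + 1)
    if len(line) > limit:
        raise LineTooLong(f"line exceeds {limit} bytes")
    return line.rstrip(b"\r\n")
```

With a cap like this, the runaway `icons/icons/...` request lines stop at 8 KB instead of growing until something else breaks; the caller would close the connection when `LineTooLong` is raised.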
 Yeah, after blacklisting those user agents, I'm seeing Twitterbot. So it's likely not legit 
 Now it's sending random hex encoded binary payloads in the query string. Weird. I've been doing this for a long time and haven't seen this type of traffic lol 
 Kind of looks like telnet traffic as it's not correctly formatted http 
 they won't give up, will they 🤣 
is it really random or something like shellcode maybe? 
 Some examples. I haven't tried to decode it. Cyberchef to the rescue!

147.182.162.162 - - [17/Nov/2024:19:52:30 +0000] "HELP" 400 150 "-" "-"
147.182.162.162 - - [17/Nov/2024:19:52:30 +0000] "\x1B\x84\xD5\xB0]\xF4\xC4\x93\xC50\xC2X\x8C\xDA\xB1\xD7\xAC\xAFn\x1D\xE1\x1E\x1A3*\x85\xB7\x1D'\xB1\xC9k\xBF\xF0\xBC" 400 150 "-" "-"
147.182.162.162 - - [17/Nov/2024:19:52:30 +0000] "batman" 400 150 "-" "-"

147.182.162.162 = prod-boron-nyc1-32.do.binaryedge.ninja 
 Some of it might be disguised, regex? I see a * and a ' in there. It doesn't decode to anything readable as far as I can tell.  
 right ! it does look like it has some structure to it, and isn't completely random, but as assembly it's complete nonsense

$ rasm2 -a x86 -b 64 -d  '1b84d5b05df4c493c530c2588cdab1d7acaf6e1de11e1a332a85b71db1c96bbff0bc'
sbb eax, dword [rbp + rdx*8 - 0x3b0ba250]
xchg ebx, eax
vcmpps xmm11, xmm9, xmmword [rax - 0x74], 0xda
mov cl, 0xd7
lodsb al, byte [rsi]
...

$ rasm2 -a arm -b 64 -d  '1b84d5b05df4c493c530c2588cdab1d7acaf6e1de11e1a332a85b71db1c96bbff0bc'
adrp x27, 0xffffffffab081000
extr x29, x2, x4, 0x3d
ldr x5, 0xfffffffffff84620
invalid
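
For anyone who wants to repeat this, a small sketch for turning the escaped request field from the access log back into raw bytes (and hex for a disassembler). It assumes the log uses `\xHH` escapes for non-printable bytes, as in the lines quoted earlier; `unescape_log_field` and `SAMPLE` are made-up names.

```python
import re

# Matches one \xHH escape as written in the access log.
_ESCAPE = re.compile(rb"\\x([0-9A-Fa-f]{2})")

def unescape_log_field(field: str) -> bytes:
    """Convert a logged request string with \\xHH escapes back to raw bytes."""
    raw = field.encode("latin-1")
    return _ESCAPE.sub(lambda m: bytes([int(m.group(1), 16)]), raw)

# The binary request line from the log excerpt in this thread.
SAMPLE = r"\x1B\x84\xD5\xB0]\xF4\xC4\x93\xC50\xC2X\x8C\xDA\xB1\xD7\xAC\xAFn\x1D\xE1\x1E\x1A3*\x85\xB7\x1D'\xB1\xC9k\xBF\xF0\xBC"

payload = unescape_log_field(SAMPLE)
print(len(payload), payload.hex())
```

Printable characters such as `]`, `*` and `'` are kept as their own bytes, so the sample comes out as 35 bytes. The hex string passed to rasm2 above is 34 bytes and seems to be missing the `27` for the apostrophe, which would shift the disassembly after that point, though it probably doesn't change the conclusion.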
 
 googling for that IP shows it is in some blacklists, you're at least not the only target 🙂  
 I appreciate the help :) 