Codeberg: army of AI crawlers are extremely slowing us; AI crawlers learned how to solve the Anubis challenges.

Pro@programming.dev · 8 days ago

Codeberg: army of AI crawlers are extremely slowing us; AI crawlers learned how to solve the Anubis challenges.

nialv7@lemmy.world · edit-2 6 days ago

We had a trust based system for so long. No one is forced to honor robots.txt, but most big players did. Almost restores my faith in humanity a little bit. And then AI companies came and destroyed everything. This is why we can’t have nice things.

katy ✨@piefed.blahaj.zone · 7 days ago

reminder to donate to codeberg and forgejo :)

mfed1122@discuss.tchncs.de · edit-2 7 days ago

Okay what about…what about uhhh… Static site builders that render the whole page out as an image map, making it visible for humans but useless for crawlers 🤔🤔🤔

lapping6596@lemmy.world · 7 days ago

Accessibility gets throw out the window?

mfed1122@discuss.tchncs.de · 7 days ago

I wasn’t being totally serious, but also, I do think that while accessibility concerns come from a good place, there is some practical limitation that must be accepted when building fringe and counter-cultural things. Like, my hidden rebel base can’t have a wheelchair accessible ramp at the entrance, because then my base isn’t hidden anymore. It sucks that some solutions can’t work for everyone, but if we just throw them out because it won’t work for 5% of people, we end up with nothing. I’d rather have a solution that works for 95% of people than no solution at all. I’m not saying that people who use screen readers are second-class citizens. If crawlers were vision-based then I might suggest matching text to background colors so that only screen readers work to understand the site. Because something that works for 5% of people is also better than no solution at all. We need to tolerate having imperfect first attempts and understand that more sophisticated infrastructure comes later.

But yes my image map idea is pretty much a joke nonetheless

Monument@lemmy.sdf.org · 7 days ago

Increasingly, I’m reminded of this: Paul Bunyan vs. the spam bot (or how Paul Bunyan triggered the singularity to win a bet). It’s a medium-length read from the old internet, but fun.

prole@lemmy.blahaj.zone · 7 days ago

Tech bros just actively making the internet worse for everyone.

ShaggySnacks@lemmy.myserv.one · 7 days ago

Tech bros just actively making ~~the internet~~ society worse for everyone.

FTFY.

SufferingSteve@feddit.nu · edit-2 8 days ago

There once was a dream of the semantic web, also known as web2. The semantic web could have enabled easy to ingest information of webpages, removing soo much of the computation required to get the information. Thus preventing much of the AI crawling cpu overhead.

What we got as web2 instead was social media. Destroying facts and making people depressed at a newer before seen rate.

Web3 was about enabling us to securely transfer value between people digitally and without middlemen.

What crypto gave us was fraud, expensive jpgs and scams. The term web is now even so eroded that it has lost much of its meaning. The information age gave way for the misinformation age, where everything is fake.

Marshezezz@lemmy.blahaj.zone · 8 days ago

Capitalism is grand, innit. Wait, not grand, I meant to say cancer

Serinus@lemmy.world · 7 days ago

I feel like half of the blame capitalism gets is valid, but the other half is just society. I don’t care what kind of system you’re under, you’re going to have to deal with other people.

Oh, and if you try the system where you don’t have to deal with people, that just means other people end up handling you.

kazerniel@lemmy.world · 7 days ago

It matters a lot though what kind of goal the system incentivises. Imagine if it was people’s happiness and freedom instead of quarterly profits.

Amju Wolf@pawb.social · 7 days ago

In this case it is purely fault of the money incentive though. Noone would spend so much effort and computation power on AI if they didn’t think it could make them money.

The funniest part is though that it’s only theoretical anyway, everyone is only losing on it and they’re most likely never gonna make it back.

muusemuuse@sh.itjust.works · 7 days ago

Sound like it went the same way everything else went. The less money is involved the more trustworthy it is.

zbyte64@awful.systems · 7 days ago

Is there nightshade but for text and code? Maybe my source headers should include a bunch of special characters that then give a prompt injection. And sprinkle some nonsensical code comments before the real code comment.

zifk@sh.itjust.works · 8 days ago

Anubis isn’t supposed to be hard to avoid, but expensive to avoid. Not really surprised that a big company might be willing to throw a bunch of cash at it.

oeuf@slrpnk.net · 7 days ago

Crazy. DDoS attacks are illegal here in the UK.

BlameTheAntifa@lemmy.world · 7 days ago

The problem is that hundreds of bad actors doing the same thing independently of one another means it does not qualify as a DDoS attack. Maybe it’s time we start legally restricting bots and crawlers, though.

rdri@lemmy.world · 7 days ago

So, sue the attackers?

Spaz@lemmy.world · edit-2 6 days ago

Is there a migration tool? If not would be awesome to migrate everything including issues and stuff. Bet even more people would move.

BlameTheAntifa@lemmy.world · 7 days ago

Codeberg has very good migration tools built in. You need to do one repo at a time, but it can move issues, releases, and everything.

PhilipTheBucket@piefed.social · 8 days ago

I feel like at some point it needs to be active response. Phase 1 is a teergrube type of slowness to muck up the crawlers, with warnings in the headers and response body, and then phase 2 is a DDOS in response or maybe just a drone strike and cut out the middleman. Once you’ve actively evading Anubis, fuckin’ game on.

The Infinite Nematode@feddit.uk · 8 days ago

Wasn’t this called black ice in Neuromancer? Security systems that actively tried to harm the hacker?

TurboWafflz@lemmy.world · 8 days ago

I think the best thing to do is to not block them when they’re detected but poison them instead. Feed them tons of text generated by tiny old language models, it’s harder to detect and also messes up their training and makes the models less reliable. Of course you would want to do that on a separate server so it doesn’t slow down real users, but you probably don’t need much power since the scrapers probably don’t really care about the speed

31ank@ani.social · edit-2 8 days ago

Some guy also used zip bombs against AI crawlers, don’t know if it still works. Link to the lemmy post

xthexder@l.sw0.com · 8 days ago

I love catching bots in tarpits, it’s actually quite fun