And what more would that be?
It’s mentioned as one of the entities Mozilla will pressure for change, and it is listed in the 404 Media article that Mozilla cites.
Scraped public data from different platforms can be stitched together to infer identity, location, status, beliefs, and networks, even if no single website reveals all of that on its own.
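A minimal sketch of what that stitching looks like, with made-up data: records from two hypothetical sources get joined on a shared handle, and the combined profile says more than either source does alone. All field names and values here are invented for illustration.

```python
# Cross-platform "stitching": merge scraped records keyed on a shared handle.
from collections import defaultdict

# Hypothetical scraped records; the handles, fields, and values are made up.
forum_posts = [
    {"handle": "alice_b", "timezone": "UTC+2", "topic": "local politics"},
]
photo_site = [
    {"handle": "alice_b", "geotag": "Berlin", "faces_tagged": ["bob_k"]},
]

def stitch(*sources):
    """Merge records from multiple sources into one profile per handle."""
    profiles = defaultdict(dict)
    for source in sources:
        for record in source:
            profiles[record["handle"]].update(record)
    return dict(profiles)

combined = stitch(forum_posts, photo_site)
print(combined["alice_b"])
# One profile now holds approximate location, interests, and a social link,
# even though neither site alone reveals all of that.
```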
If they don’t use data from the fediverse, it’s only because it’s too small to bother with. These platforms are easier to scrape than any proprietary service.
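To illustrate how little effort it takes: Mastodon’s documented public timeline endpoint usually needs no credentials at all. The instance below is just an example, and some instances do disable unauthenticated access to this endpoint, so treat this as a sketch rather than a universal recipe.

```python
# Fetch recent public posts from a Mastodon instance, no login required.
import requests

INSTANCE = "https://mastodon.social"  # example instance

resp = requests.get(
    f"{INSTANCE}/api/v1/timelines/public",
    params={"limit": 40},  # the API's maximum page size
    timeout=10,
)
resp.raise_for_status()

for status in resp.json():
    print(status["account"]["acct"], status["created_at"])

# Paginating with the max_id parameter walks back through the public
# history; there is no auth handshake, just ordinary HTTP requests.
```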
Article: Billions of scraped Discord messages up for sale
Any server that has had its invite link posted online is guaranteed to be in the datasets of multiple such scrapers. If you’re talking about private servers with a handful or a few dozen members… yeah, sure…
You and the previous poster who complained about people complaining about AI slop should have a rap battle.
I think it’s separate, assigned automatically by scanning the subtitles and imagery, probably.
“you get a warning and need to click past it once” is probably the right set of guardrails there
You also have to sign in…
IME the Wayback Machine might give you the post body. I tend to need the comments as well, and archive.today can save all of that properly. The page usually isn’t saved there already, so you’ll have to wait a minute or two for the capture; I just use that time to look for other sources.
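If you want to automate the first step, the Wayback Machine has a public availability API you can query before falling back to archive.today (which has no comparable official API, so that part stays manual). A small sketch:

```python
# Check whether the Wayback Machine already has a snapshot of a URL.
import requests

def latest_wayback_snapshot(url: str) -> str | None:
    """Return the URL of the closest Wayback snapshot, or None if absent."""
    resp = requests.get(
        "https://archive.org/wayback/available",
        params={"url": url},
        timeout=10,
    )
    resp.raise_for_status()
    snapshot = resp.json().get("archived_snapshots", {}).get("closest")
    return snapshot["url"] if snapshot else None

# Example call; note that even a hit may only contain the post body,
# not the comments.
print(latest_wayback_snapshot("https://example.com/some-post"))
```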
(Re: storage costs: it is probably easier to throw a couple of extra bucks at your admins, if possible; that should more than cover the costs for a good number of images.)
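Rough back-of-the-envelope math on that, with assumed numbers (the storage price and average image size below are guesses, not any particular host’s actual rates):

```python
# How far "a couple extra bucks" goes, under assumed pricing.
PRICE_PER_GB_MONTH = 0.005   # assumed object-storage price in USD
AVG_IMAGE_MB = 2.0           # assumed average upload size
BUDGET = 2.0                 # the "couple extra bucks" per month

storage_gb = BUDGET / PRICE_PER_GB_MONTH
images = storage_gb * 1024 / AVG_IMAGE_MB
print(f"{storage_gb:.0f} GB, roughly {images:,.0f} images per month of budget")
# -> about 400 GB, on the order of 200,000 images
```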
Up to X of the previous words (tokens) go in, and the next word (token) comes out. Where is this “world-model” that it “maintains”?
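That loop, spelled out as code: a minimal greedy-decoding sketch with GPT-2 via Hugging Face transformers (the prompt and generation length are arbitrary). Note that the only thing the loop carries forward is the growing token sequence itself; whatever “world-model” people ascribe to it would have to live in the frozen weights, not in any state this loop maintains.

```python
# Autoregressive generation: context window in, one token out, repeat.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("gpt2")
model = AutoModelForCausalLM.from_pretrained("gpt2")
model.eval()

input_ids = tokenizer("The previous tokens go in and", return_tensors="pt").input_ids

with torch.no_grad():
    for _ in range(20):
        logits = model(input_ids).logits          # scores over the vocabulary
        next_id = logits[0, -1].argmax()          # greedy pick of the next token
        input_ids = torch.cat([input_ids, next_id.view(1, 1)], dim=1)

print(tokenizer.decode(input_ids[0]))
```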