

Yes, they were, so I’m offering you an actual theory as to why this may be true yet difficult to “prove”.
Smoking was bad for your health long before anyone sat down and took the time to prove it. Autoregressive LLM tokenizers are a very new area of computer science, and it’s going to take a while for the community to collectively understand everything we’re currently doing by trial and error.
It’s almost certainly related to cloud-init (the Canonical tool for handling deployment automation) or Ubuntu Pro (extended support that backports security fixes to older releases, plus some conveniences). They’re preinstalled as a convenience to paying users of those services; that’s the (IMHO, quite reasonable) model they use to fund the distro. I would expect some or all of that traffic to disappear if you disable or remove those two services.
https://cloud-init.io/
https://ubuntu.com/pro
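
If you want to see what's actually on the box before disabling or removing anything, here's a rough read-only sketch in Python. It only looks for the command-line tools on your PATH and for the marker file that the cloud-init docs describe as one way to turn it off. The tool names (`cloud-init`, `pro`, and the older `ua`) are my assumptions about what a stock install ships, and `report` is just a name I made up, so adjust for your release.

```python
import os
import shutil

# Assumption: the usual CLI names. `ua` is the older name of the Ubuntu Pro client.
CANDIDATE_TOOLS = ("cloud-init", "pro", "ua")

# Marker file that the cloud-init documentation describes for disabling it.
CLOUD_INIT_DISABLE_MARKER = "/etc/cloud/cloud-init.disabled"


def report():
    """Return which candidate tools are on PATH and whether the marker file exists."""
    # shutil.which returns the full path of the executable, or None if not found.
    tools = {name: shutil.which(name) for name in CANDIDATE_TOOLS}
    return {
        "tools_on_path": tools,
        "cloud_init_marker_present": os.path.exists(CLOUD_INIT_DISABLE_MARKER),
    }


if __name__ == "__main__":
    result = report()
    for name, path in result["tools_on_path"].items():
        print(f"{name}: {path or 'not found on PATH'}")
    print(f"cloud-init disable marker present: {result['cloud_init_marker_present']}")
```

This changes nothing on the system. It just tells you whether those two are even present, which is worth knowing before you go hunting for the source of the traffic.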