It is harder to apply a watermark to text than to images, because word choice is essentially the only variable that can be altered. DeepMind's watermark, called SynthID-Text, alters which words the model selects in a secret but formulaic way that can be detected with a cryptographic key. Compared with other approaches, DeepMind's watermark is marginally easier to detect, and applying it does not slow down text generation. "It seems to outperform schemes of the competitors for watermarking LLMs," says Shumaylov, who is a former collaborator of, and brother to, one of the study's authors.
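For intuition, here is a minimal sketch of the general keyed-watermarking idea, in the style of published "green list" schemes rather than DeepMind's actual tournament-sampling algorithm. Everything here is hypothetical (`VOCAB_SIZE`, `SECRET_KEY`, the bias value): a secret key pseudorandomly splits the vocabulary at each step, sampling is nudged toward the "green" half, and a detector holding the same key checks whether green tokens appear more often than chance would allow.

```python
import hashlib
import hmac

import numpy as np

VOCAB_SIZE = 50_000       # hypothetical vocabulary size
SECRET_KEY = b"demo-key"  # hypothetical; the real key stays with the provider


def green_mask(prev_token: int) -> np.ndarray:
    """Keyed pseudorandom split of the vocabulary into 'green'/'red' halves.

    The split depends on the previous token and the secret key, so the
    detector can reproduce it but outsiders cannot.
    """
    digest = hmac.new(SECRET_KEY, str(prev_token).encode(), hashlib.sha256).digest()
    rng = np.random.default_rng(int.from_bytes(digest[:8], "big"))
    mask = np.zeros(VOCAB_SIZE, dtype=bool)
    mask[rng.choice(VOCAB_SIZE, VOCAB_SIZE // 2, replace=False)] = True
    return mask


def watermarked_sample(logits: np.ndarray, prev_token: int, bias: float = 2.0) -> int:
    """Sample the next token with green tokens nudged upward.

    Word choice is the only lever, so the bias is kept small to preserve quality.
    """
    biased = logits + bias * green_mask(prev_token)
    probs = np.exp(biased - biased.max())
    return int(np.random.default_rng().choice(VOCAB_SIZE, p=probs / probs.sum()))


def detection_score(tokens: list[int]) -> float:
    """z-score for 'more green tokens than chance'; large values flag a watermark."""
    hits = sum(green_mask(prev)[tok] for prev, tok in zip(tokens, tokens[1:]))
    n = len(tokens) - 1
    return (hits - 0.5 * n) / (0.25 * n) ** 0.5
```

Because the nudge only reshuffles probability among already-plausible words, the text reads normally, while a few hundred tokens give the keyed detector a strong statistical signal.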
The article's author notes the danger of AI-generated fake data and briefly mentions that AI models training on AI-generated output is a real problem. I suspect the latter is the real motivation for this work.