Reblogged by technomancy@icosahedron.website ("tech? no! man, see..."):
aparrish@friend.camp ("allison") wrote:
the delicious irony: creators of industrial language models are now worried about no longer being able to use the web as their "commons" (i.e. other people's labor that they appropriate and commercialize) because their own outputs are "polluting" it (via https://mailchi.mp/jack-clark/import-ai-266-deepmind-looks-at-toxic-language-models-how-translation-systems-can-pollute-the-internet-why-ai-can-make-local-councils-better)
Attachments:
- One big problem: Today, we're in the era of text-generating and translation AI systems being deployed. But there's a big potential problem - the outputs of these systems may ultimately damage our ability to train AI systems. This is equivalent to environmental collapse - a load of private actors are taking actions which generate a short-term benefit but in the long term impoverish and toxify the commons we all use. Uh oh! "Our empirical findings also raise concerns regarding the effect of synthetic data on model scaling and evaluation, and how proliferation of machine generated text might hamper the quality of future models trained on web-text." (remote)