A Valve artist has defended AI disclosures on storefronts like Steam, saying they only scare those with "low effort" products.
@[email protected]
18d

If the model collapse theory weren’t true, then why do LLMs need to scrape so much data from the internet for training?

According to you, they should be able to just generate synthetic training data purely with the previous model, and then use that to train the next generation.

So why is there even a need for human input at all, then? Why are all LLM companies fighting tooth and nail against restrictions on their data scraping, if real human data is in fact so unnecessary for model training and they could just generate their own synthetic training data instead?

You can stop models from deteriorating without new data, and you can even train them on synthetic data, but that still requires the synthetic data to be either modelled or filtered by humans to ensure its quality. If you just take a million random ChatGPT outputs, with no human filtering whatsoever, use those to retrain the ChatGPT model, and then repeat that over and over again, eventually the model will turn to shit. Each iteration, some of the random tweaks ChatGPT makes to its output are going to produce low-quality outputs, which are then presented to the next model as a target to achieve. The new model learns that this type of bad output is actually good, which makes it more likely to reappear in the next set of synthetic data.

And if you turn off the random tweaks, the model may not deteriorate, but it also won’t improve, because effectively no new data is being generated.
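To make the feedback loop above concrete, here is a toy sketch. It is not an LLM: the "model" is just a Gaussian fitted to its training data, and each new generation is trained only on unfiltered samples drawn from the previous one. The function name, the parameters, and the choice of a Gaussian are all assumptions made for illustration.

```python
import random
import statistics


def self_training_spread(generations=200, samples_per_generation=10, seed=0):
    """Return the fitted standard deviation after each generation.

    Generation 0 is fitted to "human" data drawn from N(0, 1). Every later
    generation is fitted only to samples drawn from the previous fit, with
    no filtering and no fresh human data (a hypothetical toy setup).
    """
    rng = random.Random(seed)
    # Stand-in for the original human-written training data.
    data = [rng.gauss(0.0, 1.0) for _ in range(samples_per_generation)]
    spreads = []
    for _ in range(generations + 1):
        # "Train" the model: estimate mean and spread from the current data.
        mean = statistics.fmean(data)
        spread = statistics.pstdev(data)
        spreads.append(spread)
        # Generate the next training set purely from the model just fitted.
        data = [rng.gauss(mean, spread) for _ in range(samples_per_generation)]
    return spreads


if __name__ == "__main__":
    spreads = self_training_spread()
    for generation in (0, 10, 50, 100, 200):
        print(f"generation {generation:3d}: spread = {spreads[generation]:.3g}")
```

In this toy setup the fitted spread tends to shrink toward zero over many generations, because each refit loses a little of the original variety and nothing ever puts it back. How closely that carries over to real LLM training is a separate question; the sketch only shows the mechanism I’m describing.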

@[email protected]
18d

I stopped reading when you said according to me and then produced a wall of text of shit I never said.

Synthetic data is massively helpful. You can look it up. This is a myth.

@[email protected]
18d

That is enormously ironic, since I literally never claimed you said anything except for what you did say: namely, that synthetic data is enough to train models.

“According to you, they should be able to just generate synthetic training data purely with the previous model, and then use that to train the next generation.”

Literally, the very next sentence starts with the words “Then why”, which clearly and explicitly means I’m no longer indirectly quoting you. Everything else in my comment is quite explicitly my own thoughts on the matter, and why I disagree with that statement, so in actual fact, you’re the one making up shit I never said.
