• 0 Posts
  • 7 Comments
Joined 1M ago
Cake day: Nov 21, 2025


That is enormously ironic, since I literally never claimed you said anything except what you actually did say: namely, that synthetic data is enough to train models.

According to you, they should be able to just generate synthetic training data purely with the previous model, and then use that to train the next generation.

Literally, the very next sentence starts with the words “Then why”, which clearly and explicitly means I’m no longer indirectly quoting you. Everything else in my comment is quite explicitly my own thoughts on the matter, and why I disagree with that statement, so in actual fact, you’re the one making up shit I never said.


If the model collapse theory weren’t true, then why do LLMs need to scrape so much data from the internet for training?

According to you, they should be able to just generate synthetic training data purely with the previous model, and then use that to train the next generation.

So why is there even a need for human input at all then? Why are all LLM companies fighting tooth and nail against their data scraping being restricted, if real human data is in fact so unnecessary for model training, and they could just generate their own synthetic training data instead?

You can stop models from deteriorating without new data, and you can even train them with synthetic data, but that still requires the synthetic data to either be curated, or filtered by humans to ensure its quality. If you just take a million random chatGPT outputs, with no human filtering whatsoever, use those to retrain the chatGPT model, and then repeat that over and over again, eventually the model will turn to shit. Each iteration, some of the random tweaks chatGPT makes to its output are going to produce low quality outputs, which are then presented to the next model as a target to achieve, so the new model learns to treat that type of bad output as high quality, which makes it more likely to reappear in the next set of synthetic data.

And if you turn off the random tweaks, the model may not deteriorate, but it also won’t improve, because effectively no new data is being generated.
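The feedback loop described above can be sketched with a toy simulation (my own hypothetical illustration, not how any real LLM is trained): treat the “model” as a Gaussian fitted to the previous generation’s outputs, and mimic the mode-seeking bias of typical decoding by keeping only the most “typical” half of each generation’s samples. With no fresh human data coming in, the spread of the data collapses within a handful of generations:

```python
import random
import statistics

random.seed(42)

def next_generation(data, n_samples=2000):
    """Fit a Gaussian 'model' to the previous data, then generate
    new synthetic data from it, with no human filtering."""
    mu = statistics.fmean(data)
    sigma = statistics.stdev(data)
    samples = [random.gauss(mu, sigma) for _ in range(n_samples)]
    # Mode-seeking bias: the model preferentially emits outputs
    # it considers most likely, so keep the half closest to the mean.
    samples.sort(key=lambda x: abs(x - mu))
    return samples[: n_samples // 2]

# Generation 0: "real" human data, spread sigma ≈ 1.0
data = [random.gauss(0.0, 1.0) for _ in range(2000)]
print(f"gen 0: sigma = {statistics.stdev(data):.3f}")

# Each later generation trains purely on the previous model's outputs
for gen in range(1, 6):
    data = next_generation(data)
    print(f"gen {gen}: sigma = {statistics.stdev(data):.3f}")
```

The printed sigma shrinks sharply each generation: the synthetic data keeps losing diversity, which is the collapse the comment is describing.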


The line, imo, is: are you creating it yourself, and just using AI to help you make it faster/more convenient, or is AI the primary thing creating your content in the first place?

Using AI for convenience is absolutely valid imo. I routinely use chatGPT to do things like debugging code I wrote, or rewriting data sets in different formats instead of doing it by hand, or using it for more complex search and replace jobs, if I can’t be fucked to figure out a regex to cover it.

For these kinds of jobs, I think AI is a great tool.

Put more simply, I generally use AI for small subtasks that I am entirely capable of doing myself, but that are annoying/boring/repetitive/time consuming to do by hand.


If “everyone will be using AI”, AI will turn to shit.

LLMs can’t create originality, they’re only recycling and recontextualising existing information. But if you recycle and recontextualise the same information over and over again, it keeps degrading more and more.

It’s ironic that the very people who advocate for AI everywhere fail to realise just how dependent the quality of AI content is on having real, human-generated content to train the model on.


Reminds me of the old pile of gold (empty) meme


I mean that is kinda exactly what the developers want to provoke with timed dialogue choices. Timed dialogue choices are a game design mechanic to try and get a player to answer on instinct/gut feeling, rather than over analysing and trying to optimise the dialogue.

You not getting to think about it long is very much the intended effect, and allowing a pause would entirely defeat it.

There are of course definite accessibility concerns that should be considered and worked around, such as people with dyslexia who may not be able to properly parse the dialogue options before the timer runs out. But as a game mechanic, I think forcing the player to pick on instinct definitely has merit. It helps make the game more immersive, because it puts you under the same pressure to react as your character is under in the story right now, and it can lead to more interesting and ultimately enjoyable games by forcing players to potentially make a mistake, and having to find a way to deal with the fallout.


Games that don’t allow you to pause and skip cutscenes.

I don’t want to have to miss half of the cutscenes just because someone interrupted me or the phone rang or something halfway through. Alternatively, when I’m on my 23rd replay of a game, I do not want to have to sit through every cutscene I already know by heart.

Oh, and modern games that allow manual saving at any time but don’t have any kind of regular auto save (looking at you here, BG3).

If you’re fine from a gameplay pov with having the player save whenever, then there’s really no good reason whatsoever to not have one or two auto save slots that get saved every 10-20 minutes or so, at least as an option in the menu. ESPECIALLY in open world games (like BG3…) where you can easily go literal hours at a time without hitting a checkpoint save. And yes, I am still salty over learning about BG3’s lack of regular auto save when I lost like 2.5 hours of progress on my first run.