For fuck's sake, there are researchers behind the training data; if the models were really getting worse, they would figure something out and do some housecleaning. And that's just four of the many papers on this particular question.
Personally I think there's just no clear answer yet, because it all depends on decisions that will be made in the future. Right now, people are just playing with AIs. And when I say "people" I don't mean just their users, but also those who develop them.
It's obviously serious work, and they take it really seriously, but the technology is so complex that, despite its long history (more than 60 years of research), it's only yesterday that it became reliable enough. So even those who work on it are playing: they try one way to code the models, wait, look at the result, then change this or that. And they do the same for every aspect, including the methods used to train them.
Just because they are stealing everything that isn't nailed down on the internet doesn't mean they are stupid.
Also, there is a great misunderstanding about what "Synthetic Data" is. If you render something in Daz, that is synthetic data, and if they don't screw up the material shaders and the lighting, that is Physically Based Rendering, which is supposed to be fairly close to reality.
They ultimately need 3D Scene Generation for the Robots to understand their environment and to have a Sandbox to simulate things in, so more Synthetic Data is inevitable: once they figure out 3D Scene Generation, they just render it and are done. For some reason people forget that rendering is a thing.
Why do you think we keep jamming more and more Data into them, and why the AI bros are so obsessed with how Large their models are?

And it's this last part that is crucial, because the same AI guesses whether it's a rose or not, and then validates whether it guessed right. So if there's a particularity in each image (like a 90° ruler in the frame) before this step, it can perfectly well make the wrong assertion about what a rose is, and reinforce that error more and more and more.
Which is also the problem with AIs being trained on AI-generated content. As I said previously, it just reinforces their bias. Except that since it's other AIs that generated that content, it also spreads that bias between all the AIs.
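That feedback loop can be sketched with a toy simulation. This is purely illustrative, not any real training pipeline: the `sharpen` function and all the numbers are invented assumptions, modeling the common observation that a model retrained on its own confident outputs tends to exaggerate correlations it already believes in (like "ruler in frame" → "rose").

```python
def sharpen(p, gamma=2.0):
    """Toy model of retraining on self-generated labels: the model's new belief
    exaggerates its old one by raising the odds to a power gamma > 1.
    (Invented for illustration; gamma is not a parameter of any real system.)"""
    return p**gamma / (p**gamma + (1 - p)**gamma)

# Generation 0: real photos, where a visible ruler is only mildly
# correlated with the image actually being a rose.
p = 0.6  # P(model says "rose" | ruler visible) -- illustrative number
history = [p]

# Each generation retrains on the previous generation's own labels.
for _ in range(5):
    p = sharpen(p)
    history.append(p)

for gen, p_gen in enumerate(history):
    print(f"generation {gen}: P('rose' | ruler) = {p_gen:.3f}")
```

A mild 60/40 quirk drifts toward near-certainty within a few generations: the spurious cue ends up treated as the defining feature. Averaging in fresh real-world data each round (instead of only self-labeled data) is what keeps the loop from running away.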
We have gone a bit beyond the AI understanding what a "Rose" is.
The "Concepts" the AI is learning right now is much more Abstract, including a rudimentary form of System 1 Reasoning.
Remember, System 1 thinking is entirely instinctual and based on prior experience and patterns: exactly what jamming in a large amount of Data will get you.