Came here for the technical details, stayed for the legal debate.
A person set in motion the collection of this data, and the data itself was posted publicly. In many cases, that person would argue that they were pulling a public resource using software, which is technically the same thing a person using a browser is doing. The Internet Archive operates on a similar basis.
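For concreteness, here is a minimal sketch of what "pulling a public resource using software" amounts to; the URL and user-agent string below are placeholders, not entries from any actual dataset:

```python
import urllib.request

# Fetch a publicly posted image the way a browser would: a plain HTTP GET
# against a public URL. The URL here is a placeholder, not a real dataset entry.
url = "https://example.com/some-public-image.jpg"
request = urllib.request.Request(url, headers={"User-Agent": "Mozilla/5.0"})

with urllib.request.urlopen(request) as response:
    image_bytes = response.read()

# The scraper now holds exactly the bytes a browser would have rendered.
print(f"downloaded {len(image_bytes)} bytes from {url}")
```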
A person then took that data and fed it into a system that trains a diffusion model: noise is added to images paired with text, and the model learns how to reverse the process, going from noise back to an image that the text describes.
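As a rough illustration of that training step (this is not the actual Stable Diffusion code; the toy denoiser, the dimensions, and the stand-in caption embeddings are assumptions made purely for the sake of a small runnable example):

```python
import torch
import torch.nn as nn

T = 1000                                  # number of noise levels
betas = torch.linspace(1e-4, 0.02, T)     # noise schedule
alphas_cumprod = torch.cumprod(1.0 - betas, dim=0)

class ToyDenoiser(nn.Module):
    """Predicts the noise that was added, given a noisy image, a timestep, and a text condition."""
    def __init__(self, image_dim=3 * 32 * 32, text_dim=64):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(image_dim + text_dim + 1, 256), nn.ReLU(),
            nn.Linear(256, image_dim),
        )

    def forward(self, noisy_image, t, text_embedding):
        t_scaled = t.float().unsqueeze(1) / T
        return self.net(torch.cat([noisy_image, text_embedding, t_scaled], dim=1))

model = ToyDenoiser()
optimizer = torch.optim.Adam(model.parameters(), lr=1e-4)

# One training step on stand-in data; real training loops over scraped image/caption pairs.
image = torch.rand(8, 3 * 32 * 32)        # flattened stand-in "images"
text_embedding = torch.rand(8, 64)        # stand-in for caption embeddings (e.g. from a text encoder)
t = torch.randint(0, T, (8,))             # a random noise level for each sample
noise = torch.randn_like(image)

# Forward process: blend each clean image with noise according to the schedule.
a = alphas_cumprod[t].unsqueeze(1)
noisy_image = a.sqrt() * image + (1 - a).sqrt() * noise

# The model is trained to recover the added noise, i.e. to learn how to reverse the process.
loss = nn.functional.mse_loss(model(noisy_image, t, text_embedding), noise)
loss.backward()
optimizer.step()
print(f"toy training loss: {loss.item():.4f}")
```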
We can then take the model produced by that learning process and use it to convert pure noise into an image that matches the text descriptors. A rough analogy: show an artist a marble statue, then hand them a fresh block of marble and ask them to sculpt something matching the same description. They aren't told to duplicate it, just to achieve the same goals in their own representation, and the 'noise' in that medium will similarly change the result.
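A similarly simplified sketch of the generation side: start from pure noise and repeatedly denoise toward something that matches the text condition. The `denoiser` here is a stand-in for a trained model like the one sketched above; no real weights are involved:

```python
import torch

T = 1000
betas = torch.linspace(1e-4, 0.02, T)
alphas = 1.0 - betas
alphas_cumprod = torch.cumprod(alphas, dim=0)

def denoiser(x, t, text_embedding):
    # Placeholder for a trained noise-prediction network (see the training sketch above).
    return torch.zeros_like(x)

text_embedding = torch.rand(1, 64)     # stand-in for an encoded text prompt
x = torch.randn(1, 3 * 32 * 32)        # pure noise: the untouched block of marble

# Reverse process: step backwards through the noise levels, each step removing a
# little of the predicted noise, until only the "sculpted" image remains.
for t in reversed(range(T)):
    predicted_noise = denoiser(x, torch.tensor([t]), text_embedding)
    x = (x - betas[t] / (1 - alphas_cumprod[t]).sqrt() * predicted_noise) / alphas[t].sqrt()
    if t > 0:
        x = x + betas[t].sqrt() * torch.randn_like(x)

print("generated sample shape:", tuple(x.shape))
```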
- The tools themselves are unlikely to be a target; the tools don't actually have the images.
- The diffusion models also don't contain the images either, only weights that can be used to generate similar images, and since there are limits on copyrighting a style, that's probably not actionable either. A case will eventually come up, but lawyers will wait to pick an example of this fight that they are certain they will win, because they won't want to set a precedent with a loss.
- The source data itself, like LAION, probably has the most risk, but they've set themselves up to pass that along as well, as they can point to the public URL.
- If the original URL owner had rights to post the image publicly, the discussion ends. If not, they get a takedown notice, and there's some uncertainty on how the rights get clawed back from repositories that archive data sourced from public URLs.
I won't be surprised, though, if the path forward on the ethics side ends up being based on generating targeted individual likenesses (as people have rights to their own likeness). The messier side will be artistic styles; this process is as unlikely to directly replicate a copyrighted work as my cat is to leave a version of Starry Night in the litterbox.
Now time to go rewatch "Everything Is a Remix" for the ?th time, and maybe the explanation of the Amen break.
Is the AI controlling the scraper, or are humans starting it and then providing the images to the AI?

This is essentially already answered, but humans kick off each step of the work end to end. The current datasets discussed are scraped with some automation, and each dataset you might use for training has its own sourcing method. The main example discussed is LAION-5B, which is just a set of image URLs and text captions.
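A stand-in example of what such a record amounts to; the column names and rows here are illustrative, not taken from the actual LAION-5B release:

```python
import pandas as pd

# Illustrative stand-in records: no image bytes, just a pointer to a public URL
# plus the caption text that describes it.
records = pd.DataFrame(
    [
        {"url": "https://example.com/cat.jpg", "text": "a cat sitting on a windowsill"},
        {"url": "https://example.com/statue.jpg", "text": "a marble statue of a figure"},
    ]
)

# Training pipelines resolve each URL to an image at download time; the dataset
# itself only carries the link and the descriptor.
for row in records.itertuples(index=False):
    print(f"{row.text!r} -> {row.url}")
```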
All this being said, it's only the rights owners that can sue, and I doubt that those who have the money for that have the time, while those who have the time probably don't have the money.