Giving SD something to modify means it doesn't have to conjure the image out of nowhere, so you get results closer to what you want. You don't have to be any good at drawing; a rough triangle in the colour you want is usually enough. You can also get away with a lower denoising strength, which reduces the risk of eldritch horrors.
To know and understand all the terms we might use or mention, explore this awesome glossary by none other than the eminent, magnificent Jimwalrus.
After some tries, and a few months, I switched my training entirely to kohya_ss.
I'm experimenting now and the power of LoRA training is really good, and faster: I reach about 2.8 it/s during training, 3x faster than A1111 textual inversion.
By the way, I read all the tutorials Mr-Fox suggested and started experimenting.
So after some experiments I want to share and discuss some training techniques.
But first I'll show my process for training a shirt, this shirt:
1) Dataset
5 images are just the shirt in different colours (to make the LoRA more varied)
2 images with a subject wearing this shirt.
2) Caption
After many tests: the caption text is really important, and keeping it similar for every image gives better results; also, using the caption text in your prompt when you're in SD gives you the result you want.
In my case the caption text is:
cloth1 blue shirt with a cat eating ramen on white background
and the same text, changing only the colour of the shirt, for the other 4 images.
And for the images with a subject:
cloth1 a man wearing black shirt with a cat eating ramen
cloth1 is a word I used to specify this shirt; it may seem useless but I need to test more.
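If you want to script the captioning, kohya_ss reads a .txt file with the same basename as each image in the training folder (the "10_" prefix encodes the repeat count). A minimal sketch; all file and folder names here are hypothetical:

```python
import tempfile
from pathlib import Path

# kohya_ss pairs each image with a same-named .txt caption file.
# Folder name "10_cloth1 shirt" = 10 repeats, class "cloth1 shirt".
dataset = Path(tempfile.mkdtemp()) / "10_cloth1 shirt"
dataset.mkdir(parents=True)

captions = {
    "shirt_blue.png": "cloth1 blue shirt with a cat eating ramen on white background",
    "shirt_red.png":  "cloth1 red shirt with a cat eating ramen on white background",
    "worn_black.png": "cloth1 a man wearing black shirt with a cat eating ramen",
}

for image_name, caption in captions.items():
    # e.g. shirt_blue.png -> shirt_blue.txt containing the caption text
    (dataset / image_name).with_suffix(".txt").write_text(caption)

print(sorted(p.name for p in dataset.glob("*.txt")))
```

The images themselves would sit next to the .txt files; only the captions are generated here.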
3) Error
My error was not cleaning the watermark from the original images; keep that in mind for next time.
Why train clothes? I decided to train clothes because, reading around, I saw people say it is really difficult, so for me starting from something difficult is a way to understand training better.
And what you see here is the best LoRA I've achieved, but before this I made some really bad test LoRAs; sometimes they ignored my prompt, sometimes they did nothing. So after many experiments I found better settings that give me the results I want. But doing better is still possible.
Just experimenting and growing my training knowledge.
4) Conclusion and question
Training on the standard SD v1.5 model is what many tutorials say to do, but it didn't give me good results. I trained all my LoRAs on my personal checkpoint and got better results. From what I've seen, pruned models and models with layer errors are not good for training.
Also, about steps and epochs: 100 steps / 1 epoch seems enough and gives me balanced results; 2 epochs come out overtrained.
50 steps / 2 epochs seems the same as 100 steps / 1 epoch. (Am I right?)
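To make the arithmetic behind that question concrete: the raw optimizer step count is images x repeats x epochs (divided by batch size), so the two schedules match in step count even though they cross a different number of epoch boundaries. A rough sketch, with hypothetical repeat values:

```python
def total_steps(num_images: int, repeats: int, epochs: int, batch_size: int = 1) -> int:
    """Total optimizer steps for a kohya_ss-style run."""
    return num_images * repeats * epochs // batch_size

# 7 images (5 shirt-only + 2 worn):
a = total_steps(num_images=7, repeats=100, epochs=1)  # one epoch boundary
b = total_steps(num_images=7, repeats=50, epochs=2)   # two epoch boundaries

# Same raw step count, but not the same training: epoch-end events
# (shuffling, checkpoint saves, scheduler behaviour) happen at
# different points, as discussed further down the thread.
print(a, b)
```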
I tested these settings on people too:
Since A1111 was really the standard UI for all experimental work done with SD for "so long" (those long months, I mean, the whole tech is barely over a year old), it can still be the case that some cutting edge plugins and processes come out for A1111 first, but even that is starting to change since the public adoption of ComfyUI by the official SD team when SDXL came out.
But as someone who's rarely happy with the results from the original prompt generation, and always wants to do a few more steps to get to what I'd call a final image, working with ComfyUI is just a dream come true. Comfy is faster, lighter, more versatile, and if your process requires taking each image through a dozen steps of improvement, detailing, upscaling, etc. before you're finally happy with it, Comfy will let you set up all those steps ONCE and then just generate good images every time, instead of manually moving through tabs for upscaling, inpainting, more upscaling, and on and on...
Your captioning is wrong for the images where someone is wearing the t-shirt.
You need to tell the AI that the person is wearing the clothing, not a t-shirt; so you need to say that the person is wearing your trigger word.
It's probably much better to just use images of people wearing the clothing as well, since it's very unlikely that you'll use it in any other situation.
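One possible reading of that advice, as a toy rewrite of the worn-shirt caption quoted earlier: move the trigger word into the "wearing ..." phrase so the AI links it to the garment rather than leaving it dangling at the front. The helper below is purely illustrative:

```python
TRIGGER = "cloth1"

def fix_caption(caption: str, trigger: str = TRIGGER) -> str:
    """Rewrite 'cloth1 a man wearing black shirt ...' so the subject is
    described as wearing the trigger word itself, not a generic shirt."""
    body = caption.removeprefix(trigger).strip()  # drop the leading tag
    return body.replace("shirt", trigger, 1)      # trigger becomes the garment

print(fix_caption("cloth1 a man wearing black shirt with a cat eating ramen"))
# -> "a man wearing black cloth1 with a cat eating ramen"
```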
Clothing shouldn't be hard to train, as it's just a matter of having good images that remain consistent. The tricky part comes in if there's a TYPE of clothing you want to train that has a lot of variation. In your case it would be very simple, since you want it to learn just one simple t-shirt.
What would be trickier is training something like Xmas jumpers, where you'd have multiple variations of patterns you'd want the AI both to learn and to be able to create variations of by itself.
Either your LoRA or the model used for the images has a horrible issue with oversaturation. It's very obvious in the image of the girl, but it's pretty clear in the other ones as well. Given the images below, it seems to be the model that's performing very badly.
And no, steps (as in image count x number of repeats) and epoch values aren't interchangeable in that way.
50 steps x 10 epochs is in no way the same training as 100 steps x 5 epochs.
There are a lot of things that happen at the end of each epoch: things get "reset/restarted", things get "written", etc.
Not sure how to best explain this...
The more you cycle through your images per epoch, the more detail gets picked up; that "learned detail" gets used for further learning, more "finetuning", in the next epoch, to either correct wrongs or improve.
While it might not be 100% accurate, "loss" can be considered a representation of the difference between what the AI expected and the actual value.
So a lower loss suggests the AI predicted things closer to what happened.
So for most things you want the majority of your total step count to come from images x repeats and a relatively low epoch count. Mostly you probably shouldn't need more than 5-10 epochs, the exception generally being styles.
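That description of loss is essentially mean squared error, which is (as far as I know) very close to what diffusion training actually uses on the predicted noise. A toy sketch of "lower loss = prediction closer to reality":

```python
def mse_loss(predicted, actual):
    """Mean squared error: the average squared difference between what
    the model predicted and what actually happened. Lower = closer."""
    return sum((p - a) ** 2 for p, a in zip(predicted, actual)) / len(actual)

# A prediction near the target scores a much lower loss than a far one:
close = mse_loss([0.9, 0.1, 0.5], [1.0, 0.0, 0.5])  # small errors
far   = mse_loss([0.1, 0.9, 0.0], [1.0, 0.0, 0.5])  # large errors
print(close, far)
```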
Yes, there is something wrong with your settings. I know this because I made exactly the same mistake when I started out as well.
The "Inpaint Area" section needs to have 'whole picture' selected, not 'only masked'. I couldn't get my head round why it didn't work for a while but basically the prompt at the top of the page is applying to the whole picture and you have then told it that it just needs to apply to the masked section (in Mask Mode area). That's why you are getting a tiny picture-within-a-picture, because it is trying to create a picture of "purple haired girl with huge boobs sitting on the road flashing her vagina" in just the few pixels you have masked.
Actually what I find works quite well is to delete the prompt entirely and type in what you want to see in the masked area - in this case "shaved vagina" or something similar. You don't need the full prompt in inpaint, just the parts that apply to the section you are inpainting.
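What "inpaint masked" does at the very end can be pictured as a simple composite: generate a full image, then keep the original pixels everywhere except under the mask. A toy sketch with flat lists standing in for images:

```python
def composite(original, generated, mask):
    """Keep generated pixels where mask == 1, original pixels elsewhere.
    This is why (with 'whole picture') you may see a full new image in
    the preview that gets mostly thrown away at the final step."""
    return [g if m else o for o, g, m in zip(original, generated, mask)]

original  = [10, 20, 30, 40]   # stand-in pixel values
generated = [99, 98, 97, 96]
mask      = [0, 1, 1, 0]       # only the middle region is inpainted

print(composite(original, generated, mask))  # [10, 98, 97, 40]
```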
Oh man, I'd really love it if someone wants to make a Simpsons SDXL LoRA. SDXL has some knowledge, but not enough. Croc has some nice high-res work that can be used.
As far as I know, "whole picture" changes the whole picture, while "only masked" restricts the change to the chosen area. I tried removing all the prompts and writing only a single one, but it still never did what I wished.
Whole picture does change the entire picture, but since you have selected 'inpaint masked', it only applies the change to the part that you have selected.
You can either choose Doggettx or install xformers; I personally prefer xformers. The A1111 wiki isn't consistent with info on this so I'm not 100% sure, but if you want to install xformers you need to edit your webui-user.bat and add --xformers as shown in this screenshot:
[attachment: webui-user.bat screenshot]
It should install xformers automatically, and then you can select xformers in the Cross attention optimization menu.
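For reference, the edited line in webui-user.bat typically ends up looking like this (keep any other arguments you already have on that line):

```
set COMMANDLINE_ARGS=--xformers
```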
And as I can see, you have an earlier version of torch. I have torch 2.0.1:
[attachment: torch version screenshot]
But this topic I'll leave to someone who is more capable than me; I'm just a noob.
Exactly. In fact, if you have sufficiently frequent previews showing, you will actually see the changes it wants to make to the whole picture, which are then thrown out right at the end in favour of the original pixels (I panicked the first time; I thought it was going to fuck the whole thing up!)
Or, if you have it set to "Only Masked", it will use the prompts for the masked area, so trim the prompts hugely, i.e. just have "(dense pubic hair:1.5)" or similar as your entire prompt.
I only want to refine the picture on the left (upscale + more details), but SD not only keeps adding stuff, it also merges many weird pictures into one. Any idea why?
Denoising strength. Try putting it down to 0.1 instead of 0.65. The closer the denoising strength is to 1, the more it changes the picture. So when you are trying to change something that you have masked, you want 0.65, since you want it to change; but when you want to upscale it and not change much, 0.1 is much better.
TBH I don't even use that tab if I want to upscale, I use the extras tab - much quicker for me
Well, here are a few obvious things to change there. You can't really modify an image with img2img without a prompt, which you don't have. All the negative prompt does is tell Stable Diffusion what NOT to include on the new generated image, but it won't remove stuff from an existing source image, if that was your goal. You need to provide a very clear prompt describing exactly what you want the image to have. For upscaling I would usually just use the same prompt as you had to create the original image, but it can be adjusted if necessary.
As to why you're getting crazy new stuff on the image... it's the lack of a prompt, and also the high Denoising Strength value you have. The higher the denoising, the more you're allowing SD to ignore the source image and pay attention only to the prompt. It does more than that in reality, but for simplicity's sake you can think of denoising in img2img like this: zero will completely ignore the prompt and just copy the original image, while 1 will ignore the source image and just follow the prompt. If you want something similar to the source, stick with values lower than 0.5. For general upscaling I would recommend staying below 0.3, but with the SD Upscale script it might be necessary to go even lower, like 0.15 to 0.2, to help avoid each chunk of the image generating a new version of the complete image.
The Extras tab is good if all you want is a bigger image but it won't do much more than increase resolution, which is why it is so fast; if the original image for example didn't draw the eyes of a small character in the image, the new one won't have that either. If you want more detail on the image you need to re-render it, and img2img is the way to do that.
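Under the hood (in implementations like diffusers, at least), denoising strength works by noising the source image part-way along the schedule and only running the remaining steps, which is why low strength barely changes the image. A rough sketch of that relationship:

```python
def img2img_steps(num_inference_steps: int, strength: float) -> int:
    """Approximate number of denoising steps actually run in img2img.
    strength=0 -> keep the source image untouched (0 steps);
    strength=1 -> start from pure noise (all steps), like txt2img."""
    return min(int(num_inference_steps * strength), num_inference_steps)

for s in (0.1, 0.3, 0.65, 1.0):
    print(s, img2img_steps(30, s))
# Low strength for upscaling (few steps, image barely changes);
# high strength only when you actually want big changes.
```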
Someone wrote here that the Extras tab only upscales without adding generative details, but I first create the pictures in bulk at low detail (for speed) and then upscale the ones I like. I just wanted SD to "finish" the given picture.
Ah, pardon, that was my 2nd attempt. I tried it without a prompt because with the prompts I got even wilder results, so I erased them, as I only wanted SD to carve out the given image, not do new stuff on top of it.
I will try again later with less denoising, but previously it often worked well with 0.65; SD only changed minor stuff.
The idea of running off lots of non-upscaled images first to get somewhere near (and refine prompts), then upscale the best is a good idea. However, the trick is to regenerate them from scratch in txt2img (i.e. same seed and prompts etc.) as well, not just running what you've previously created through Extras or img2img.
Shouldn't take long; it's only going to add a few seconds to redo that bit of work, but it means it does the whole 'generative within upscaling' thing.
Re your second point, 0.65 is waaay too high for anything in img2img other than maybe converting anime to photorealistic (or vice versa).
0.5 is probably about the limit, 0.2-0.35 a better recommendation. You seem to have previously got lucky!