[Stable Diffusion] Prompt Sharing and Learning Thread

Sepheyer

Well-Known Member
Dec 21, 2020
1,531
3,618
Yes, I agree. Daz is more consistent because you have direct control, while SD is always a dice toss; however, SD is light years ahead in visuals and realism. Using ControlNet and OpenPose, SD is catching up to Daz in repeatability and consistency. Also, with SD you are not forced to endure endless menus only to tweak one little thing.
Can't wait for the day when the "generative" part adds a few more loops to understand that you want SD to build a 3D character off a single 2D image, then dress/undress her, then build a LoRA (or whatnot) around her and turn her into a proper callable object that can be plugged consistently into scenes created via a similar approach.
 

me3

Member
Dec 31, 2016
316
708
A1111 has released a new update (v1.3.0).
In this update we have cross-attention optimization.
I've made a test with all of this.
Settings and times are in the post.

Are you using it? What are your favorites?
You might want to hold off on updating, or at the very least give it some thought depending on your setup and usage.
Stuff I've found so far (which might sound kind of small at first but isn't):
Images are dropped into the default temp folder (which has been an issue before too, going by a discussion on their git). There's a setting to change the temp dir, but it doesn't work and things still get dumped into the default temp location.
If you save all images by default, a copy of single images gets put in its usual place (whatever you have that set to); grids, however, don't, and so far they only show up in tmp for me. (If you're using SSDs, you might want to consider the extra pointless writes.)

If you open images from the UI you get the ones stored in temp, highlighting the issue that file= access isn't limited to just subfolders, meaning you can technically gain access to more important things. Probably worth keeping that in mind if you have things running with any kind of remote/public access. (I doubt this particular issue is anything new.)
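If the temp-dir setting is broken, one workaround that might be worth trying (assuming the bundled Gradio version honors it, which I haven't verified) is pointing Gradio's own temp directory somewhere else before launch; the path below is just an example:
Code:
rem webui-user.bat: redirect Gradio's temporary files
set GRADIO_TEMP_DIR=D:\sd-temp
rem on Linux, the equivalent in webui-user.sh would be: export GRADIO_TEMP_DIR=/path/to/sd-temp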
 

devilkkw

Member
Mar 17, 2021
308
1,053
Every update breaks something. Currently I see images are stored in the default tmp folder; another problem is being unable to save images as jpg, the settings are ignored. Maybe wait for the next good update. I updated only because I wanted to try the new cross-attention optimization.
 

modine2021

Member
May 20, 2021
381
1,253
Well, here we go again: another error preventing me from doing anything. Nothing happens after clicking Generate. Google gave only a little info, but the lines are nowhere to be found in the .py file they said to edit.
RuntimeError: expected scalar type Float but found Half
 

me3

Member
Dec 31, 2016
316
708
I was hoping it might help with some of the problems I've been having (that backfired...).
I've been trying to train a person for probably over 1k hours now, and I can't make sense of why it's behaving the way it is.

To start at the "easy" end: in the beginning, when testing the training stages, I got someone of either clearly Asian or clearly African origin, both in features and skin tone... eventually the few Asian cases dropped out completely.
The problem is, the person is without any doubt white; even the fact that they have blue eyes should rule out much else, so I can't see why it's happening.
Another problem is that the first 1-2 stages of the training pick up the body shape pretty much perfectly, but beyond that things just get flattened down.

I've tried simple captions, tagging everything, and tagging only specific things; it changes stuff, but nothing seems to affect the ethnicity and body issues. The captions are being read, and even without them it should "work".

Having trained on other image sets pretty successfully, meaning you could easily tell it's the same people, I can't see why this is going so horribly wrong...

Suggestions are welcome.
 

me3

Member
Dec 31, 2016
316
708
Well, here we go again: another error preventing me from doing anything. Nothing happens after clicking Generate. Google gave only a little info, but the lines are nowhere to be found in the .py file they said to edit.
RuntimeError: expected scalar type Float but found Half
Not sure how you are running things, but usually that happens when you're missing the launch options --no-half or --no-half-vae.
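For reference, launch options go in the COMMANDLINE_ARGS line of webui-user.bat (or webui-user.sh on Linux); a minimal example, keeping whatever flags you already use:
Code:
rem webui-user.bat (example only; keep your existing flags)
set COMMANDLINE_ARGS=--xformers --no-half --no-half-vae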
 

modine2021

Member
May 20, 2021
381
1,253
Not sure how you are running things, but usually that happens when you're missing the launch options --no-half or --no-half-vae.
I'm using these. Is this right? I was trying to speed things up a bit.

--xformers --opt-channelslast --disable-safe-unpickle --precision full --disable-nan-check --skip-torch-cuda-test --medvram --always-batch-cond-uncond --opt-split-attention-v1 --opt-sub-quad-attention --deepdanbooru --no-half-vae
 

me3

Member
Dec 31, 2016
316
708
I'm using these. Is this right? I was trying to speed things up a bit.

--xformers --opt-channelslast --disable-safe-unpickle --precision full --disable-nan-check --skip-torch-cuda-test --medvram --always-batch-cond-uncond --opt-split-attention-v1 --opt-sub-quad-attention --deepdanbooru --no-half-vae
Code:
--xformers 
--opt-split-attention-v1
--opt-sub-quad-attention
I think those three can't work together, as the code is set up with conditions so that only one of them actually gets applied.
I'd probably go with xformers.
Since you say you're looking for speed: if you don't need --medvram you should remove it, as it slows things down quite a bit, and I think --precision full increases VRAM usage, which kills a bit of the point of --medvram.
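Putting that together, a trimmed-down line based on your flags might look something like this (just a suggestion; keep --medvram if you actually need it for your VRAM, and add --no-half if the Float/Half error comes back):
Code:
set COMMANDLINE_ARGS=--xformers --opt-channelslast --disable-safe-unpickle --disable-nan-check --skip-torch-cuda-test --always-batch-cond-uncond --deepdanbooru --no-half-vae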
 

modine2021

Member
May 20, 2021
381
1,253
Code:
--xformers
--opt-split-attention-v1
--opt-sub-quad-attention
I think those three can't work together, as the code is set up with conditions so that only one of them actually gets applied.
I'd probably go with xformers.
Since you say you're looking for speed: if you don't need --medvram you should remove it, as it slows things down quite a bit, and I think --precision full increases VRAM usage, which kills a bit of the point of --medvram.
Adding --no-half fixed it. Thanks for the other suggestions.
 

Mr-Fox

Well-Known Member
Jan 24, 2020
1,401
3,794
I was hoping it might help with some of the problems I've been having (that backfired...).
I've been trying to train a person for probably over 1k hours now, and I can't make sense of why it's behaving the way it is.

To start at the "easy" end: in the beginning, when testing the training stages, I got someone of either clearly Asian or clearly African origin, both in features and skin tone... eventually the few Asian cases dropped out completely.
The problem is, the person is without any doubt white; even the fact that they have blue eyes should rule out much else, so I can't see why it's happening.
Another problem is that the first 1-2 stages of the training pick up the body shape pretty much perfectly, but beyond that things just get flattened down.

I've tried simple captions, tagging everything, and tagging only specific things; it changes stuff, but nothing seems to affect the ethnicity and body issues. The captions are being read, and even without them it should "work".

Having trained on other image sets pretty successfully, meaning you could easily tell it's the same people, I can't see why this is going so horribly wrong...

Suggestions are welcome.
Are you trying to train a LoRA using kohya_ss? If so, the checkpoint you are training on is very important.
Here's the best info source I have found on LoRA training:

For my own LoRA I tried a few different ones and landed on Elegance. One could of course try the SD 1.5 base model, but I read that it's not the best for pose variations.

If, however, you are only talking about generating images using img2img, then put "asian" and "black" or "african" in the negative prompt and "white" or "caucasian" in the positive.
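For the img2img route, the kind of prompt nudge meant here would look roughly like this (purely illustrative):
Code:
Positive: photo of a caucasian woman, pale skin, blue eyes
Negative: asian, african, dark skin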

It's much easier to help people if you write more thoroughly about what your issue is and give the context. ;)
 

me3

Member
Dec 31, 2016
316
708
  • kohya_ss, trying to train a TI (textual inversion embedding).
  • Tried different learning rates and schedulers.
  • Tried with and without regularisation images.
  • Using the SD 1.5 model. I did discover that the SD 1.5 model kohya downloads on its own has an issue; I can't remember exactly what it was, but it was throwing a small loading error that was hard to spot in all the output text.
  • Sample images generated during training, using purely the name as the positive prompt, show (obviously poor and disfigured) likenesses of the training data, so it's clearly learning something. However, when using the TI files from each epoch in a1111, on the same model, there's either a slight likeness that gets washed out later on, or a completely wrong "thing" (like the ethnicity bit) that stays constant throughout.
    Which makes it seem like only part of the learned data, or none of it, truly gets written to the TI files, or it's somehow written "wrong".
  • Given the point above, I've been running the training at 4 vectors. It should have been enough, as I've trained several TIs in a1111, one of which I posted an image of already (the first image). Running a training now at 20 vectors just to see, but considering I've trained at 4 just fine, and even at 2, the vector count should be enough.
And before you ask: the reason I'm trying a TI and not a LoRA is that TIs have one "ability" that LoRAs don't. LoRAs apply themselves to all subjects, but with TIs you can have a group photo of multiple trained people/objects.

=== Updated ===
Using a much higher vector count seems to have improved things. I ran the training at a much higher LR just to see, so overfitting is a very possible cause of the remaining issues. Running a test now at 1/50 of that LR (I said it was much higher :p).
One unexpected side effect of the higher vector count is that the sample images had higher quality. I assume that's due to how they are generated with each vector given, but the images have much more detail and "quality".

Potential lesson to learn: don't listen to all the ultimate/super-awesome/all-you-need-to-know guides and experts telling you that you never need (in this case) more than a handful of vectors...
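For anyone wanting to follow along, this is roughly what such a run looks like when kohya_ss hands off to sd-scripts' train_textual_inversion.py. The values below are placeholders rather than my actual settings, and flag names can differ between versions, so treat it as a sketch only:
Code:
rem placeholder values; check the flag names against your sd-scripts version
accelerate launch train_textual_inversion.py ^
  --pretrained_model_name_or_path "runwayml/stable-diffusion-v1-5" ^
  --train_data_dir "train\subject" --output_dir "output" --output_name "subject_ti" ^
  --token_string "subjectti" --init_word "woman" ^
  --num_vectors_per_token 20 ^
  --resolution 512 --learning_rate 5e-4 --max_train_steps 3000 ^
  --save_every_n_epochs 1 --mixed_precision fp16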
 

Mr-Fox

Well-Known Member
Jan 24, 2020
1,401
3,794
  • kohya_ss, trying to train a TI.
Ok. I have not done any TI yet so I can't be of any help. I can share links to info sources though.
(Textual Inversion/Hypernetwork Guide)
("--RETARD'S GUIDE TO TEXTUAL INVERSION--")
(Training a Style Embedding in Stable Diffusion with Textual Inversion)

In my experience TIs are almost always problematic, so I'm sticking to LoRAs for now. The same goes for hypernetworks.
 

sharlotte

Member
Jan 10, 2019
268
1,440
The best one I've seen (and followed) for training a TI is:
It was posted here a few weeks back, but probably not on the front page. The author creates a character and then trains a TI on it; it's not an overly long process and it gets very good results.
 

KingBel

Member
Nov 12, 2017
407
3,182
The best one I've seen (and followed) for training a TI is:
It was posted here a few weeks back, but probably not on the front page. The author creates a character and then trains a TI on it; it's not an overly long process and it gets very good results.
Hi

This is the github link:

Should probably also check out the textual inversion channel on the Unstable Diffusion Discord for lots more resources and tutorials/discussions.
 

me3

Member
Dec 31, 2016
316
708
Ok. I have not done any TI yet so I can't be of any help. I can share links to info sources though.
(Textual Inversion/Hypernetwork Guide)
("--RETARD'S GUIDE TO TEXTUAL INVERSION--")
(Training a Style Embedding in Stable Diffusion with Textual Inversion)

In my experience TIs are almost always problematic, so I'm sticking to LoRAs for now. The same goes for hypernetworks.
That's the sort of "guides" I'm referring to: generally lacking in detail or just flat-out wrong in many regards, often important ones.
The best one I've seen (and followed) for training a TI is:
It was posted here a few weeks back, but probably not on the front page. The author creates a character and then trains a TI on it; it's not an overly long process and it gets very good results.
Hi

This is the github link:

Should probably also check out the textual inversion channel on the Unstable Diffusion Discord for lots more resources and tutorials/discussions.
I used that guide in the beginning for some things, but there has to be something horribly wrong with the training explanation; 25 images and just 150 steps simply doesn't work. Someone has pointed that out to the creator as well, but they seem completely unwilling to respond to the issue.
The creator's own explanations in different places don't seem to match up either, making it look like they are mixing up terms and/or settings.

Also, one thing that seems to be very relevant with any training is the actual data used: the images, captions and all the settings. However, you don't really see people supplying those. If they did, others could replicate the results (assuming the guides were accurate, which I'm starting to doubt in many cases) and then use that as a basis for their own images, since they would know better what to look for during the process.
 

me3

Member
Dec 31, 2016
316
708
This is from a training in a1111.
[attachment: help.png]
I was testing a "warm-up" like Dreambooth etc. use, so each epoch the learning rate increased marginally until about 10% of the steps. The third image says just 25 steps, but that's ~3 epochs, and to be honest I'm struggling to find much difference from the one at 2200 steps. I tried the same setup on a different set of images and it failed completely, despite having the same number of images, the same "distribution", the same simple captioning, etc. Unfortunately I don't have any of the results from it, but it started with something that would put bodybuilders to shame, and when I gave up it was somewhere between a very successful anorexic and a skeleton...

I can't work out why one worked and one didn't, nor does it make much logical sense (clearly it does to a computer, so I guess there is something logical about it). Which is why guides should provide the data involved: it can make a huge difference in results, and at least then you know what you've got to work with and what the target is, which makes it much easier to find the path.
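For reference, a1111's embedding training accepts the learning rate as a list of rate:step pairs (each rate applies until the given step, the last one for all remaining steps), so a crude warm-up can be written directly into the schedule. The numbers below are purely illustrative, not the settings from this run:
Code:
0.0001:100, 0.0005:250, 0.005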
 

Mr-Fox

Well-Known Member
Jan 24, 2020
1,401
3,794
That's the sort of "guides" I'm referring to: generally lacking in detail or just flat-out wrong in many regards, often important ones.
Yes, this is a prevalent problem in many areas of the internet: people making guides about various things without actually being knowledgeable enough to do so, or doing it quickly and sloppily.

Also, one thing that seems to be very relevant with any training is the actual data used: the images, captions and all the settings. However, you don't really see people supplying those. If they did, others could replicate the results (assuming the guides were accurate, which I'm starting to doubt in many cases) and then use that as a basis for their own images, since they would know better what to look for during the process.
Yes, this is exactly it. In training LoRAs it's the same: source image quality, the captions and the settings are key to a good end result.
I used an excellent guide for LoRA training, with the OP doing regular updates as he learns and as the tools get updated.
He also shares everything about the process, data, settings, etc. I don't know if there's any crossover to TI training though
(in case you're interested).
 

Mr-Fox

Well-Known Member
Jan 24, 2020
1,401
3,794
This is from a training in a1111.
[attachment: help.png]
I was testing a "warm-up" like Dreambooth etc. use, so each epoch the learning rate increased marginally until about 10% of the steps. The third image says just 25 steps, but that's ~3 epochs, and to be honest I'm struggling to find much difference from the one at 2200 steps. I tried the same setup on a different set of images and it failed completely, despite having the same number of images, the same "distribution", the same simple captioning, etc. Unfortunately I don't have any of the results from it, but it started with something that would put bodybuilders to shame, and when I gave up it was somewhere between a very successful anorexic and a skeleton...

I can't work out why one worked and one didn't, nor does it make much logical sense (clearly it does to a computer, so I guess there is something logical about it). Which is why guides should provide the data involved: it can make a huge difference in results, and at least then you know what you've got to work with and what the target is, which makes it much easier to find the path.
Beautiful woman. :love:
 

devilkkw

Member
Mar 17, 2021
308
1,053
This is from a training in a1111.
[attachment: help.png]
I was testing a "warm-up" like Dreambooth etc. use, so each epoch the learning rate increased marginally until about 10% of the steps. The third image says just 25 steps, but that's ~3 epochs, and to be honest I'm struggling to find much difference from the one at 2200 steps. I tried the same setup on a different set of images and it failed completely, despite having the same number of images, the same "distribution", the same simple captioning, etc. Unfortunately I don't have any of the results from it, but it started with something that would put bodybuilders to shame, and when I gave up it was somewhere between a very successful anorexic and a skeleton...

I can't work out why one worked and one didn't, nor does it make much logical sense (clearly it does to a computer, so I guess there is something logical about it). Which is why guides should provide the data involved: it can make a huge difference in results, and at least then you know what you've got to work with and what the target is, which makes it much easier to find the path.
I posted something about training textual inversion a while ago; I use the standard a1111 training.
But what I modify when training people is the images: I use 768x768 and cut away the background so only the person I want is kept in the image, saving it as PNG with alpha. Then when training I check the "use alpha as loss weight" option.
The captions also become simpler, because you can describe the subject better and skip everything about the background.
The learning rate schedule I usually use for 3000 steps is: 1.9:200, 0.9:400, 0.4:600, 0.06:800, 0.0005. I save an image and a TI every 100 steps and check what is better during training.
Usually from 800 to 1700 steps it starts getting better results, so I check those TIs in the generation phase and try which is really better.
One trained with these values is Oily Helper; you can find it in my Civitai profile.
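For the prep step above, something like ImageMagick can handle the square resize/crop while keeping the alpha channel; a rough one-liner, assuming ImageMagick 7 and a cutout that already has the background removed:
Code:
magick cutout.png -resize 768x768^ -background none -gravity center -extent 768x768 PNG32:train/cutout_768.png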