[Long post warning]
The confusing and "laughable" nature of training guides.
Having read an ever-increasing number of "guides" and posts/comments on guides about picking images, which optimizers to use, captioning, learning rates and other settings, the only things I've really learned are hundreds of things that don't work (in a large number of situations) and that most of these guides/instructions are just pointless.
I can't really decide if much of this is down to general ignorance, with ppl simply having no idea what's going on, or if it's intentional. I've seen many cases where ppl claim that this and that works amazingly and that they can use the same setup for everything with perfect results. They then link to something meant to show off this amazing work, and very often it shows loras etc. of some known character that look nothing like them. Other times there is an actual likeness, so you might think that this setup actually does work and you give it a go; after all, it's meant to work for everything. Big shock, it doesn't seem to work. So you download one of the loras and check the metadata, just to confirm the settings.
This generally has a few possible outcomes:
- Settings do match, fair enough as there's loads of outside variables affecting things (more on that later)
- They don't match, often by a lot, and they don't even match other loras from the same person
- Metadata is missing. This is generally done intentionally, through editing or by extracting the lora from something else. Both strongly suggest they used other training methods, Dreambooth being a common explanation.
#2 can potentially be explained by "evolving tools" or having gained new insight, but shouldn't that also have been updated in your guide?
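For anyone wondering how to check the metadata yourself, here's a rough sketch in plain Python. It relies on the .safetensors layout (an 8-byte little-endian header length, then a JSON header with an optional "__metadata__" entry). The "ss_" key prefix for training settings is an assumption based on what kohya's scripts seem to write, and the file name in the usage comment is made up.

```python
import json
import struct


def read_safetensors_metadata(path):
    """Return the free-form metadata dict stored in a .safetensors header."""
    with open(path, "rb") as f:
        # First 8 bytes: unsigned little-endian length of the JSON header.
        (header_len,) = struct.unpack("<Q", f.read(8))
        header = json.loads(f.read(header_len).decode("utf-8"))
    # Missing or stripped metadata simply shows up as an empty dict.
    return header.get("__metadata__", {})


def training_settings(metadata, prefix="ss_"):
    """Keep only the keys that look like training settings.

    The "ss_" prefix is an assumption about kohya-trained files.
    """
    return {k: v for k, v in metadata.items() if k.startswith(prefix)}


# Usage (hypothetical file name):
# settings = training_settings(read_safetensors_metadata("my_lora.safetensors"))
# for key, value in sorted(settings.items()):
#     print(key, "=", value)
```

An empty result matches the third outcome above: the metadata was either never written or has been removed.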
If you compare guides you come across conflicting claims as well. I.e. you have two guides that have the same basic rates, optimizers, number-of-images recommendations etc., yet one of them says it should take ~1000 steps to be perfect and the other says 4-5000. Assuming one of them is correct, the other will be either very undertrained or overtrained...both can't really be right.
A lot of guides are posted in places where ppl can give feedback in some way, often with improvements/suggestions of their own, or "corrections". Great, since this means there's more data to work with. What does get a bit suspicious, though, is when the authors respond to "praise" and to chances to advertise things they can "profit" from (i.e. youtube videos etc.), but completely ignore issues raised. Adding these things together you almost get the impression that there are ppl intentionally posting things that are misleading or lacking, to make ppl fail and/or repeatedly have to review the instructions, while their own work gets "propped up" and they profit in terms of downloads and views...hmmmm, nah, ppl can't be that petty and self-centered, right...
Since I figured out how I could actually do training on my very limited setup, I've been trying to find some fairly basic and consistent way to get ppl at least a fair bit of the way. BUT as I mentioned early on, guides seem pointless, so I guess I'll rant about it instead and ppl can just skip over it....
HOWEVER, what might be more useful is to know things that can be screwing things up so ppl don't needlessly waste months trying to figure things out and spend hours upon hours burning out their GPU...
- There seems to be an issue with kohya_ss and training SD1.5. Exactly when it started is a bit uncertain, since the last working version for some is from April, for others from June. Personally, the last version I've gotten to work and that trains fairly well is from June. There seems to be some disagreement about what is causing it too. Some claim it's related to newer versions of bitsandbytes, but I've updated that on my working version and saw no real difference. The issue also affects other optimizers, so it can't be the only cause. But it's worth considering if you've got issues training. The latest version also seems to have broken SDXL, but that might be fixed fairly fast since XL seems to be the main priority.
- If you do follow a guide, keep in mind that not only do your images and captions make a difference, but also the versions of the tools you are using. That includes the different libs they depend on.
- Relating to #1 and #2, if you have something that works for you, be VERY careful about updating. Yes you can just go back to a previous commit, but then you'll have to keep a close eye on requirements too.
- Regarding updating/downgrading, it might seem as simple as just running the setup/requirements install again and it's good to go, but it seems that's not always the case. As recently as last night I decided to update a single dependency for kohya_ss. Started it up, things got downloaded, it said it was updated...and no change...hmmm. I manually did the install, pip said it was already installed, I checked the version in the correct folder and it seemed to match. Started again and nope, still no change...Forced a reinstall and still no change. Deleted the lib in question, installed it again and finally it was actually working.
So despite things seemingly being updated for you, they might not really be working as they should, so it might be worthwhile clearing out some/all of those python libs once in a while. The folder in question is generally ./venv/ etc. Clearing it all out means some downloading and waiting while everything reinstalls, so in some cases it might be enough to just delete the lib in question.
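If you want to see what the python inside your venv actually picks up, rather than trusting what the installer printed, something like this rough sketch can help. It only uses the standard library (importlib.metadata, Python 3.8+). The function names are my own, and the pin check only understands plain "name==version" lines, so treat it as a starting point and not a full requirements parser. Run it with the venv's own python, otherwise it tells you about the wrong environment.

```python
import importlib.util
from importlib import metadata


def installed_version(dist_name):
    """Version of an installed distribution as this interpreter sees it, or None."""
    try:
        return metadata.version(dist_name)
    except metadata.PackageNotFoundError:
        return None


def module_location(import_name):
    """File a top-level module would be imported from, or None if not found."""
    spec = importlib.util.find_spec(import_name)
    return spec.origin if spec is not None else None


def check_pins(requirement_lines):
    """Compare 'name==version' pins with what is installed.

    Returns {name: (wanted, found)} for every mismatch. Lines without '=='
    (ranges, URLs, options, comments) are skipped, which is a simplification.
    """
    mismatches = {}
    for line in requirement_lines:
        # Drop comments and environment markers before looking for a pin.
        line = line.split("#", 1)[0].split(";", 1)[0].strip()
        if "==" not in line:
            continue
        name, wanted = (part.strip() for part in line.split("==", 1))
        found = installed_version(name)
        if found != wanted:
            mismatches[name] = (wanted, found)
    return mismatches


# Usage, assuming a requirements.txt in the current folder:
# with open("requirements.txt") as fh:
#     for name, (wanted, found) in check_pins(fh).items():
#         print(f"{name}: wanted {wanted}, found {found}")
# print(module_location("bitsandbytes"))
```

If the reported version or location isn't what you expect after an "update", that's a hint the old lib is still the one being loaded and deleting it by hand might be needed.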
Small tip for anyone that made it to the end: just because you're training one concept, it doesn't mean you need to keep all images in the same folder and give everything the same priority. Splitting them into different folders with different repeats might be useful at times.
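To get a feel for what different repeats do, here's a small sketch that works out how much of each epoch every folder ends up taking. It assumes the kohya-style naming where the folder name starts with the repeat count and an underscore (like "10_closeups"). The folder names and image counts are made up for the example.

```python
import re


def parse_repeats(folder_name):
    """Read the leading repeat count from a name like '10_closeups'."""
    match = re.match(r"^(\d+)_", folder_name)
    if match is None:
        raise ValueError(f"no repeat prefix in folder name: {folder_name!r}")
    return int(match.group(1))


def samples_per_epoch(folders):
    """folders maps folder name -> number of images in it."""
    return {name: parse_repeats(name) * count for name, count in folders.items()}


def epoch_share(folders):
    """Fraction of one epoch's samples that comes from each folder."""
    totals = samples_per_epoch(folders)
    grand_total = sum(totals.values())
    return {name: total / grand_total for name, total in totals.items()}


# Hypothetical dataset: a few good closeups and a bigger pile of full-body shots.
example = {"10_closeups": 5, "2_fullbody": 25}
print(samples_per_epoch(example))  # {'10_closeups': 50, '2_fullbody': 50}
print(epoch_share(example))        # {'10_closeups': 0.5, '2_fullbody': 0.5}
```

So 5 images at 10 repeats carry the same weight per epoch as 25 images at 2 repeats, which is the kind of prioritising the tip is about.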
Random image added to catch some ppl's eyes. Don't really have a stop sign or light to make ppl stop and wait, so she'll have to do.
And yes she's looking at you...
View attachment 3015973
(Edited because formatting broke )