Don't forget or overlook the awesome LoRA training guide on rentry that I often link to. It was a huge help for me, and it gets updated on a regular basis as new knowledge, tools, and other developments appear. When something new comes out, he usually adds a section about it to the guide and follows up with his conclusions after doing tests.

Bros, I'll be grateful for any corrections/additional tips to put into this post:
---
Troubleshooting LORA Training
So, it took me a few tries to successfully train a LORA: partly because I am a moron, partly because of older hardware.
First, rule out issues with the dataset by using Schlongborn's dataset, included in his LORA training post. That dataset works, and since it has only 20 images, you are guaranteed to waste minimal time while troubleshooting. His post also includes a LORA you can check against as a reference, using a strength of 0.7 for the model and 1.0 for clip. Here is a ComfyUI workflow that you can just plug and play:

[ComfyUI workflow attachment]
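If you'd rather sanity-check the reference LORA outside ComfyUI, here is a minimal diffusers sketch. All file names and the prompt are placeholders, the base model is assumed to be SD 1.5, and note that diffusers exposes a single LoRA scale rather than ComfyUI's separate model/clip strengths:

```python
from diffusers import StableDiffusionPipeline

# Assumed SD 1.5 base model; swap in whatever checkpoint you actually use.
pipe = StableDiffusionPipeline.from_pretrained(
    "runwayml/stable-diffusion-v1-5"
).to("cuda")

# Placeholder file name: point this at the reference LORA from the post.
pipe.load_lora_weights("my_lora.safetensors")

image = pipe(
    "your test prompt here",                # placeholder prompt
    cross_attention_kwargs={"scale": 0.7},  # one scale here, not separate model/clip
    num_inference_steps=25,
).images[0]
image.save("lora_check.png")
```

If the output looks nothing like the reference renders, the problem is on your end, not in the dataset.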
Now, if you train a LORA on that dataset, this is what can go wrong:
Getting a black render - you used "Network Rank (Dimension)" with a value of 1. I am a moron here, because Schlongborn's post says to use 128, but I overlooked it. For some reason 1 is the default in Kohya's September 2023 install, and with all those dials I just missed it. Use at least 128 for this parameter on your initial tries, and do the same for "Network Alpha": make it 128. I don't know whether 128/1 or some such works; I just know that 128/128 works. Why the default is 1/1 is beyond me. Interestingly, this also affects the size of the LORA: 1/1 gives you a ~10 MB file, while 128/128 gives you a ~150 MB LORA.
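If you want to confirm what rank a finished file actually has, here is a quick sketch using the safetensors library. The file name is a placeholder, and the key naming follows Kohya's sd-scripts convention:

```python
from safetensors import safe_open

# "lora_down" weights are shaped (network_dim, in_features), so the first
# dimension reads off the rank: 1 for the broken default, 128 if you set it right.
with safe_open("my_lora.safetensors", framework="pt", device="cpu") as f:
    for key in f.keys():
        if "lora_down" in key:
            print(key, tuple(f.get_tensor(key).shape))
            break  # one layer is enough to see the rank
```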
Getting an unresponsive LORA - i.e. you get images rendered, but you can't tell if it worked, because nothing looks like what you'd expect. That's because the training didn't work out. Here's what's up: while the LORA trains, the console shows a running loss, like this:
[screenshot: training console printing the running loss]
And if you are getting "loss=NaN", the LORA ends up with zeroes for weights. The likely culprit is the "Mixed precision" setting. It should be "no", because your hardware probably doesn't support the fp16 or bf16 options for whatever reason. It might actually support them, but since Kohya uses a bunch of third-party modules, one of those modules may simply misidentify what you have. So set "Mixed precision" to "no" and restart the training: if the loss starts showing an actual number, you have probably fixed the issue. Strangely, "Save precision" set to fp16 is fine.
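For the curious, here is a tiny illustration (plain PyTorch, nothing Kohya-specific) of how half precision can produce the NaNs that then poison the weights:

```python
import torch

# fp16 has a narrow range: tiny values underflow to 0, big ones overflow to inf.
small = torch.tensor([1e-8], dtype=torch.float16)
print(small)       # tensor([0.], dtype=torch.float16)

big = torch.tensor([70000.0], dtype=torch.float16)  # fp16 max is ~65504
print(big)         # tensor([inf], dtype=torch.float16)

# inf - inf is NaN; once a NaN shows up in the loss it propagates into
# every gradient, which is how you end up with a zeroed-out LORA.
print(big - big)   # tensor([nan], dtype=torch.float16)
```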
Verify the LORA. Kohya has a tool for this - you can check either your own LORA or any LORA you downloaded. A bad LORA's output section will look different and will have zeroes all over the place:
[screenshot: checker output for a broken LORA, weights all zeroes]
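If you'd rather not go through Kohya's tool, a rough stand-in is to open the .safetensors file yourself and flag any all-zero tensors (file name is again a placeholder):

```python
from safetensors import safe_open

path = "my_lora.safetensors"  # placeholder: your trained or downloaded LORA
with safe_open(path, framework="pt", device="cpu") as f:
    zeroed = [k for k in f.keys() if f.get_tensor(k).count_nonzero().item() == 0]

if zeroed:
    print(f"{len(zeroed)} all-zero tensors -- this LORA is probably broken")
else:
    print("no all-zero tensors found")
```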
I have not seen anything remotely close to this guide anywhere else. Most people just post their "guide" and abandon it the next minute.