Hello,
I wanted to thank you for this wonderful tutorial that got me started. In just a couple of days, I got all the info I needed.
As a thank-you, I wanted to repay you with a bit of my own knowledge. Since AI is only good if you can get consistent results, I almost immediately started toying with LoRAs and creating custom characters.
So, about creating your own LoRAs and your own characters, here is my take.
There is a thing called invoke-training.
It's the counterpart of Invoke: it's dedicated to training your own LoRAs, which you can then use in Invoke.
I'm not going to walk everybody step by step through everything. This explanation on its own is not enough, and you'll need to watch some basic tutorials about invoke-training to get the most out of what I'm going to explain below.
The thing is easy to use: install Python on your computer, run a few things from the command line, and very quickly you get your own invoke-training UI:
View attachment 4975695
It requires a dataset; I think that's basically the most important part of all there is to know about creating LoRAs.
You'll need a set of images. You can generate them via Invoke. My tip is to lean heavily on ChatGPT to get a working prompt for your character; don't bother trying to deal with the tags yourself.
If you don't like the result, just iterate with ChatGPT until you have a positive and negative prompt that gets you a consistent character to your liking.
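To give you an idea, the kind of prompt pair you end up with looks something like this. These exact tags are just an invented example; iterate with ChatGPT until you get your own:
Python:
# Purely illustrative prompt pair -- the character details here are made up,
# your own iterations with ChatGPT will produce something different.
positive_prompt = (
    "1girl, solo, long red hair, green eyes, freckles, black leather jacket, "
    "standing, looking at viewer, detailed face, best quality"
)
negative_prompt = (
    "lowres, bad anatomy, extra limbs, bad hands, blurry, jpeg artifacts, watermark"
)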
====
View attachment 4975778
Example of Invoke gallery generated via a ChatGPT prompt.
====
In your Invoke training UI, you'll need to:
- Give the path to your model (the one you actually use in Invoke).
- Give the path to your dataset config file (see below)
- Set some hyperparameters for the training.
Now, when training a model, it will try to learn the key features of your dataset, in our case the character, but it needs information about how to isolate that character from the noise (the noise being: the background, the clothes if the character can change outfits, the pose, etc.).
To do that, you need to label your dataset accordingly. Here is a Python script I wrote that does that for you. You create folders whose names are the tags for the images inside, you fill those folders with images, change the path in the script, and run it.
It will create a "dataset.jsonl" file, whose path you can paste into the invoke-training UI.
====
View attachment 4975755
Example of folder structure.
====
View attachment 4975761
Example of folder content.
====
Python:
import os
import json

# Root folder that contains one sub-folder per tag; the sub-folder name is used as the tag
root_folder_path = "<yourpath>/invoke-training/datasets/<yourDatasetRootFolder>"  # Change this to your actual folder path
output_file = "dataset.jsonl"
character_tag = "<yourCharacterTag>"  # The keyword you'll use to call your character in Invoke

file_entries = []

for root_folder_entry in os.listdir(root_folder_path):
    entry_path = os.path.join(root_folder_path, root_folder_entry)
    print(root_folder_entry)
    if os.path.isdir(entry_path):
        # List all PNG files in this tag folder
        for filename in os.listdir(entry_path):
            file_path = os.path.join(entry_path, filename)
            if os.path.isfile(file_path) and filename.endswith(".png"):
                # Image paths are kept relative to the dataset root; the caption is the
                # character tag plus the folder name, which acts as the extra tags.
                entry = {
                    "image": root_folder_entry + "/" + filename,
                    "text": character_tag + ", " + root_folder_entry,
                    "mask": None,
                }
                file_entries.append(json.dumps(entry) + "\n")

# Write the JSON Lines file next to the tag folders
with open(os.path.join(root_folder_path, output_file), "w", encoding="utf-8") as f:
    for entry in file_entries:
        f.write(entry)

print(f"{len(file_entries)} files processed and written to {output_file}")
The script.
Change the path and the character tag: "<yourCharacterTag>". You'll use that tag as the keyword to identify your character in Invoke later.
You'll need about 20+ images, and those images need to show your character in multiple setups. The reason is that if you have only one setup (only standing, only a grey background, etc.), then that setup is not separable from the character itself. We want to make sure that the only things present in every image are our key features; in our case: the character's face, colors and shape.
Everything else changes from image to image.
Everything else is captioned with tags (very important).
Execute the Python script.
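Optionally, before feeding it to the UI, you can sanity-check the output with a little snippet like this (just a helper I use, not part of invoke-training; the folder name in the comment is made up):
Python:
import json
import os

# Print the first few entries of the generated dataset.jsonl. Each line should
# look roughly like:
# {"image": "standing, beach/img_001.png", "text": "<yourCharacterTag>, standing, beach", "mask": null}
root_folder_path = "<yourpath>/invoke-training/datasets/<yourDatasetRootFolder>"

with open(os.path.join(root_folder_path, "dataset.jsonl"), encoding="utf-8") as f:
    for i, line in enumerate(f):
        entry = json.loads(line)
        print(entry["image"], "->", entry["text"])
        if i >= 4:
            break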
Give the path to the created dataset.jsonl to your Invoke training UI.
About the hyperparameters (summed up in a snippet after this list):
- AdamW with a learning rate of 0.0005
People will bullshit about how the learning rate depends on your dataset and how you need to test many values. In our case, we don't care: it's a LoRA dataset of about 20-40 images, and 0.0005 will do for you.
- The LoRA rank (dim): about 32. It's basically the size of the LoRA. Too small and it won't abstract your character correctly, too large and it will overfit the data.
Same thing, don't bother looking for the perfect value; we have a consistent task: learn a character from 20-40 images. 32 will do for you. You can lower it a bit if you want, it will work down to 16, but I prefer higher values.
- The number of epochs is not important; put 100, but create a checkpoint every epoch. You'll only use the latest ones, but by creating a checkpoint every epoch, you can stop the training whenever you feel like you're done.
- Don't bother with validation: put a high number for the number of epochs between validations. Validation won't work well on a LoRA like this; I'll explain why below.
- Other parameters are irrelevant for us.
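To recap, here are the values I'd punch into the UI, gathered in one place. This is just a summary of the list above as a plain Python dict; the key names are my own labels, not the actual invoke-training config schema:
Python:
# Recap of the settings above -- the keys are my own labels, not invoke-training's schema.
lora_training_settings = {
    "optimizer": "AdamW",
    "learning_rate": 5e-4,                 # 0.0005, fine for a 20-40 image character dataset
    "lora_rank_dim": 32,                   # works down to 16, I prefer higher values
    "max_epochs": 100,                     # over-provision, you'll stop early anyway
    "save_checkpoint_every_n_epochs": 1,   # so you can stop whenever you're happy
    "validate_every_n_epochs": 1000,       # effectively disables validation
}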
Once you're set, you can push the training button. The logs are in the console that you used to start the Invoke training UI (took me a while to notice it, lol).
Same thing, check the basic tutorials about it. Essentially it will create checkpoints in the "output" folder.
View attachment 4975790
You can use those files as LoRAs in Invoke, as you would any other LoRA.
Okay, now the big deal.
How do you actually use that LoRA?
A LoRA is not going to learn your character perfectly. It's a mini-model that you trained locally. As a standalone, it's shit.
====
Example of an image relying only on the LoRA.
View attachment 4975996
So now you'll say: but that's not the initial character.
No, it's not. LoRAs are dumb shits, and getting the right dataset requires long trial and error. This checkpoint is also under-fitted; the longer you leave your model training, the higher your chances of getting a good checkpoint.
Now, for this post, I'll keep this checkpoint, since I didn't have time to fine-tune more. It's not a big deal.
You can adjust the character by adding some of the original tags. The major problems are usually the hair and eye colors: since they don't take up a lot of space in the image, the model's loss on them is negligible and it doesn't learn them well.
The colors are bad too, and the contrast is low. That's because the model is averaging the colors to minimize the overall error across all images. It takes a lot of time and effort to do much better if you're a newbie, but the point is: it's not important.
We'll use the base model to fix all of that for us.
====
So what's the secret to making it work? It's to layer your LoRA on top of an already generated image.
The image generated with your full model will ground the quality.
So first:
- Generate a stand-alone environment image.
- Using an additional raster layer, paint a big spot of skin color where you want your character to be.
- Generate a character WITHOUT your LoRA, but using tags similar to the ones you used to generate the character, both the negative and the positive prompt (girl, no extra limbs, super res, whatever).
You can create a regional hint twice on the same spot to get both a positive and a negative prompt. One regional hint can't have both a positive and a negative prompt; I don't know why they did it that way.
Technically you can create the character with the environment all-in-one, but I prefer to do it in two steps, especially if I want multiple characters. But that's really up to you in the end.
- Now you have a high-quality setup with an environment and a character doing what you want, placed how you want. But it's not the correct character. Damn.
- So here is the trick: you put an inpaint mask on the character and a regional hint with your character keyword. Don't redo the whole character at once! First do only the face, for example, then only the clothes. Going step by step will help you keep the art and quality consistent.
Each time you validate a step, keep the result as a raster layer.
Important trick: put a very, very low weight on your LoRA, like 0.2, and redo the generation several times. The generated part will tend toward your character, but the quality will mostly come from the main model instead of your dumb-shit-quality LoRA.
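If you wonder why the low weight preserves the base model's quality: as I understand it, applying a LoRA just adds a scaled low-rank delta on top of the base weights, and the weight you set in Invoke scales that delta. A toy illustration (not Invoke's actual code, and I'm ignoring the usual alpha/rank scaling factor):
Python:
import numpy as np

# Toy demo: the LoRA contributes a low-rank delta B @ A, scaled by the weight you
# set in Invoke. The change to the base weights grows linearly with that weight,
# which is why a low weight keeps the result close to what the base model would do.
d, rank = 8, 4
W_base = np.random.randn(d, d)   # a base-model weight matrix
A = np.random.randn(rank, d)     # LoRA down-projection
B = np.random.randn(d, rank)     # LoRA up-projection

def apply_lora(weight):
    # Effective weights when the LoRA is applied with the given strength.
    return W_base + weight * (B @ A)

for w in (0.2, 0.7, 1.0):
    drift = np.linalg.norm(apply_lora(w) - W_base) / np.linalg.norm(W_base)
    print(f"LoRA weight {w}: relative change vs base weights = {drift:.2f}")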
Here is an example of the full workflow:
Environment only, the LoRA is off.
View attachment 4976016
Grounding for a character lying on the bed. The LoRA is still off.
There are:
1 mask to paint only on the bed.
1 positive prompt.
1 negative prompt.
1 raster layer for the body shape on the bed.
View attachment 4976352
I tweaked the prompt a bit and emptied the environment prompt, as it was creating too many tags for the model to handle the weights correctly. As a result, I just used the main prompt for everything, but regionals would have achieved the same result.
View attachment 4976354
OK, so now we have a positioned character; we just want to swap her with our LoRA character. We'll reduce the mask to her face, and we'll put a regional hint on her head with our character tag.
I think the layer order is important, so put your new regional layer above the one describing the overall character setup.
I replayed the generation a bit with the raster layer because the character came out too thin, due to my first stick drawing. Replaying with the new girl as the raster layer improves her over time.
View attachment 4976358
Now we can enable our LoRA (finally), with a weight of 0.2-0.7. We want the image to move toward our character, slowly but steadily, part by part.
This is a whole game of playing with the LoRA weight, as well as with the CFG scale. To be honest, just try values: sometimes you'll want to push hard toward your character, sometimes you'll want to just nudge it a bit. It's all up to you. I did not need to play with the prompt itself, except for the hair and eye color. It's mostly about the LoRA now.
View attachment 4976361
View attachment 4976364
View attachment 4976365
View attachment 4976366
Once we are there, we can move to the other body parts.
View attachment 4976369
View attachment 4976370
Legs:
View attachment 4976371
View attachment 4976372
You can reinforce the character by adding some of the original tags (relying on the base model more), and you can discard the bad generations; you don't have to keep them all.
Now the cleanup of the artifacts: we'll mask the whole area around the character, while touching the character as little as possible, and redraw it using the initial environment prompt.
View attachment 4976375
View attachment 4976381
Final result.
So, it's not perfect by any means, firstly because I did these only for this post and I don't want to regenerate 100 times until it's perfect, and secondly because I'm using an under-fitted checkpoint.
Also, the character is small here, and my training set didn't have "far away" examples, meaning the LoRA doesn't really know what she's supposed to look like from a distance; that can be improved by adding more examples to the dataset.
But the goal was to explain the workflow I developed, from OC idea to rendering the character into a concept, so here we are.
Going further, things I still need to look into but that I know can help:
- Learning the various noise generators. The one we usually use destroys the image completely and regenerates it almost from scratch, which is not always ideal for small adjustments. I believe some generators may be less brutal and help preserve more of the original parts in the new image.
- Learning about CFG scale. Apparently it controls how strongly the model adheres to the prompt versus doing its own thing. This may help force the initial character setup within an already existing environment (rough sketch of my understanding below).
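For context, my current understanding of CFG is that the sampler predicts the noise twice, once with your prompt and once without, and the CFG scale decides how far to push toward the prompted prediction. A toy sketch (not Invoke's actual code; cfg_mix is just a name I made up):
Python:
import numpy as np

def cfg_mix(noise_uncond, noise_cond, cfg_scale):
    # cfg_scale = 1.0 -> pure prompted prediction; higher values push harder
    # toward the prompt and away from the unconditioned prediction.
    return noise_uncond + cfg_scale * (noise_cond - noise_uncond)

# Tiny example with fake noise predictions, just to show the blending.
uncond = np.zeros(4)
cond = np.ones(4)
print(cfg_mix(uncond, cond, 7.5))  # around 7-8 is a commonly used CFG scale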