Your noob to expert guide in making great AI art.


wal01

New Member
Jan 24, 2019
Hello OP!
Are there some good FAQs you use for features you didn't cover in the tutorial? I really miss ADetailer and several other things from WebUI, and I want to know if there's something similar in Invoke. Inpainting and upscaling are the best things in it, but it's a pain to inpaint several faces when you make a bunch of images and want to save more than one.
 

NoTraceOfLuck

Member
Game Developer
Apr 20, 2018
There is no automatic ADetailer in Invoke like in WebUI. The way to get the same functionality is to 1) inpaint mask the face, and then 2) reduce the size of the bounding box (similar to what I did in my tutorial on the bounding box).

It requires more steps, but you also get more flexibility over what exactly you want to upscale. If you want to upscale only the face, you can inpaint mask the face. If you want to upscale the entire character or body, you can inpaint mask the entire body. Or you can upscale things that aren't the character, such as background details.
 

osanaiko

Engaged Member
Modder
Jul 4, 2017
Quick question: can I render 4K images? Invoke only lets me do 1536x1536.
Yes, but you have to type the numbers in yourself; the sliders don't go that high.
And there's a good reason why they are limited: the diffusion models are only trained on source images of a certain size. To create output larger than that, the wrapper scripts need to make repeated separate calls and then combine the outputs. This often leads to weird artifacts, both at the small scale along the seams and in the overall image composition.

A better way to get a 4096x4096 image is to create it at 1024x1024 and then use a separate AI upscaler pass to increase the resolution while still keeping high detail. Not sure about Invoke, but Automatic1111 and Comfy both have upscaling built into their base functionality.
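If you ever want to do that upscale pass outside of a UI, here is a rough sketch of the idea using the diffusers library and Stability's x4 upscaler. This is my own example, not something Invoke or A1111 ships, and large inputs need a lot of VRAM, so you may need tiling or a smaller source image:

Python:
import torch
from PIL import Image
from diffusers import StableDiffusionUpscalePipeline

# Load Stability's 4x upscaler (needs a CUDA GPU to be practical).
pipe = StableDiffusionUpscalePipeline.from_pretrained(
    "stabilityai/stable-diffusion-x4-upscaler", torch_dtype=torch.float16
).to("cuda")

# Generate your image at the model's native resolution first (e.g. 1024x1024),
# then run it through the upscaler with a short prompt describing the content.
low_res = Image.open("my_1024_render.png").convert("RGB")
upscaled = pipe(prompt="anime girl on a beach, detailed", image=low_res).images[0]
upscaled.save("my_4096_render.png")  # 4x in each dimension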
 

FemdomWifeGame

Active Member
Game Developer
Jan 24, 2021
Hello,

I wanted to thank you for this wonderful tutorial that got me started. In just a couple of days, I got all the info I needed.

As a thanks, I just wanted to repay you with a bit of my knowledge. Since AI is only good if you can get consistent results, I almost immediately started toying with LoRAs and creating custom characters.

So about creating your own LoRAs and your own characters, here is my take.

There is a thing called Invoke training:


It's the counterpart of Invoke, dedicated to training your own LoRAs, which you can then use in Invoke.

I'm not going to walk everybody step by step through everything. This explanation alone is not enough, and you'll need to watch some basic tutorials about Invoke training to get the most out of what I'm going to explain below.

The thing is easy to use: install Python on your computer, use the command line, and very quickly you get your own Invoke training UI:

1750777589394.png

It requires a dataset, and I think that is basically the most important part of everything there is to know about creating LoRAs.

You'll need a set of images. You can generate them via Invoke. My tip is to lean heavily on ChatGPT to get a working prompt for your character; don't bother trying to deal with the tags yourself.

If you don't like the result, just iterate with ChatGPT until you have a positive and negative prompt that gets you a consistent character to your liking.

==== 1750778691735.png
Example of Invoke gallery generated via a ChatGPT prompt.

====

In your Invoke training UI, you'll need to:

- Give the path to your model (the one you actually use in Invoke).
- Give the path to your dataset config file (see below).
- Give some meta-parameters for the model.

Now, when training a model, it will try to learn the key features of your dataset (here, our character), but it will need info about how to isolate that character from the noise (the noise being: the background, the clothes if the character can change outfits, the position, etc.).

To do that, you need to label your dataset accordingly. Here is a Python script I wrote that does that for you. You create folders whose names are the tags for the images in the folder, you fill those folders with images, change the path in the script, and run the script.
It will create a "dataset.jsonl" file, whose path you can paste into the Invoke training UI.

====

1750778455131.png
Example of folder structure.

====

1750778563833.png
Example of folder content.

====

Python:
import os
import json

# Set the folder you want to scan
root_folder_path = "<yourpath>/invoke-training/datasets/<yourDatasetRootFolder>"  # Change this to your actual folder path
output_file = "dataset.jsonl"
character_tag = "<yourCharacterTag>"  # The keyword you'll use to call your character in Invoke later

file_entries = []

for root_folder_entry in os.listdir(root_folder_path):
    entry_path = os.path.join(root_folder_path, root_folder_entry)
    print(root_folder_entry)

    if os.path.isdir(entry_path):
        # Each sub-folder name is used as the caption tags for the images it contains
        for filename in os.listdir(entry_path):
            file_path = os.path.join(entry_path, filename)
            if os.path.isfile(file_path) and filename.endswith(".png"):
                # One JSONL line per image: image path relative to the dataset root,
                # caption = character tag + folder tags, no mask.
                entry = {
                    "image": root_folder_entry + "/" + filename,
                    "text": "%s, %s" % (character_tag, root_folder_entry),
                    "mask": None,
                }
                file_entries.append(json.dumps(entry) + "\n")

# Write the dataset.jsonl next to the image folders
with open(os.path.join(root_folder_path, output_file), "w", encoding="utf-8") as f:
    f.writelines(file_entries)

print(f"{len(file_entries)} files processed and written to {output_file}")
The script.
Change the path and the character tag: "<yourCharacterTag>". You'll use that tag as the keyword to identify your character in Invoke later.

You'll need about 20+ images, and those images need to show your character in multiple setups. The reason is that if you have only one setup (only standing, only with a grey background, etc.), then that setup is not separable from the character itself. We want to make sure that only our key features are present in every image; in our case: the character's face, the character's colors, and the character's shape.

Everything else changes from image to image.
Everything else is captioned with tags (very important).

Execute the Python script.
Give the path to the created dataset.jsonl to your Invoke training UI.
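If you want to sanity-check the file before you point the trainer at it, a quick read-back like this (plain Python, using the same placeholder path as the script above) shows exactly what the trainer will see:

Python:
import json

# Placeholder path: same dataset root you used in the script above.
dataset_path = "<yourpath>/invoke-training/datasets/<yourDatasetRootFolder>/dataset.jsonl"

with open(dataset_path, "r", encoding="utf-8") as f:
    entries = [json.loads(line) for line in f if line.strip()]

print(f"{len(entries)} entries found")
for entry in entries[:3]:
    # Each line should pair a relative image path with your character tag + folder tags.
    print(entry["image"], "->", entry["text"])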

About the meta-parameters:

- AdamW with a learning rate of 0.0005.
People will bullshit about how the learning rate is dependent on your dataset and how you need to test many values. In our case, we don't care: it's a LoRA dataset of about 20-40 images, and 0.0005 will do for you.

- The LoRA rank dim: about 32. It's basically the size of the model. Too small and it won't abstract your character correctly; too large and it will overfit the data.
Same thing, don't bother looking for the perfect value; we have a consistent task (learn a character out of 20-40 images) and 32 will do for you. You can lower it a bit if you want, it will work down to 16, but I prefer higher values (see the sketch after this list for what the rank means in terms of model size).

- The number of epochs is not important; put 100, but create a checkpoint every epoch. You'll only use the latest, but by creating a checkpoint every epoch, you can stop the training whenever you feel like you're done.

- Don't bother with validation; put a high number for the number of epochs per validation. Validation won't work on a LoRA, and I'll explain below why.

- Other parameters are irrelevant for us.
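About that rank dim, here is a back-of-the-envelope way to see what it actually controls (my own illustration, not anything from the Invoke training docs): a LoRA adds two small matrices of rank r per targeted layer, so its size grows linearly with the rank.

Python:
def lora_params_per_layer(d_in: int, d_out: int, rank: int) -> int:
    """Trainable parameters a LoRA adds to one linear layer of shape (d_out, d_in)."""
    # LoRA learns two low-rank factors: A (rank x d_in) and B (d_out x rank).
    return rank * d_in + d_out * rank

# Illustration with a 1280x1280 projection (a common size in SD-era attention blocks):
for rank in (8, 16, 32, 64):
    print(f"rank {rank:>2}: {lora_params_per_layer(1280, 1280, rank):,} params per layer")
# Doubling the rank doubles the LoRA's size: more room to memorize the character,
# but also more room to overfit a 20-40 image dataset.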

Once you're set, you can push the training button. The logs are in the console that you used to start the Invoke training UI (took me a while to notice it, lol).
Same thing, check the basic tutorials about it. Essentially it will create checkpoints in the "output" folder.

1750778945046.png

You can use those files as LoRAs in Invoke, as you would any other LoRA.

Okay, now the big deal.
How do you actually use that LoRA?

A LoRA is not going to learn your character perfectly. It's a mini-model that you trained locally. As a standalone, it's shit.

====
Example of image relying only on the LoRAs.
f3435587-55a0-474f-aac1-701c52f46ac6.png

So now you'll say: but that's not the initial character.
No, it's not. LoRAs are dumb shits, and getting the right dataset requires long trial and error. This checkpoint is also under-fitted; the longer you let the training run, the better your chances of getting a good checkpoint.

Now, for this post, I'll keep this checkpoint, since I didn't have time to finetune more. It's not a big deal.

You can adjust the character by adding some of the original tags. The major problems are usually the hair and eye colors: since they don't take up a lot of space in the image, the model's loss on them is negligible and it doesn't learn them well.

The colors are bad too, and the contrast is low. That's because the model is averaging the colors to minimize the overall error across all images. It's hard and slow to do much better if you're a newbie, but here's the thing: it's not important.

We'll use the base model to fix all of that for us.

====

So what's the secret to making it work? It's to layer your LoRA into an already generated image.
The image generated with your full model will ground the quality.

So first:
  1. Generate a stand-alone environment image.
  2. Using an additional raster layer, paint a big spot of skin color where you want your character to be.
  3. Generate a character WITHOUT your LoRA, but using tags similar to the ones you used to generate the character, both negative prompt and positive prompt (girl, no extra limb, super res, whatever).

    You can create a regional hint twice at the same spot to get both a positive and a negative prompt. One regional hint can't have both a positive and a negative prompt; I don't know why they did it that way.
    Technically you can create the character with the environment all in one go, but I prefer to do it in two steps, especially if I want multiple characters. But that's really up to you in the end.
  4. Now you have a high-quality setup with an environment and a character doing what you want, placed how you want. But it's not the correct character. Damn.
  5. So here is the trick: you put an inpaint mask on the character and a regional area with your character keyword. Don't redo the whole character at once! First do only the face, for example. Then only the clothes. Going step by step will help you keep the art and quality consistent.

    Each time you validate a step, keep the result as a raster layer.
    Important trick: put a very, very low weight on your LoRA, like 0.2, and redo the generation several times. The generated part will tend toward your character, but the quality will mostly come from the main model instead of your dumb-shit LoRA-level quality.
Here is an example of the full workflow:


Environment only, the LoRA is off.

1750782500083.png

Grounding for a character lying in the bed. The LoRA is still off.

There are:
1 mask to paint only on the bed.
1 positive prompt.
1 negative prompt.
1 raster layer for the body shape on the bed.

1750788918546.png

I tweaked the prompt a bit and emptied the environment prompt, as it was creating too many tags for the model to weight correctly. As a result, I just used the main prompt for everything, but regionals would have achieved the same result.

1750788966134.png

OK, so now we have a positioned character; we just want to swap her with our LoRA. We'll reduce the mask to her face, and we'll put a regional hint on her head with our character tag.
I think the layer order is important, so put your new regional layer above the one describing the overall character setup.

I also replayed a bit with the raster layer because the base figure was too thin, a side effect of my first stick drawing. Replaying with the new girl as the raster layer improves her over time.

1750789036452.png

Now we can finally enable our LoRA, with a weight of 0.2-0.7. We want the image to move toward your character, slowly but steadily, part by part.

This is a whole game of playing with the weight of the LoRA, as well as with the CFG scale. To be honest, just try values: sometimes you'll want to push hard toward your character, sometimes you'll just want to nudge a bit. It's all up to you. I did not need to touch the prompt itself, except for hair and eye color. It's mostly about the LoRA now.
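Just to illustrate the "low LoRA weight" idea in code, here is roughly what it looks like in the diffusers library (placeholder paths; this is not Invoke's API, and the exact way to pass the scale can vary between diffusers versions):

Python:
import torch
from diffusers import StableDiffusionXLPipeline

# Placeholder paths: your base checkpoint and the LoRA file from the "output" folder.
pipe = StableDiffusionXLPipeline.from_single_file(
    "path/to/your_base_model.safetensors", torch_dtype=torch.float16
).to("cuda")
pipe.load_lora_weights("path/to/your_character_lora.safetensors")

# A low LoRA scale (~0.2-0.7) nudges the output toward the character while
# letting the base model carry most of the image quality.
image = pipe(
    prompt="<yourCharacterTag>, girl, lying on a bed, best quality",
    negative_prompt="bad quality, bad anatomy",
    cross_attention_kwargs={"scale": 0.3},  # roughly equivalent to Invoke's LoRA weight
).images[0]
image.save("character_test.png")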

1750789133982.png
1750789159901.png
1750789171572.png
1750789182606.png

Once we are there, we can move to the other body parts.

1750789226360.png
1750789235634.png

Legs:
1750789255073.png
1750789266060.png

You can reinforce the character by adding some of the original tags (relying more on the base model), and you can discard the bad generations; you don't have to keep them all.

Now the cleanup of the artefacts: we'll mask all the area around the character, touching the character as little as possible, and redraw using the initial environment prompt.

1750789377594.png

1750789565954.png
Final result.

So, it's not perfect by any means, firstly because I did these only for this post and I don't want to regenerate 100 times until it's perfect, and because I'm using an under-fitted checkpoint.

Also, the character is small, and my training set didn't have "far away" examples, meaning the model doesn't really know what she is supposed to look like at that scale. That can be improved by adding more examples to the dataset.

But the goal was to explain the workflow I developed, from OC idea to rendering the character into a concept, so here we are.

Going further, things I still need to look into but that I know can help:

- Learning about the various noise generators. The one we usually use destroys the image completely and regenerates it almost from scratch, which is not always ideal for small adjustments. I believe some generators may be less brutal and help preserve more of the original parts in the new image.
- Learning about CFG scale. Apparently it controls how strongly the model adheres to the prompt versus doing its own thing. This may help to force the initial character setting within an already existing environment.
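For that last point, in script-based tools the CFG scale usually shows up as a guidance_scale parameter: higher values force the model to follow the prompt, lower values let it improvise. A small diffusers-flavored sketch (placeholder path, not Invoke's own setting names):

Python:
import torch
from diffusers import StableDiffusionXLPipeline

pipe = StableDiffusionXLPipeline.from_single_file(
    "path/to/your_base_model.safetensors", torch_dtype=torch.float16
).to("cuda")

# Compare a few CFG values: low = more freedom, high = stricter prompt adherence
# (and often harsher, more saturated results). Typical values sit around 4-8.
for cfg in (3.0, 7.0, 12.0):
    image = pipe(
        prompt="girl lying on a bed, cozy bedroom, best quality",
        negative_prompt="bad quality, bad anatomy",
        guidance_scale=cfg,
    ).images[0]
    image.save(f"cfg_{cfg}.png")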

edit:

I trained the checkpoint overnight, so now, still applying exactly what I explained about training the LoRA, here is the result of an Invoke using only the LoRA:

1750834968215.png

So from here on, that character can safely be re-applied in actual invokes using the layered method explained above.
 

NoTraceOfLuck

Member
Game Developer
Apr 20, 2018
Great post!

Something you might want to try is resizing the bounding box so that it only covers the area around the character. This helps quite a bit to improve the quality of "far away" characters.

I had a similar scene. This is what my image looked like without and with changing the bounding box size. Lowering the size gives you some extra detail on the character:


Original image ------------------------------------------------------------------------- Bounding Box Adjusted
1750793921211.png 1750793960190.png

The details on how to do this are in this part of my original post: https://f95zone.to/threads/your-noob-to-expert-guide-in-making-great-ai-art.256631/post-17104572
 

FemdomWifeGame

Active Member
Game Developer
Jan 24, 2021
Ah, neat trick. I didn't go that far; basically, for each part of your post, I ended up experimenting by myself. So when I reached the LoRAs, well... it went a bit down the rabbit hole from that point onward, haha.

I need to read the rest now :p
 

FemdomWifeGame

Active Member
Game Developer
Jan 24, 2021
OK so I've been toying around a bit.

The BBox tool is indeed very helpful for details, hands, eyes, etc.

Another tool I discovered that is amazing is the noise amount, here:

1750858282444.png

Then in your inpaint mask:

1750858294550.png

This thing allows you to say "don't reset what's inside the mask completely by replacing it with 100% noise; instead, add only 37% noise".

So you can adjust an image 5% of noise at a time, or 50%, etc., depending on how much you expect the image to be modified. This is excellent for clothes, hair, iterating on positions, etc.

Notably, it helps a lot with the monochrome patches of color from a hand-painted raster layer. Just iterate with 50% noise at a time; that's how I did the legs' positions below.
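For reference, outside of Invoke this same knob is usually called "denoising strength"; here is a rough diffusers sketch of the idea (placeholder paths, not the Invoke API):

Python:
import torch
from PIL import Image
from diffusers import StableDiffusionXLImg2ImgPipeline

pipe = StableDiffusionXLImg2ImgPipeline.from_single_file(
    "path/to/your_base_model.safetensors", torch_dtype=torch.float16
).to("cuda")

init_image = Image.open("current_render.png").convert("RGB")

# strength is the fraction of noise added before re-denoising:
# 0.05 barely nudges the image, 0.5 reworks it heavily, 1.0 starts from scratch.
result = pipe(
    prompt="girl lying on a bed, adjusted leg position, best quality",
    image=init_image,
    strength=0.37,
).images[0]
result.save("adjusted_render.png")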

Here are some results:

1750864867723.png

There are problems, things I could clean up. I don't really care about going too far at the moment; it's just to learn the main concepts.
 

NoTraceOfLuck

Member
Game Developer
Apr 20, 2018
Those are some very impressive images! Almost doesn't look like AI!
 

FemdomWifeGame

Active Member
Game Developer
Jan 24, 2021
Thanks! To be honest, it's all thanks to your tutorial. I would never have been able to figure it out on my own.

Here is the finalized version. I'll use it in my game as side-content.

mia.png

I cleaned up a lot of inconsistencies. I may have missed some still, but honestly, I think it's good enough.
 

Kameronn77x

Newbie
Jul 6, 2020
I'm literally brand new to all of this, but the two of you just blew my mind with both how complicated and how simple this is when it's explained with normal wording and analogies. Like, I just saw the guide and clicked out of curiosity, but I think I'll play with it this weekend now. Best of luck to both of you on your games!
 

goblingodxxx

New Member
Apr 14, 2025
Hey all, this is a guide I have wanted to make for a long time. I have learned so much about AI art while creating my game and figured it was time to share the knowledge.

Disclaimer: This guide is OPINIONATED! That means, this is how I make AI art. This guide is not "the best way to make AI art, period." There are many MANY AI tools out there and this guide covers only a very small number of them. My process is not perfect.

Hardware Requirements:
  • The most important spec in your PC when creating AI art is your GPU's VRAM. It really doesn't matter how old your GPU is (though newer ones will be faster), the limiting factor on what you can and cannot do with AI is almost always going to be your GPU's VRAM.
  • This guide may work with as little as 4gb of VRAM, but in general, it is recommended that you have at least 12gb, with 16gb being preferred.

No hardware? No problem:
  • If you do not have a good GPU, or just want to try some things out before buying one, the primary tool that I use in this tutorial offers a paid online service. It is the exact same tool; it just runs on the website and costs money per month.
  • You can check it out here:

GPU Buying Guide:
  • Buying Nvidia will be the most headache free way to generate AI art, though it is generally possible to make things work on AMD cards with some effort. This guide will not cover any steps needed to make things work on AMD GPUs, though the tools I use all claim to support AMD as well.

On a tight budget: Used RTX 4060 Ti (16gb VRAM). This card is modern, reasonably fast, and has 16gb of VRAM.
Middle of the road: RTX 5070 Ti (16gb VRAM). This has the 16gb of VRAM, but will be significantly faster than a 4060 Ti.
High VRAM on a budget: Used RTX 3090 (24gb VRAM). If you want 24gb of VRAM to unlock higher resolutions and the possibility of video generation, the RTX 3090 is the most reasonable option.
Maximum power: RTX 5090 (32gb VRAM). If you have deep pockets, the RTX 5090 has the most VRAM of any consumer card and is much faster than the RTX 4090.

The RTX 4090 is a great card, but prices are extremely high right now. If you can find a deal, that's another good buy.



Installation and Setup:
  • The tool I will use in this tutorial is called Invoke. It has both a paid online version, and a free local version that runs on your computer. I will be using the local version, but everything in this tutorial also works in the online version.
    • Website:
  • These steps specifically are how to install the local version. If you are using the online version, you can skip all of these steps.

  1. Download the latest version of Invoke from here:
    1. View attachment 4879763
  2. Run the file you downloaded. It will ask you questions about your hardware and where to install. Continue until it is installed successfully
  3. If you have a low-VRAM GPU (8gb or less), follow these additional steps to greatly improve speed:
  4. Click Launch


Now, you will get a window like this:

View attachment 4879783


Understanding Models

Now, the most important part of AI generation: selecting a model. What is a model? I will spare you the technical details, most of which I don't understand either. Here's what you need to know about models:

  1. Your model determines how your image will look.
    1. If you get an anime model, it will generate anime images
    2. If you get a realism model, it will generate images that look like a real photograph
  2. Each model "understands" different things.
    1. One model might interpret the prompt "Looking at camera" as having the main character in the image make eye contact with the viewer
    2. A different model might interpret the prompt as having the main character literally look at a physical camera object within the scene

Your base model is the most important thing in determining how your images will look. Here are some links to some example models (note, there are thousands and thousands of models available.)

Anime Models
    • This is a popular anime model.
    • This is also an anime model, however it produces a different style of illustration from the other model.
    • This anime model produces images in more of a '3D style'

Realism Models
    • This is the most popular realism model. However, I will have a section below specifically on Flux which covers some things you will need to know before using it.
    • While realism models don't technically have different 'styles' like anime does, it is important to note that different realism models produce different styles of realism. Some models might be better at creating old people. Some might produce exclusively studio photography style images. Some might produce more amateur style images of lower quality.


Generating Your First Image

Alright, with all that new knowledge in your head, I will provide a recommended model for the remainder of this tutorial.

We will use which is a very popular anime model that is based on Illustrious.

To download this, you will require an account on Civitai. Civitai is the primary space in which users in the AI community share models. Create an account and then continue on with this tutorial.

After you've created an account, to install this model, right-click here, and click 'Copy Link'

View attachment 4879958


Now, go back to Invoke and click here:

View attachment 4879960

Then, paste the link here, and click Install:

View attachment 4879968


Most models are around 6gb, however Flux is around 30gb.

When it is done, you will see it here:

View attachment 4879973


Now go back to the canvas by clicking here:

View attachment 4879976


You will see the model has been automatically selected for you. But if you chose to install other models too, you can select the model here:
View attachment 4879980


Now, enter these prompts:

  • Positive Prompt
    • masterpiece, best quality, highres, absurdres, hatsune miku, teal bikini, outdoors, beach, sunny, sand, ocean, sitting, straight on, umbrella, towel, feet
  • Negative Prompt
    • bad quality, worst quality, worst aesthetic, lowres, monochrome, greyscale, abstract, bad anatomy, bad hands, watermark
View attachment 4879997


And click 'Invoke'

Congratulations! You have made your first image:

View attachment 4880004


Now, you can create great AI art using only what you've seen so far and you're free to stop and experiment here. However, this is only the beginning of what you can do with AI.
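(Side note for anyone who prefers scripting: the same positive/negative prompt pair can also be run outside Invoke with the diffusers library. A rough sketch with a placeholder checkpoint path, assuming an SDXL/Illustrious-based model; this is not how Invoke works internally:)

Python:
import torch
from diffusers import StableDiffusionXLPipeline

# Placeholder path: point this at the anime checkpoint you downloaded from Civitai.
pipe = StableDiffusionXLPipeline.from_single_file(
    "path/to/downloaded_model.safetensors", torch_dtype=torch.float16
).to("cuda")

image = pipe(
    prompt=(
        "masterpiece, best quality, highres, absurdres, hatsune miku, teal bikini, "
        "outdoors, beach, sunny, sand, ocean, sitting, straight on, umbrella, towel, feet"
    ),
    negative_prompt=(
        "bad quality, worst quality, worst aesthetic, lowres, monochrome, greyscale, "
        "abstract, bad anatomy, bad hands, watermark"
    ),
).images[0]
image.save("first_image.png")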


In part 2, I will start to get into more tools and options you have available.
Is it free or paid?
 

jonUrban

New Member
Jan 19, 2020
If you're talking about Invoke, the OP says you can run the community edition locally for free; just download and install.
As for the checkpoints (models), they're free. The Stable Diffusion community (r/StableDiffusion) pretty much eschews non-FOSS software for local generation; online generation is a different matter for obvious reasons.
 