Your noob-to-expert guide to making great AI art.


NoTraceOfLuck

Hey all, this is a guide I have wanted to make for a long time. I have learned so much about AI art while creating my game and figured it was time to share the knowledge.

Disclaimer: This guide is OPINIONATED! That means, this is how I make AI art. This guide is not "the best way to make AI art, period." There are many MANY AI tools out there and this guide covers only a very small number of them. My process is not perfect.

Hardware Requirements:
  • The most important spec in your PC when creating AI art is your GPU's VRAM. It really doesn't matter how old your GPU is (though newer ones will be faster); the limiting factor on what you can and cannot do with AI is almost always going to be your GPU's VRAM (a quick way to check yours is shown after this list).
  • This guide may work with as little as 4gb of VRAM, but in general, it is recommended that you have at least 12gb, with 16gb being preferred.
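
If you are not sure how much VRAM your card has, here is a quick way to check from Python (this assumes you have PyTorch installed with CUDA support; it is just a convenience, nothing Invoke itself needs):

Code:
# Quick VRAM check (assumes PyTorch with CUDA support is installed).
import torch

if torch.cuda.is_available():
    props = torch.cuda.get_device_properties(0)
    print(f"{props.name}: {props.total_memory / 1024**3:.1f} GB of VRAM")
else:
    print("No CUDA-capable GPU detected.")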

No hardware? No problem:
  • If you do not have a good GPU, or just want to try some things out before buying one, the primary tool that I use in this tutorial offers a paid online service. It is the exact same tool; it just runs on the website and costs money per month.
  • You can check it out here:

GPU Buying Guide:
  • Buying Nvidia will be the most headache-free way to generate AI art, though it is generally possible to make things work on AMD cards with some effort. This guide will not cover any steps needed to make things work on AMD GPUs, though the tools I use all claim to support AMD as well.

  • On a tight budget: Used RTX 4060 Ti (16gb VRAM). This card is modern, reasonably fast, and has 16gb of VRAM.
  • Middle of the road: RTX 5070 Ti (16gb VRAM). This has the 16gb of VRAM, but will be significantly faster than a 4060 Ti.
  • High VRAM on a budget: Used RTX 3090 (24gb VRAM). If you want 24gb of VRAM to unlock higher resolutions and the possibility of video generation, the RTX 3090 is the most reasonable option.
  • Maximum power: RTX 5090 (32gb VRAM). If you have deep pockets, the RTX 5090 has the most VRAM of any consumer card and is much faster than the RTX 4090.

The RTX 4090 is a great card, but prices are extremely high right now. If you can find a deal, that's another good buy.



Installation and Setup:
  • The tool I will use in this tutorial is called Invoke. It has both a paid online version, and a free local version that runs on your computer. I will be using the local version, but everything in this tutorial also works in the online version.
    • Website:
  • These steps specifically are how to install the local version. If you are using the online version, you can skip all of these steps.

  1. Download the latest version of Invoke from here:
    1. 1748319035728.png
  2. Run the file you downloaded. It will ask you questions about your hardware and where to install. Continue until it is installed successfully.
  3. If you have a low-VRAM GPU (8gb or less), follow these additional steps to greatly improve speed:
  4. Click Launch


Now, you will get a window like this:

1748319348728.png


Understanding Models

Now, the most important part of AI generation: selecting a model. What is a model? I will spare you the technical details, most of which I don't understand either. Here's what you need to know about models:

  1. Your model determines how your image will look.
    1. If you get an anime model, it will generate anime images
    2. If you get a realism model, it will generate images that look like a real photograph
  2. Each model "understands" different things.
    1. One model might interpret the prompt "Looking at camera" as having the main character in the image make eye contact with the viewer
    2. A different model might interpret the prompt as having the main character literally look at a physical camera object within the scene

Your base model is the most important thing in determining how your images will look. Here are some links to some example models (note, there are thousands and thousands of models available.)

Anime Models
    • This is a popular anime model.
    • This is also an anime model, however it produces a different style of illustration from the other model.
    • This anime model produces images in more of a '3D style'

Realism Models
    • This is the most popular realism model. However, I will have a section below specifically on Flux which covers some things you will need to know before using it.
    • While realism models don't technically have different 'styles' like anime does, it is important to note that different realism models produce different styles of realism. Some models might be better at creating old people. Some might produce exclusively studio photography style images. Some might produce more amateur style images of lower quality.


Generating Your First Image

Alright, with all that new knowledge in your head, I will provide a recommended model for the remainder of this tutorial.

We will use which is a very popular anime model that is based on Illustrious.

To download this, you will require an account on Civitai. Civitai is the primary space in which users in the AI community share models. Create an account and then continue on with this tutorial.

After you've created an account, to install this model, right-click here, and click 'Copy Link'

1748323492840.png


Now, go back to Invoke and click here:

1748323539735.png

Then, paste the link here, and click Install:

1748323612438.png


Most models are around 6gb, however Flux is around 30gb.

When it is done, you will see it here:

1748323712197.png


Now go back to the canvas by clicking here:

1748323746860.png


You will see the model has been automatically selected for you. But if you chose to install other models too, you can select the model here:
1748323823755.png


Now, enter these prompts:

  • Positive Prompt
    • masterpiece, best quality, highres, absurdres, hatsune miku, teal bikini, outdoors, beach, sunny, sand, ocean, sitting, straight on, umbrella, towel, feet
  • Negative Prompt
    • bad quality, worst quality, worst aesthetic, lowres, monochrome, greyscale, abstract, bad anatomy, bad hands, watermark
1748324152721.png


And click 'Invoke'

Congratulations! You have made your first image:

1748324479211.png


Now, you can create great AI art using only what you've seen so far and you're free to stop and experiment here. However, this is only the beginning of what you can do with AI.


In part 2, I will start to get into more tools and options you have available.
 

NoTraceOfLuck

Before we move on, let me provide some more detail on what we just did.

Basic AI Concepts:

We entered this as a 'positive' prompt:
masterpiece, best quality, highres, absurdres, hatsune miku, teal bikini, outdoors, beach, sunny, sand, ocean, sitting, straight on, umbrella, towel, feet

We entered this as a 'negative prompt':
bad quality, worst quality, worst aesthetic, lowres, monochrome, greyscale, abstract, bad anatomy, bad hands, watermark

Positive Prompt: The positive prompt is what you want to see in the image. If you mention something in the positive prompt and the AI has an understanding of that concept, it will try to place it somewhere in the image.

Negative Prompt: The negative prompt is what you don't want to see in the image. If you find that the model is putting things in the image that you do not want, you can place them in the negative prompt to encourage it not to place that in the image.


1748325749651.png

Resolution: Resolution is a very important factor in getting a quality image. Images should be generated in one of the following aspect ratios. The further you diverge from these aspect ratios, the worse the quality of your image will be:
1748325860395.png


Seed: The 'Seed' is a random number generated for each image. If you generate an image with all of the exact same settings and the exact same seed, you will get the exact same image every time. Keeping the seed as a single static value can let you compare the outputs of different models or different settings.
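
Invoke exposes all of these controls in its UI, but if it helps to see the same knobs as code, here is a rough sketch using the open-source diffusers library (purely illustrative, not what Invoke runs internally; the checkpoint path is a placeholder for whatever model you downloaded):

Code:
# Illustrative sketch with Hugging Face diffusers (not Invoke's internal code).
import torch
from diffusers import StableDiffusionXLPipeline

# Placeholder path: point this at the SDXL/Illustrious-based checkpoint you downloaded.
pipe = StableDiffusionXLPipeline.from_single_file(
    "path/to/your-illustrious-model.safetensors", torch_dtype=torch.float16
).to("cuda")

image = pipe(
    prompt="masterpiece, best quality, highres, absurdres, hatsune miku, teal bikini, "
           "outdoors, beach, sunny, sand, ocean, sitting, straight on, umbrella, towel, feet",
    negative_prompt="bad quality, worst quality, worst aesthetic, lowres, monochrome, "
                    "greyscale, abstract, bad anatomy, bad hands, watermark",
    width=832, height=1216,                                # one of the SDXL-friendly resolutions
    generator=torch.Generator("cuda").manual_seed(1234),   # fixed seed = reproducible output
).images[0]
image.save("miku_beach.png")

With the same seed, model, and settings you get the same picture back every time; change the seed (or leave it random) and you get a new variation.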

In the 'Advanced' section:

1748326105246.png


Scheduler:
Generally most models will recommend a specific scheduler for the best quality. Each model can have a different "best" scheduler. You can find this in the description of the model.

However, in general:
  • If your model supports it, "DPM++ 2M Karras" will produce the best quality. However, many models don't support it.
  • "Euler Ancestral" is supported by almost all models. If you're unsure, or they don't specify, you can almost always use this.
I use those 2 schedulers almost exclusively.


Steps: Steps are an important concept in AI. Again, most models will recommend a "best" amount of steps in their description. However, for most models it is generally between 28 and 35. "Steps" are effectively how long the model spends "thinking" before creating a final output. Decreasing the steps will speed up the generation, but can reduce quality if too low. Increasing the steps too high can also decrease quality.

CFG Scale: If your images look burned / deep fried, often this setting can be the culprit. Again most models will recommend a "best" CFG. However, it is generally between 3.5 and 7.5. I usually stick at 5 and rarely change it.
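
Continuing the diffusers sketch from above, the scheduler, step count, and CFG scale map onto these parameters (again, just an illustration, not Invoke's code):

Code:
# Scheduler, steps, and CFG in the same illustrative diffusers sketch.
from diffusers import EulerAncestralDiscreteScheduler, DPMSolverMultistepScheduler

# "Euler Ancestral": the safe default that almost every model supports.
pipe.scheduler = EulerAncestralDiscreteScheduler.from_config(pipe.scheduler.config)
# Or "DPM++ 2M Karras", if the model's page recommends it:
# pipe.scheduler = DPMSolverMultistepScheduler.from_config(
#     pipe.scheduler.config, use_karras_sigmas=True)

image = pipe(
    prompt="...",               # same prompts as before
    negative_prompt="...",
    num_inference_steps=30,     # "Steps": most models like roughly 28-35
    guidance_scale=5.0,         # "CFG Scale": too high tends to look burned / deep fried
).images[0]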


Popular Model Differences

I selected as the model for this example. But what is this model, and why did I select it? What if you had chosen a different model?

Here is what you need to know:

There exists a set of models which I will call 'Base Models.' These models are the foundation of almost every other popular model.

Base Models: These models serve as the "core" of other models. A new base model costs a LOT of $$$$ to create and requires a large amount of community involvement to build tools for. There are currently 3 base models which have reached widespread popularity. Almost all other models are built on top of these 3 core models, and each of these 3 core models has different strengths.

Currently Popular Base Models

SD 1.5 (Stable Diffusion 1.5)
SD 1.5 is an "old" model. The results tend to be lower quality and more chaotic. Because this model is older, it is great for low-end PCs, or for generating images very quickly on modern higher-end PCs.

SD 1.5 is best used with 512x512 images

SD 1.5 is so fast it can even run on PCs that don't have a GPU at all.

SDXL (Stable Diffusion Extra Large)
SDXL is currently the go-to model in most cases. It is an extremely flexible model and the hardware requirements are not too high. It can work on GPUs with 8gb of VRAM or more and can be quite fast on a modern PC.

SDXL is best used with 1024x1024 images

All popular anime models are based on SDXL.

Flux Dev
Flux is the newest 'base model' to gain mainstream popularity. Flux has a lot of stipulations alongside it.

  • Flux is primarily for realism style images. It produces the highest quality realism images out of any of these 3 models.
    • There are a few non-realism styles that Flux can do, but overall it is very limited in style and its main strength is realism.
  • Flux has high hardware requirements and is much slower than SDXL.
  • Flux has extremely strong "prompt adherence." I will explain "prompt adherence" in the next section.
  • Flux generally struggles with NSFW concepts and nudity


Large Finetunes:

While 'base models' are the core of most other models, it is generally rare to use the base model itself to generate images (Flux is the one exception). Instead, it is more common to use a "Finetune."

Finetune: A finetune is a model which started as one of the 3 base models, but somebody continued training it on more images to try and teach it new concepts, or try and transform the model into a new style or otherwise improve it in some way. Some finetunes train with hundreds or thousands of new images.

While most finetunes are relatively small, there are a few which are HUGE. Some finetunes perform additional training on base models that requires months of work and millions of images. These finetunes are so extensive, that you can often treat these almost as new "base models." (Though, at their core, they are still just one of the 3 base models.)


There are 2 important concepts to know about finetunes:
  • Prompt Adherence: Prompt adherence is how "good" a model is at following your prompt. For example, if I type "A red square on the left, a blue circle on the right" and generate an image, if the model understands that and correctly places the shapes, it has "good prompt adherence", if it adds a yellow triangle into the image, then that is "bad prompt adherence".

  • Quality Tags: Some models require "Quality Tags". These are tags that enhance the quality of the output, and generally should always be included in every generation. I have included a column for each model's preferred quality tags below.
    • Quality tags should always be put at the beginning of your prompt


Example Large Finetunes:

Pony

What is it:
Pony was one of the first wildly successful anime finetunes of SDXL.

Pony is extremely flexible with styles and learning new concepts. While the model is poor at prompt adherence compared to newer models, the ability to faithfully replicate styles makes it still a worthwhile tool in your AI toolbox.

Base Model: SDXL

How to prompt it:
Pony has been trained to understand "Danbooru tags"

Danbooru tags are tags from the Danbooru imageboard:

Pony does NOT understand natural language concepts. It only understands danbooru tags.

Example prompt:

1girl, megumin, red dress, waving

Quality tags:

Positive Prompt:
score_9, score_8_up, score_7_up, score_6_up, score_5_up, score_4_up

Negative Prompt:
score_6, score_5, score_4

(Note you must include that full list of tags, do not try and remove parts of it, read more about it here: )

Illustrious

What is it:
Illustrious is the 'next generation' anime model and has replaced Pony in most cases.

Illustrious has much better prompt adherence.

Illustrious is not quite as flexible as Pony stylistically, so it is not as good at replicating exact styles. However, it still produces beautiful images.

Base Model: SDXL

How to prompt it:
Illustrious is also trained on Danbooru tags. However, it also has limited understanding of non-danbooru concepts.

Example prompt:
1girl, megumin, red dress, waving

Quality tags:

Positive Prompt:
masterpiece, best quality, highres, absurdres

Negative Prompt:
bad quality, worst quality, worst aesthetic, lowres, monochrome, greyscale, abstract, bad anatomy, bad hands, watermark,

NoobAI

What is it:
NoobAI is the newest contender in the anime space.

Noob is a very unique finetune. It is actually a finetune of Illustrious, so it retains many of the strengths of Illustrious.

Noob, as a base model, produces very chaotic images. However, it tends to be much more creative and expressive.

Noob is great for testing concepts and for inspiration.

Base Model: SDXL

How to prompt it:
NoobAI is also trained on Danbooru tags.

Example prompt:
1girl, megumin, red dress, waving

Quality tags:

Positive Prompt:
masterpiece, very awa, best quality, year 2024, newest, highres, absurdres

Negative Prompt:
bad quality, worst quality, worst aesthetic, lowres, old, monochrome, greyscale, abstract

Note, NoobAI has some control over year and recency for images. I put 'year 2024' which means it will take the style of images released in 2024. If you would like to replicate a different year, you can enter that different year instead.

The 'newest' tag also controls how recent the training data should be. If you want old images to influence your image, remove this tag.

Flux

What is it:
Flux is actually not a finetune, but I am listing it here because it is very popular for realism style images.

Base Model: Flux

How to prompt it:
Flux is much different from the previous models.

Flux is prompted using 'Natural Language' and responds best to very long, very detailed prompts. It is generally recommended to take your prompt, send it to ChatGPT, and ask it to expand your prompt and make it more detailed.

Flux has extremely good prompt adherence. Much better than any of the other models here.

Example prompt:

A young woman sits gracefully on a wooden swing suspended by ropes. She is mid-wave, her hand lifted in a friendly gesture, with a soft smile on her face. Her hair flows gently with the breeze, and the background shows a sunny, peaceful outdoor setting—perhaps a park or a garden. The mood is light, warm, and inviting.

Quality tags:
Flux does not require quality tags. Note that words like "gracefully" and "friendly" do influence the generation and are worth including.
 

NoTraceOfLuck

Gaining More Control

At this point, if you have been experimenting with new images, you may have started to run into one of two common issues with AI art.

Issue 1: The model does not understand your concept

or

Issue 2: You can't find the words to describe exactly the image or pose that you want. (This is covered in a later section)


This is where AI starts to get more technical than just "type in a prompt." You have a number of options for gaining more control over your output.


Adding LoRAs

If you're running into an issue where your model just cannot understand your concept, it's time to look into LoRAs. A LoRA allows you to inject new knowledge into the model.

There are many different types of LoRAs:
  • Style LoRA
  • Character LoRA
  • Concept LoRA
  • Other LoRA

A style LoRA allows you to completely change the style of your model. Let's see an example of a Style LoRA here:

This is the "Flat Color" Style. It should cause our output to appear in a flatter, more minimalistic style. There are also style LoRAs for pretty much anything you can think of, including styles for specific animes, games, artists, etc...

But before we add this LoRA, there are some important concepts to understand.

1: LoRAs are only compatible with the model they have been trained for. To use a LoRA with Illustrious, that LoRA needs to have been created specifically for Illustrious. Now, there are some exceptions to this: because almost all models are based off of SDXL, they do share some common underlying structure and can theoretically work together. But in general, for the best success, you need LoRAs that were created specifically for the model you're using.

2: Some LoRAs have 'trigger words'. Trigger words are words that you add to your prompt to activate the LoRA. Not all LoRAs have trigger words; some will just work.


So let's verify these two things with this new LoRA.

1748327295084.png


At the top, you can see this LoRA has been made for many different models, including HiDream, Wan, Hunyuan, Illustrious, and NoobAI. The model we are using is based on Illustrious, so be sure the 'Illustrious' version is selected.

On the right side, you can see "Base Model" says "Illustrious" which confirms we have the correct model selected.

You can also see this model does have trigger words. Since it has them, we must use them.


Installing The LoRA

We install this LoRA in the same way we install base models.

Copy the link and download it like before. If it fails, try a few times.

1748327450703.png


Once it is installed, you will see a new LoRAs section:
1748327489526.png

Optional: If you don't want to permanently remember the trigger words, you can add them to the model itself. I will do that now. Click on the model and enter the trigger words in the "Trigger Phrases" box. Note, this model has 2 trigger words, so be sure to get both.

1748327574260.png



Now, let's go back to the main page and take a look here. Click this and select the LoRA we just installed:

1748327634882.png


You will notice a new box appear:
1748327661675.png

Weight: Weight is the measure of how strongly the LoRA will affect your image. Most model pages have a recommended weight. This model does not recommend any specific weight, so I will keep the default of 0.75. However, if you find the effect is too strong, or not strong enough, you can increase or decrease this. With some LoRAs, you can even go negative to produce the "opposite" effect of the LoRA.
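
For context, here is what the same idea looks like in the diffusers sketch from earlier (illustrative only; the file name is a placeholder for whichever LoRA you downloaded, and the trigger words go in the prompt just like in Invoke):

Code:
# Load a LoRA on top of the base pipeline and control its strength (weight).
pipe.load_lora_weights("path/to/flat-color-style-lora.safetensors")  # placeholder file name

image = pipe(
    # Include the LoRA's trigger words ("no lineart, flat color" for this one) in the prompt.
    prompt="no lineart, flat color, masterpiece, best quality, ...rest of your prompt...",
    cross_attention_kwargs={"scale": 0.75},  # LoRA weight: lower = subtler, higher = stronger
).images[0]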

Now we need to add our trigger phrases. Up on the positive prompt, click here:

1748327779307.png

It will list the trigger words of any LoRAs you have selected. Add both of them. The prompt should now look like this:
1748327833261.png


Alright, let's see what it looks like! Click "Invoke"

1748327902302.png

Wow, looks great! Now you can see how LoRAs can affect the style of an image.

If you're feeling very adventurous, you can mix and match many different style LoRAs with different weights, and produce your own truly unique style recipe. There are some creators who experiment and find unique styles, and guard them closely.


Concept and Character LoRAs

Concept and Character LoRAs are the same as style LoRAs, except instead of influencing the style, they influence the structure of the image.

Character LoRAs can be used to add knowledge of characters that did not exist when the model was originally trained. Illustrious was trained in 2024. Any new character from 2025 will not be recognized by the model whatsoever.

You will simply locate character LoRAs, such as this one:

And install them and add the trigger word to influence the model to generate your character or concept.

Let's try this one combined with the other one. You will notice this LoRA is built very well and has many separate trigger words:

1748328455816.png

In this case, I want to keep the teal bikini. So I will include only the trigger words for the character and for her hair/body; I will not include the trigger words for her clothing.

1748328502453.png

Make sure you remove "Hatsune Miku" from the prompt, since we are now prompting for a different character
1748328558030.png

And then let's give it a run:

1748328621370.png

Not bad, I have combined 2 LoRAs to create this image.


Important Notes About LoRAs

While LoRAs are great and are a big part of local AI generation, there are some things you may notice eventually:

  1. Not all LoRAs are "good". There are a lot of bad LoRAs which simply ruin your image or make it worse. If a LoRA is giving you a lot of trouble, there's a good chance that the LoRA itself is broken and can't be used.
  2. LoRAs can affect your image in multiple ways. Eg: If you add a LoRA to create a character from Pokemon, you may notice that the entire image also gets influenced to a "pokemon" style, even if it is not a style LoRA. This is an unfortunate result of LoRAs and can sometimes be helped by reducing the weight of the LoRA, but sometimes cannot be avoided.
  3. Adding too many LoRAs can ruin the image. "Less is more": if you add too many LoRAs, you will start to notice the quality of the image degrade and start looking "burned" or deep fried.
 

NoTraceOfLuck

Gaining Even More Control

Ok so we can now inject new concepts into our image, which is great, but this still doesn't give us unlimited control.

Let's look at this image:

1748329136271.png


Tip: You can right-click the image, recall its metadata, and choose 'Use All' to instantly set the editor back to the state it was in when you created that image (prompt, LoRAs, etc...)


1748329103529.png


This image looks good, but she seems a bit shy in this image. What if I want this exact image but with her smiling and excited? (Eg: I do NOT want to generate an entirely new image with her excited, but in a different pose. I want this exact image.)

Well, we can do that with...

Inpainting

Inpainting is a very powerful concept in AI art. It allows you to manually fix up all the AI jank that can occur in an image, and can allow you to manually draw and insert concepts into an image if you just cannot get the model itself to understand what you want.

In this case, I have a goal:

  • I want her to have sandals on her feet
  • I want her to be excited and happy

Let's see how we can achieve that...

First, right click the image, and select 'New Canvas From Image' and 'As Raster Layer'

1748329382733.png

Raster Layer: A "Raster Layer" means the layer is just raw pixels. Nothing fancy. It's like drawing an image in Microsoft Paint. We will look at Control Layers soon.

After clicking that button, you will see the screen change a bit. Next, you will want to click up here:

1748329487327.png


You will see some new information, some of which may look confusing at first, but it will all be explained here:

1748329520607.png

For now, let's leave everything how it is and look over at the canvas on the left. You will see a paintbrush icon. It is exactly what it looks like: it allows you to draw on the image with new colors. And at the top, you will see a color picker.

1748329605414.png

The first thing I want to do is try and add some pink sandals to her feet. So I will select pink from the color picker:

1748329682338.png

And now I will just draw some rough sandals on her feet:

1748329749895.png

Perfect!

Now, it's important to add to the prompt any new concepts we want to add. In this case, I am adding pink sandals. So I will add pink sandals to the prompt:

1748329798691.png

Finally, click here to switch to 'canvas mode'. This will make it so the new image doesn't automatically save, but is instead presented to us so we can decide whether to save it or not:
1748329867372.png

It should be green like this:

1748329883669.png

And click Invoke!

And here's the result:
1748329944394.png


Looks great! Right?

WRONG!

You'll notice her swimsuit changed, and other parts of the image changed slightly as well. That is NOT what my goal was. I wanted the exact same image, with pink sandals.

So now we're going to introduce a new tool... "Inpaint Masks"

Inpaint Masks: Inpaint masks are areas you designate as the only parts of the image that should change. Without an inpaint mask, Invoke is actually re-rendering the entire image. But if we use an inpaint mask, we can limit changes to specific areas only.
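
Under the hood, an inpaint mask is just a black-and-white image: white where the model is allowed to repaint, black where the original pixels must be kept. Here is a rough diffusers sketch of the same idea (illustrative only; the checkpoint shown is a stock SDXL inpainting model and the mask coordinates are made-up example values):

Code:
# Inpainting sketch: white areas of the mask get repainted, black areas stay untouched.
import torch
from PIL import Image, ImageDraw
from diffusers import AutoPipelineForInpainting

inpaint = AutoPipelineForInpainting.from_pretrained(
    "diffusers/stable-diffusion-xl-1.0-inpainting-0.1", torch_dtype=torch.float16
).to("cuda")

source = Image.open("miku_beach.png").convert("RGB")
mask = Image.new("L", source.size, 0)                          # fully black = nothing changes
ImageDraw.Draw(mask).ellipse((380, 900, 650, 1100), fill=255)  # white blob roughly over the feet

result = inpaint(
    prompt="...the original prompt..., pink sandals",  # mention the new concept, as above
    image=source,
    mask_image=mask,
).images[0]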


Inpaint Masks

First off, we do not want to save this image, so click the trash can icon
1748330150249.png


Now, in the top right, you'll notice invoke gives us one default inpaint mask:

1748330185957.png


Click on it, and now you can use the paintbrush tool to draw your mask:

1748330243072.png

Now this is the ONLY area of the image which will change.

So let's click Invoke again.

1748330310107.png

Perfect! The rest of the image is completely unchanged, but she now has pink sandals!

Click the checkmark to accept these new changes
1748330360500.png

Now you can disable the inpaint mask by clicking here
1748330404484.png


Advanced Inpainting

Now let's try to accomplish the second objective: I want her to be excited and happy.

We will use the same tools as before. First, put an inpaint mask on her face and add the words "happy, excited" to the prompt
1748330525246.png

And click 'Invoke'

and boom

1748330587019.png


That looks fine. But what if I want EVEN MORE CONTROL?

I don't want her mouth to be open, I just want it to be a closed-mouth smile. We can do that by drawing it in manually:

1748330703891.png

But when I try to invoke this...

1748330755121.png

It didn't work. It gave her pretty much the same open mouth as before.

Now, I could try and mess around with the prompt and add "closed mouth" in the positive prompt or "open mouth" in the negative prompt. But it likely still won't produce the exact smile that I want.

This is where a very important concept in inpainting comes into play...

Denoising Strength: Denoising strength measures what percentage of the image is converted back to "noise" before the model starts to re-draw the image. A denoising strength of 1 means everything in the image itself is forgotten. That pink smile I drew on? Completely destroyed, converted back to "nothingness." A denoising strength of 0 means nothing changes at all. A high denoise strength gives the model more freedom to make something up. A low denoise strength limits the amount of creativity the model gets.
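
In the diffusers sketch this is the strength argument on img2img/inpaint calls (illustrative, continuing the example above):

Code:
# Denoising strength: how much of the masked area is "forgotten" before being repainted.
subtle  = inpaint(prompt="...", image=source, mask_image=mask, strength=0.3).images[0]  # small touch-ups
redrawn = inpaint(prompt="...", image=source, mask_image=mask, strength=1.0).images[0]  # fully reinvented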

1748330840156.png

We can do some testing with this to better understand the concept.

Cover her whole head with an inpaint mask, and set the denoising strength to 1:
1748331049875.png

Also, make sure 'seed' is set to Random for the best demonstration:
1748331096962.png

Now, notice it completely changes the whole structure of the image. Her head is no longer tilted down, and her eyes are wide and eyebrows are raised.
1748331177516.png

But if you set the denoise lower, to around 0.5, the changes to the image are much smaller:
1748331262915.png

Denoise Strength is a valuable tool in your toolbox and you will adjust it often. If you want the model to completely redo something, set it very high. If you just want it to redo small details or smooth something out, it can make sense to go as low as 0.3. Often I start at 0.65 and see where it gets me before adjusting further.

Now... let's get back to that other image. I want this to draw just a flat, closed mouth:

1748330703891.png

So I'm going to set the denoise to a low 0.2.

Still not perfect...

1748331430434.png


So let's get into the last concept here:

The Bounding Box

The bounding box is the last very important concept in inpainting. Now, it's a bit confusing to explain. But to keep it simple: when you generate a 1024x1024 image, it applies that 1024x1024 resolution to the entire image.

However, if you reduce the bounding box to cover only a specific area of the image, it will take the entire 1024x1024 resolution and apply it only to that small area.

In short: Reducing the bounding box size can allow you to increase the quality and resolution of specific parts of your image
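
Conceptually, shrinking the bounding box is like cropping out just that region, letting the model repaint the crop at the full working resolution, then scaling the result back down and pasting it over the original. A rough sketch of that idea, continuing the inpainting example from earlier (this only illustrates the concept, not Invoke's actual implementation; the box coordinates are example values):

Code:
# Bounding-box idea: spend the full 1024x1024 "budget" on one small region.
box = (420, 300, 620, 420)                     # left, top, right, bottom around the mouth (example values)
crop = source.crop(box).resize((1024, 1024))   # blow the small region up to full working resolution
crop_mask = mask.crop(box).resize((1024, 1024))

detail = inpaint(prompt="closed mouth, smile, ...", image=crop, mask_image=crop_mask,
                 strength=0.5).images[0]

patch = detail.resize((box[2] - box[0], box[3] - box[1]))  # shrink the result back down
source.paste(patch, (box[0], box[1]))                      # and paste it over the original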

Click the bounding box button here:

1748331570197.png

Now drag the corners of the bounding box to cover only her mouth + a little bit of extra space
1748331610083.png

Now, it's important that your inpaint mask does not touch the edge of the box, or you will get artifacts in your image. So carefully draw an inpaint mask covering only her mouth

1748331670289.png

Now, the entire 1024x1024 resolution will be applied specifically to just her mouth, giving it much more detail and attention. Let's see how it goes.

Now, it looks much closer to what I had drawn, and you can see the extra resolution also added some extra detail like teeth and a tongue.

1748331828516.png
 

NoTraceOfLuck

Maximum Control

Alright so this is great, we've learned:

  • How to create an image
  • How to inject style or character LoRAs into the image to change how it looks or teach new concepts to the model
  • How to draw on top of the image to manually insert details that you want

But there is one more challenge that can't be resolved with everything I've written about so far.

Let's look back again at this image:

1748332362917.png

This image looks great, but what if I decide that I like this exact pose, but I actually do not want her to be on the beach, and I don't want her in a bikini? Instead, I want her sitting outside in a grassy field, wearing a school uniform.

How might you solve this?

Well, if you're crazy, you could try inpainting all of this:

1748332684714.png

But the results aren't great:
1748332770972.png


And also it was a lot of work, and requires you, as the artist, to come up with all the colors and the design... surely there is a better way!


ControlNet

ControlNet is what separates the amateurs from the pros. ControlNet allows you to control the structure of an image, without adding colors or manually drawing things.

ControlNet requires a few extra models installed.

Go here:
1748332956975.png

Then click "Starter Models"

1748332991609.png

Then click the + sign next to the following models (make sure you select the 'SDXL' version)

  • Hard Edge Detection (canny)
  • Depth Map
1748333081489.png

You can also install the other ones if you want, though these are the only 2 I will cover in this tutorial.


Now back to our image
1748333136291.png

Right click the layer and select 'Copy Raster Layer To' and 'New Control Layer'
1748333188379.png


Now disable the raster layer by clicking here
1748333220561.png


Now click here
1748333257255.png

And Select "Hard Edge Detection Canny"

1748333289066.png

You'll notice something happens to our image:
1748333313588.png


What is happening here:

This effectively applies an edge-detection filter to our image. Now, when we generate an image using this control, it will attempt to replicate all of these same 'edges' in the new image. Even if we completely change the prompt, the new image will have the same edges as this one.
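
If you are curious what this looks like outside of a GUI, here is a rough diffusers sketch of the same canny ControlNet workflow (illustrative only; I load the stock SDXL base checkpoint for simplicity, whereas in practice you would load your anime checkpoint, and the ControlNet repo shown is one public SDXL canny ControlNet):

Code:
# ControlNet sketch: extract edges from a reference image, then force the new image to follow them.
import cv2
import numpy as np
import torch
from PIL import Image
from diffusers import ControlNetModel, StableDiffusionXLControlNetPipeline

edges = cv2.Canny(np.array(Image.open("miku_beach.png").convert("RGB")), 100, 200)
canny_image = Image.fromarray(np.stack([edges] * 3, axis=-1))   # 3-channel edge map

controlnet = ControlNetModel.from_pretrained(
    "diffusers/controlnet-canny-sdxl-1.0", torch_dtype=torch.float16)
control_pipe = StableDiffusionXLControlNetPipeline.from_pretrained(
    "stabilityai/stable-diffusion-xl-base-1.0",
    controlnet=controlnet, torch_dtype=torch.float16
).to("cuda")

image = control_pipe(
    prompt="hatsune miku, uniform, stockings, outdoors, grass, sunny, sky, sitting, straight on",
    image=canny_image,   # the edge map is the control input
).images[0]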

Let's try it out.

First, click 'Apply' in the box in the bottom middle. Then, replace the prompt with this:

no lineart, flat color, masterpiece, best quality, highres, absurdres, hatsune miku, uniform, stockings, outdoors, grass, sunny, outdoors, sky, sitting, straight on, feet

And then click 'Invoke' and lets see what happens
1748333518815.png

Wow, look at that, it follows the edges of the original image almost exactly. Except now she's in a uniform and sitting in the grass.

But this isn't really a "uniform". It still generated her with a top that resembled the old bikini top. It also generated an umbrella, and generated some strange shapes down where her hair was. That makes sense, because if you look at the original control, all of these items were clearly marked in the image:

1748333495161.png

What if I don't want the umbrella, or the bikini top, and I want it to completely redo how her hair is laying on the grass?

Well, it's simple...

With the control layer selected, you can click the 'eraser' tool
1748333670198.png

And now just... erase all the stuff that we don't want controlled. I erase the whole background, the bikini shape, and most of her hair. Now, the model has complete freedom to generate whatever it wants in these areas:
1748333764810.png

And now, just for fun, let's draw our own little clouds in. To do this, select the paintbrush and set the color to white. Then simply draw in whatever you want.

1748333865360.png

And click Invoke

And here we go
1748333958286.png

It didn't draw my clouds in the way I expected, but overall it still looks good.


ControlNet Settings

1748334005920.png


You have a few settings on your ControlNet.

The first is weight. This is similar to LoRA weight: it is how strongly the ControlNet will be applied. At 1, it will be extremely strict about following the edges, even if it is forced to make the image incoherent. It will prefer to generate a garbage, nonsense image, as long as it can guarantee that it is matching the control. At lower weights, it has more freedom to ignore the ControlNet in exchange for producing a more coherent image.

Begin/End %: This is when, during the generation, the ControlNet is 'applied'. The usefulness here varies quite a bit, but generally if the ControlNet isn't behaving how you want, you can adjust this.

Control Mode: I have tried the options here, and generally never leave Balanced mode.
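
In the diffusers sketch from earlier, these settings roughly correspond to the following arguments (illustrative):

Code:
image = control_pipe(
    prompt="...",
    image=canny_image,
    controlnet_conditioning_scale=0.4,  # "Weight": lower = more freedom to deviate from the edges
    control_guidance_start=0.0,         # "Begin %": when the control kicks in
    control_guidance_end=0.8,           # "End %": when the control is released
).images[0]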


ControlNet Caveats

One important thing to know about ControlNet is that it can severely degrade the quality of your image if the weight is too high. My general rule is "Keep the weight as low as possible to accomplish what I want." If I can get a good image that matches my expectations with a weight of 0.2, I will generally prefer that. Most of the time I end up at a weight of 0.4 and find that it is a good middle ground.
 

NoTraceOfLuck

Regional Guidance

Now let's address another common failing in AI art: controlling where specific objects show up.

Now, you may ask, isn't that what we just did with ControlNet? Well, not exactly. While there is a good chance that the model will 'figure out' that the Miku-shaped edges are where Miku is supposed to go, technically there is no guarantee of that.

Additionally, there's another challenge which I haven't shown yet: generating 2 characters in the same image.

Let's try this prompt:

no lineart, flat color, masterpiece, best quality, highres, absurdres, hatsune miku, uniform, stockings, megumin, red dress, leggings, outdoors, grass, sunny, outdoors, sky, sitting, straight on, feet

With this, I have added both Megumin and Miku to the prompt. Let's see how it does:

1748335033170.png

Okay, it didn't put Miku in the image at all. However, this may be a skill issue. Remember, this model is based on Illustrious, which is prompted using Danbooru tags. Danbooru has a specific tag "2girls" which I don't have in my prompt.

Let's add '2girls' to the prompt and see what happens.

1748335175604.png

Okay that actually turned out great (this is why Illustrious is a popular model, it can do pretty well with 2 characters)... so let's make it harder. I am adding a 3rd character to the mix:

tracer, orange jumpsuit

1748335231425.png

There, now it looks terrible.

So, how can we fix this? Let's try this mysterious button we haven't tried yet:

1748335289244.png


We actually need to press it 3 times so that it looks like this:

1748335331766.png


Now, what did we just do?

Regional Guidance: Regional Guidance is quite simple. It allows you to apply a prompt to a specific area in the image. Let's add the first one. Click '+ Prompt'

1748335388135.png


Now add one of the 3 characters to each prompt. Also include their clothing
1748335453956.png

Now, click on each layer, and then select the paintbrush:

1748335488135.png

And you can now 'Draw' where you want that prompt to apply. I apply each one to their own distinct area:
1748335534678.png

Note it is important that your original prompt on the left still contains all of this information:
1748335625948.png

Now let's see what happens. It took a few tries, but it turned out pretty good:
1748335761762.png

There is some reduction in quality here. An image with this sort of detail would be best generated at a higher resolution.
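
Invoke handles the masks for you, but conceptually each regional guidance layer is just a prompt paired with a mask that marks where it applies. A toy sketch of that pairing (purely to illustrate the data involved; plain diffusers has no built-in one-call equivalent, and Invoke's implementation differs):

Code:
# Conceptual only: each region = (mask of where it applies, prompt for that area).
from PIL import Image, ImageDraw

W, H = 1216, 832

def region(left, right):
    m = Image.new("L", (W, H), 0)                               # black = prompt does not apply here
    ImageDraw.Draw(m).rectangle((left, 0, right, H), fill=255)  # white = prompt applies here
    return m

regions = [
    (region(0, W // 3),          "hatsune miku, uniform, stockings"),
    (region(W // 3, 2 * W // 3), "megumin, red dress, leggings"),
    (region(2 * W // 3, W),      "tracer, orange jumpsuit"),
]
# The main prompt on the left still needs to contain all of this information;
# each regional prompt just biases its character toward its own strip of the image.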
 

osanaiko

Bravo NoTraceOfLuck

One of the best guides I have seen. Thank you for making and sharing it.

I've used the tools a lot, although not Invoke AI, and still learned something new - that "Pony is trained only with danbooru tags". In my usage of Pony (PonyDiffusionXLv6 to be exact) I have found that natural language phrases still work reasonably well, but I will try again with danbooru and see if that improves the "control-ability" of outputs.
 

NoTraceOfLuck

Bravo NoTraceOfLuck

One of the best guides I have seen. Thank you for making and sharing it.

I've used the tools a lot, although not Invoke AI, and still learned something new - that "Pony is trained only with danbooru tags". In my usage of Pony (PonyDiffusionXLv6 to be exact) I have found that natural language phrases still work reasonably well, but I will try again with danbooru and see if that improves the "control-ability" of outputs.
I do think there is some limited amount of natural language understanding (or at least, the danbooru tags are so extensive that the model seems like it is understanding natural language); however, danbooru will always be the most potent with Pony.

If you haven't tried an Illustrious-based model, you may find better control than most Pony models offer. Pony really shines when used with style and character LoRAs; it seems to be very flexible at completely learning new concepts, but I find Illustrious is the king in terms of actually understanding what you're writing.
 

osanaiko

Unfortunately, as I am partway through making a game with a specific art style, I'm stuck with a Style LoRA based on PonyXL.

But maybe a two-step process might work: an initial gen of exactly what I want using Illustrious, and then conversion to the style via img2img with Pony + LoRA.
 

andresblinky

Thanks for this OP, love this guide. It made getting started very easy, and I appreciate your hard work putting this together. Good stuff.
 

lossius

It seemed like a really cool guide, but it's a shame that my PC is worse than a potato: it kept showing as "Disconnected" in the middle of a generation, and on the Steam Deck it doesn't work at all :(
 

plaseholderacc

Wow, thanks for this guide, it's great. I've been experimenting with generating images and videos using ComfyUI, but from reading your guide, most of these concepts apply across whatever frontend you use. I also use that exact model, , with a bunch of LoRAs - it's really quite good. I've gotten to the point where I'm generating stuff I like, but I'm not able to get consistency across character clothing or backgrounds - inpainting and ControlNets seem like the next step of what I'm looking for to get consistency. Invoke definitely seems a lot easier to use for these purposes, so I'll give it a try.
 

Midzay

Dude, that's awesome. I will put a link to this on my beginner developer site. Let me know if you publish a copy of the article somewhere public, and I'll update the URL.
 

tlac

Very good tutorial. I've already been making some AI pictures online, and I discovered some of these concepts by looking at others' prompts. Now I have a much better understanding. Thank you.
 

wal01

That's just awesome, thank you!

I've used a local WebUI a lot, but it feels a bit outdated now. This guide is a good opportunity to upgrade my AI station.

Can you please include a part in the guide about keeping character consistency while changing a pose or angle? For example, I want the same character from_side or from_above, or dancing/sitting/lying/having sex. How can I learn such power?
 

osanaiko

Can you please include a part in the guide about keeping character consistency while changing a pose or angle? For example, I want the same character from_side or from_above, or dancing/sitting/lying/having sex. How can I learn such power?
I've been working on my game project for 12 months now and I don't know how to do this reliably. If you work it out please share :LOL::cry:
 