Your noob to expert guide in making great AI art.


NoTraceOfLuck

Active Member
Game Developer
Apr 20, 2018
Wow, this is incredible, thank you for putting this together! I have two questions:
  1. I just ordered an RTX 3060 12GB. Should I still follow the low-VRAM steps? I know your guide says 8GB or lower, but elsewhere in this thread 16GB is often cited as the lowest you should really go if you expect good results.
  2. What's the best way to maintain consistency for characters? Another tutorial posted as a reply here seemed to recommend training a LoRA and keeping the same prompt language. Is that what's recommended?
Yes, 12GB will be fine. I made this whole tutorial on a laptop with only 8GB! 16GB is pretty much the point where you can stop worrying about VRAM at all, which is why it's the usual recommendation, but for plain image generation you won't have any problems with 12GB.
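If you want to sanity-check your own headroom, here's a minimal sketch (assuming the diffusers library, with a placeholder SDXL checkpoint) that reads the card's VRAM and flips on the usual memory savers for smaller cards:

import torch
from diffusers import StableDiffusionXLPipeline

vram_gb = torch.cuda.get_device_properties(0).total_memory / 1024**3
print(f"Detected {vram_gb:.1f} GB of VRAM")

pipe = StableDiffusionXLPipeline.from_pretrained(
    "stabilityai/stable-diffusion-xl-base-1.0",  # placeholder - any SDXL checkpoint
    torch_dtype=torch.float16,
)
if vram_gb >= 12:
    pipe.to("cuda")                  # 12GB+: the whole pipeline fits comfortably
else:
    pipe.enable_model_cpu_offload()  # 8GB-ish: shuttle submodules to the GPU as needed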

The best ways to maintain consistency really are these steps, in this order:
  1. Use a character LoRA
     A character LoRA is the best way to maintain consistency. If you're using a custom character, you will likely need to train this LoRA yourself; if you're using a "known" character, one probably already exists for you to download.
  2. Use the same prompt to describe the character, and the same model, for every generation
     As long as you don't change your prompt or model, you'll usually get decent consistency, but it will rarely be perfect (see the sketch below).
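To make the "same LoRA + same prompt + same model" advice concrete, here's a minimal sketch using diffusers - the checkpoint, LoRA file, and trigger tags are placeholders for whatever your character actually uses:

import torch
from diffusers import StableDiffusionXLPipeline

pipe = StableDiffusionXLPipeline.from_pretrained(
    "stabilityai/stable-diffusion-xl-base-1.0",  # keep the same model for every render
    torch_dtype=torch.float16,
).to("cuda")
pipe.load_lora_weights("my_character_lora.safetensors")  # hypothetical character LoRA

# Keep the character description identical across every generation.
CHARACTER = "mychar, 1girl, short red hair, green eyes, freckles"

for i, scene in enumerate(["at the beach", "in a library", "cooking dinner"]):
    image = pipe(
        prompt=f"{CHARACTER}, {scene}",
        generator=torch.Generator("cuda").manual_seed(1234 + i),  # fixed seeds = reproducible
    ).images[0]
    image.save(f"scene_{i}.png")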
 

osanaiko

Engaged Member
Modder
Jul 4, 2017
A 3060/12GB is a fine card for SDXL models, which are ~6GB each (it's touch-and-go to fit one in 8GB because Windows takes ~1.5GB of VRAM for itself on startup).

Given how underwhelming the performance bump on the 40xx and 50xx series has been, the 3060 remains one of the best price-to-performance GPU options out there for local AI generation work.

Beyond SDXL, diffusion models from other families (Qwen, Flux) have fundamentally different architectures and correspondingly different file sizes - for example, the Flux.1 models are ~22GB at baseline (if I recall correctly), but various quantized versions are available from the usual suspects, so you can find one that fits nicely in your VRAM.
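A quick back-of-envelope way to see why quantization helps (my own napkin math, not official specs): file size is roughly parameter count x bits per weight / 8. For a 12B-parameter Flux.1-class model:

# Approximate checkpoint sizes for a 12B-parameter model at common quant levels.
# Real files vary a little due to metadata and layers left unquantized.
params = 12e9
for name, bits in [("bf16", 16), ("fp8", 8), ("Q8 GGUF", 8.5), ("Q4 GGUF", 4.5)]:
    size_gb = params * bits / 8 / 1024**3
    print(f"{name:>8}: ~{size_gb:.0f} GB")  # bf16 lands around that ~22GB figure

So a Q4-ish quant of the same model squeezes into roughly 6-7GB, which is how people run these on 12GB cards.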

However, be aware that 12GB of VRAM will not let you do much with higher-res/longer-runtime video models (WAN 2.1 etc.), although there are some dramatically cut-down versions that run (slowly) in as little as 6GB.

Regarding consistency:

This is the second hardest thing to get right with AI generative tools. I use a combination of the following:

1. A library of character sprites in various poses, angles, and expressions, which I cut-and-paste into generated images, then use img2img to heal the joins.

1a. You can generate sprite sheets with various angles and expressions from a single base image using one of the "Kontext"-style editing models like Flux.1-Kontext. Recently the "big boys" like Gemini's Nano Banana can also do a great job of this - just make sure the clothes stay on!

2. Lots of hand-curation of the output. The images you generate are basically "free", so don't try to tweak your prompt to perfection - just yolo it, generate 100 images, and pick the best representation of your character (see the sketch at the end of this post).

2a. You can also cut-and-paste parts of those hundreds of generated images together to get the details right. E.g. the face is correct, but the character's clothing is wrong? Then paste in a clothing piece that is correct and smooth it over with img2img.

3. I do 2D-cartoon-style stuff, so it's possible for even a non-artist like me to draw over an image to fix things (and of course then use img2img to clean up and get a consistent style).

4. Find a good LoRA of the style and character you want - starting from something "close" to the desired result makes things easier. Actually, this might be the most important tip.

5. Leverage a game design that does not need thousands of perfectly consistent full-screen images (a la BADIK etc.). Sprite games like Scammertime Saga and the like let you re-use the "good" sprites over and over in the conversation parts. (This is the approach I am taking.)

EDIT: I found this other post I used where I raved about using Flux kontext for making new poses etc from a base image: https://f95zone.to/threads/how-to-generate-consistent-ai-images.258091/#post-17544494
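Since points 2 and 2a come up a lot, here's a hedged sketch of that "yolo 100 seeds, curate, then heal the pasted patch with low-strength img2img" loop using diffusers - the checkpoint, prompt, file paths, and strength value are all my own placeholder assumptions to tune:

import os
import torch
from diffusers import StableDiffusionXLPipeline, StableDiffusionXLImg2ImgPipeline
from PIL import Image

pipe = StableDiffusionXLPipeline.from_pretrained(
    "stabilityai/stable-diffusion-xl-base-1.0", torch_dtype=torch.float16
).to("cuda")

# Point 2: don't chase the perfect prompt - spray seeds and curate by hand.
prompt = "mychar, 1girl, red jacket, standing in a market, anime style"
os.makedirs("candidates", exist_ok=True)
for seed in range(100):
    img = pipe(prompt, generator=torch.Generator("cuda").manual_seed(seed)).images[0]
    img.save(f"candidates/{seed:03d}.png")

# Points 2a/3: after cut-and-pasting a correct clothing piece (or drawing over it),
# a low-strength img2img pass smooths the seams without repainting the whole image.
img2img = StableDiffusionXLImg2ImgPipeline.from_pretrained(
    "stabilityai/stable-diffusion-xl-base-1.0", torch_dtype=torch.float16
).to("cuda")
patched = Image.open("patched_collage.png").convert("RGB")
healed = img2img(prompt=prompt, image=patched, strength=0.3).images[0]  # low strength keeps composition
healed.save("healed.png")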
 

CrysusPariah2

Member
May 25, 2025
This has been so helpful. Thank you so much.

Does anybody have any advice for when you don't want to create your own characters from scratch, but instead want to create fan art for an existing project?
 

1nsomniac22

Newbie
Game Developer
Jul 16, 2025
This has been so helpful. Thank you so much.

Does anybody have any advice for when you don't want to create your own characters from scratch, but instead want to create fan art for an existing project?
Grab 6 to 10 images of a single subject, preferably full body, dressed and nude. Train a first "cheap" LoRA from those. Then obtain or bulk-produce a series of additional images - focusing on the face and different expressions - and train a gen-2 LoRA on the accumulated dataset (a sketch of the expected folder layout is below). That's the "easy" way to train up a resource for creating consistent character images (as good as current-gen AI tools can manage, anyway). At least, that's what I've done for my work.
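For what it's worth, here's a sketch of how you might lay out that accumulated dataset for a Kohya-style trainer, which expects <repeats>_<name> subfolders with a caption .txt next to each image - the folder names and trigger word here are just my examples:

import shutil
from pathlib import Path

# Kohya-style layout: train/10_mychar/0001.png + 0001.txt, etc.
# "10" is the per-image repeat count; "mychar" doubles as the trigger word.
src = Path("curated_images")   # your hand-picked images
dst = Path("train/10_mychar")
dst.mkdir(parents=True, exist_ok=True)

for i, img in enumerate(sorted(src.glob("*.png")), start=1):
    shutil.copy(img, dst / f"{i:04d}.png")
    # Seed each caption with the trigger word; the auto-tagger fills in the rest later.
    (dst / f"{i:04d}.txt").write_text("mychar")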
 

osanaiko

Engaged Member
Modder
Jul 4, 2017
It probably goes without saying, but to clarify the excellent advice from 1nsomniac22:

Using your own judgement for the images you select for LoRA training is key: to get the 100-ish images needed for the 2nd-gen LoRA, you might need to dig through thousands of generations with various prompts, discarding 90-95% of them.
The core value to judge on is "the generated image looks almost exactly how I want":
- visually close to the character's unique features (how close is up to you)
- clothing and body proportions match your character
- absolutely no "bad AI image" characteristics (extra fingers, impossible anatomy, etc. - you know it when you see it)
- sticks to the prompt

One step of making a LoRA is tagging the images. The first pass of this is always automated, but you can then add your own extra tags to call out specific features/poses/clothing/expressions that you want to reproduce in your eventual outputs. This is another place where your human judgement can add real information and improve the final product.
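To make that manual tagging pass concrete, here's a small hedged helper that prepends your own hand-chosen tags to the auto-generated caption files (the tags and folder are examples, not a prescription):

from pathlib import Path

EXTRA_TAGS = ["mychar", "freckles", "red jacket"]  # example tags - use your own

for caption in Path("train/10_mychar").glob("*.txt"):
    existing = [t.strip() for t in caption.read_text().split(",") if t.strip()]
    merged = EXTRA_TAGS + [t for t in existing if t not in EXTRA_TAGS]
    caption.write_text(", ".join(merged))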
 

1nsomniac22

Newbie
Game Developer
Jul 16, 2025
It probably goes without saying...
Everything osanaiko said x1000! I didn't want to scare/bore the thread with all the fiddly-bits, but seeing that we're in the very excellent Noob to Expert guide thread:

If you're training a LoRA locally you need a video card with at least 8GB of VRAM. All my first- and second-gen LoRAs were trained on an RTX 4070 mobile (in the laptop I'm writing this post from). It was slow - two to three days to get through a small dataset of fewer than 50 images, with 6 internal generations of 1,500-1,800 steps each (this is all LoRA training wonk-speak; see the napkin math below).
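Decoding the wonk-speak a bit: Kohya-style trainers compute run length this way, and the specific numbers below are my assumptions, not gospel:

# total_steps = images * repeats * epochs / batch_size
images, repeats, epochs, batch_size = 45, 5, 8, 1
print(images * repeats * epochs // batch_size)  # 1800 - why "<50 images" still means 1500-1800 steps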

Here's a link to a useful LoRA Training guide I used when I was first figuring things out:


And since I'm feeling shameless today, here's a link to a devlog I wrote about my own experience in training my first LoRA with very few initial assets:


Since that first experiment I've gone 100% with the two-gen approach, where I use an initial cheap LoRA to build a full training set for the actual production LoRA - but I also try to reserve LoRAs for my main characters only. This is because I'm working with 100% synthetic characters that need very specific appearances and traits (and expressions... which are their own challenge). I will say, getting the body relatively stable is easy. Getting a wide enough expression set - that's difficult, and it's where osanaiko's advice on tagging and filtering the source data is critical. But... this is digital media... disk space is cheap, and electricity... well, it used to be cheap.

For training the LoRA, look into Kohya. It's confusing and annoying, but it runs well enough to get started. I don't know if there are more recent projects that make training easier, but this one runs reliably and doesn't crash out after multiple days of cooking:


For tagging your dataset, I start with the tagging util in the Kohya UI and then use the Booru dataset manager to clean and refine the tags. It's good enough for moderate-sized datasets and beats manually editing tag files for hours:


Anyway, those are some of the fiddly bits you'd need to get started. Hope this helps someone else!
 

Existence X

Member
Oct 14, 2018
How I managed to make InvokeAI work with AMD Drivers on Windows

Hi everyone. Some time ago I found this tutorial and got excited to try it out. Unfortunately, I stumbled into this roadblock:

[attached screenshot: the error message]

I looked into alternatives briefly (WSL, ZLUDA), but couldn't understand how to make it work. Honestly, I didn't even know what I was doing half the time. However, if you are in the same boat, I have good news: I managed to get it working now.

My Setup:
  • RAM: 16GB
  • GPU: AMD Radeon RX 6600 (8GB VRAM)
  • OS: Windows 10

Disclaimer: This is not a formal tutorial, as I don't have the technical knowledge to explain what every step does and why it works. Instead, I will describe every step I took to get my first successfully generated image, so you can try your luck. Proceed at your own risk.

Note: I am borrowing most of the steps from the "mind_slicer" guide on the Invoke Discord (bugs-and-support > AMD support on windows). If you're reading this, thank you, mind_slicer!

The Problem

InvokeAI uses CUDA (NVIDIA) or ROCm (AMD) to generate images. While CUDA is standard on Windows, ROCm has historically been Linux-only or difficult to configure on Windows.

The Solution

There are a few alternatives (Windows Subsystem for Linux, ZLUDA, etc.), but we will use "The Rock" (AMD's native ROCm implementation for Windows) combined with specific community builds for RDNA2 cards.

Step-by-Step Guide

Part A: Basic Setup (Based on mind_slicer's guide)


  1. Check PowerShell Policy: Open PowerShell and type Get-ExecutionPolicy. If it returns Restricted, run Set-ExecutionPolicy RemoteSigned and agree to the changes. This allows the Invoke scripts to run.
  2. Download the Launcher: Get the latest version from the official link:
  3. Install Invoke: Run the installer. Important: Do not install version 6.9.0; use 6.8.1. When asked "What GPU do you have?", choose AMD.
  4. Open Dev Console: Open the Invoke launcher and click "Launch Developer Console".
  5. Clean Up Torch: Run the following command to remove the default libraries: uv pip uninstall torch torchaudio torchvision. Then minimize the launcher, but do not close it.

Part B: Installing ROCm & Torch

We are utilizing AMD's "The Rock" platform. You generally need to check the support matrix on TheRock's GitHub to see if your architecture is supported (green checkmarks mean you are good to go).

  • If you have a newer card (e.g., RX 9070 XT / RDNA4): You can likely follow the official instructions on TheRock's GitHub to find the specific pip command for your architecture (e.g., gfx1201). It will look something like this (adapt for your specific card!): uv pip install --pre torch torchaudio torchvision --index-url https://rocm.nightlies.amd.com/v2/gfx120X-all/. After installing torch, you can jump to step 10.
  • If you have an older card (like my RX 6600): My card (RDNA2 architecture) wasn't officially supported yet. However, I found a branch merging RDNA2 support, so it could be available in the near future. If you are in this scenario, follow these steps instead:
If using older card only:
6. Update Python: Ensure you are running Python 3.12.10.
7. Download Custom Builds: Go to and download all seven files.
8. Install ROCm: In the Invoke Dev Console, navigate (cd) to the folder where you downloaded the files and run:

uv pip install "rocm-7.1.1.tar.gz" "rocm_sdk_libraries_gfx103x_all-7.1.1-py3-none-win_amd64.whl" "rocm_sdk_devel-7.1.1-py3-none-win_amd64.whl" "rocm_sdk_core-7.1.1-py3-none-win_amd64.whl"

9. Install Torch: Once ROCm is done, install the Torch wheels:

uv pip install "torch-2.9.1+rocmsdk20251207-cp312-cp312-win_amd64.whl" "torchaudio-2.9.0+rocmsdk20251207-cp312-cp312-win_amd64.whl" "torchvision-0.24.0+rocmsdk20251207-cp312-cp312-win_amd64.whl"

10. Finalize: Close the console and launch InvokeAI. The first launch (cold start) will take a long time. Ignore errors regarding ROCm or bitsandbytes in the logs. If you see "Using torch device: AMD Radeon RX 6600" — you win.
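If you want a second sanity check beyond the Invoke log line, you can ask torch directly from the Developer Console's Python environment (ROCm builds of PyTorch expose the GPU through the regular torch.cuda API):

import torch
print(torch.cuda.is_available())      # should print True
print(torch.cuda.get_device_name(0))  # should print "AMD Radeon RX 6600"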

Optimizing Performance (Optional, I did not use it)
"The Rock" is under active development. You might encounter slow generation times or "hanging" during VAE Decode. To fix the "forever decode" issue and MIOpen conflicts:

  1. Make sure InvokeAI is closed.
  2. Go to C:\Users\<YOUR_USERNAME>\. If you see a .miopen folder, delete it.
  3. Create a custom launch script to force a specific MIOpen mode. Open Notepad, paste the following, and save it as invoke_launch_amd.bat:

@echo off
REM Replace "D:\invoke" below with the actual path to your Invoke folder
start "InvokeAI" cmd /k "cd /d D:\invoke && call .venv\Scripts\activate.bat && set MIOPEN_FIND_MODE=2 && invokeai-web"
exit


Use this .bat file to launch Invoke from now on.

Troubleshooting Common Issues
Besides the usual errors, here is how I fixed specific crashes:

1. Can't download/import a model

  • Cause: Seems to be an issue with Invoke 6.9.0.
  • Solution: Downgrade to 6.8.1 and repeat the installation steps.

2. Invoke crashes (closes) when generating an image

  • Cause: Low storage on the C: drive. Even if Invoke is on D:, Windows uses the Pagefile (virtual memory) on the C: drive by default.
  • Solution: I freed up 50GB on my C: drive. Alternatively, you can move your Pagefile to another drive.
  • How to change Pagefile: Settings → System → About → Advanced system settings → Performance (Settings…) → Advanced → Virtual memory (Change…). Set C: to "No paging file" and your other drive to "System managed size".

3. The image renders as plain black or with artifacts
[attached screenshot: the black/artifacted output]
  • Solution: I did two things (not sure which fixed it, but try both):
  • Freed up disk space (as mentioned above).
  • Edited invokeai.yaml to better manage VRAM

enable_partial_loading: true       # stream model weights onto the GPU in pieces
device_working_mem_gb: 4           # VRAM reserved as working memory during generation
keep_ram_copy_of_weights: false    # saves system RAM at the cost of slower model switching


The Result:

[attached image: the successfully generated result]


That's it! I hope my experience helps you get this set up.

Good luck, and hit the Invoke Discord if you need any help!
 

NoTraceOfLuck

Active Member
Game Developer
Apr 20, 2018
How I managed to make InvokeAI work with AMD Drivers on Windows ...
Wow, this is great! I will add a direct link to your comment from my main post so that people can find it easily!
 

1nsomniac22

Newbie
Game Developer
Jul 16, 2025
Just a quick question Existence X - have you ever considered running a Linux container on Windows? I used to run Automatic1111 in said Linux container on my primary laptop and still did all my work on the Windows side. It's a bit of a PITA because you bifurcate the filesystem (unlike on Mac, where the Linux subsystem exists under the GUI and both terminal and applications have access to everything).
 

Existence X

Member
Oct 14, 2018
Just a quick question Existence X - have you ever considered running a Linux container on Windows? ...
Yes, but I think I mistook it for the WSL approach. I think I read that even with WSL it wouldn't work for me, and I gave up on the idea.

I would say I was just being lazy, and I'm a bit of a layman; I've never worked with creating containers and so on.

Even then, I don't know if RDNA 2 GPUs would work on Linux, since they're not listed here:
 