Tutorial: Quick tips to improve your AI game

Satori6

Game Developer
I've been toying around with AI generation for some months, and I'd like to share some of the things that have helped me.

AI can be a great aid for those of us who can't draw and can't afford to throw thousands away on art for a random passion project.
However, when used incorrectly, the results can be rather detrimental. These tips aim to improve the AI art in your games.

Note that AI art generation is limited by hardware. In my case, these low-quality, low-resolution pictures are what I can manage with the PC I have, and I'm also limited to smaller models. If you have a better PC, which is very likely, you'll be able to get much higher quality pictures.


1.- Start simple, add slowly

Don't start with a crazy, overly detailed prompt. Start with something basic that the AI won't have much trouble figuring out, then slowly add details and complexity.
Add details bit by bit, changing a single prompt term at a time, or you risk the picture changing far too drastically.
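If you script your generations with Python and the diffusers library instead of a UI (not what this thread uses, but the same idea applies), the loop looks roughly like this. A minimal sketch: the model ID and prompts are just placeholders, and the fixed seed is what keeps the composition stable between passes.

```python
import torch
from diffusers import StableDiffusionPipeline

# Placeholder checkpoint; use whatever model your hardware can handle.
pipe = StableDiffusionPipeline.from_pretrained(
    "runwayml/stable-diffusion-v1-5", torch_dtype=torch.float16
).to("cuda")

# Start basic, then add ONE detail per pass.
prompts = [
    "a woman walking on a path, trees behind her",
    "a woman walking on a path, dark trees behind her, night",
    "a woman walking on a path, dark trees behind her, night, violet eyes",
]

for i, prompt in enumerate(prompts):
    # Re-using the same seed keeps the composition similar between steps,
    # so you can see exactly what each added detail changed.
    generator = torch.Generator("cuda").manual_seed(42)
    pipe(prompt, generator=generator).images[0].save(f"step_{i}.png")
```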

Example image progression. There were dozens of tiny changes between each of these; this is to illustrate how you can achieve considerably more complex results while maintaining the original idea by following these tips.
sample2.png 1701146717507_0.png 1701150128405_0.png final.png

2.- Use img2img

Quick, dirty edits are enough to nudge the AI in the right direction when paired with the right prompts.
Following the first rule, I started with a woman walking with some trees behind her. I wanted an eerie scene, so simply darkening the trees and telling the AI that there's a spooky swamp behind her was enough to get the desired result. I kept adding tiny edits (violet eyes, flowers, water) and feeding them back to the AI to achieve what I had originally envisioned.
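For reference, here's the same loop in script form with diffusers' img2img pipeline. A minimal sketch: file names and the model ID are placeholders, and strength is the knob that decides how much of your dirty edit survives.

```python
import torch
from diffusers import StableDiffusionImg2ImgPipeline
from PIL import Image

pipe = StableDiffusionImg2ImgPipeline.from_pretrained(
    "runwayml/stable-diffusion-v1-5", torch_dtype=torch.float16
).to("cuda")

# Your quick, dirty edit - e.g. the picture with hand-darkened trees.
init_image = Image.open("rough_edit.png").convert("RGB").resize((512, 512))

result = pipe(
    prompt="a woman walking, a spooky dark swamp behind her, violet eyes",
    image=init_image,
    strength=0.45,   # low: keeps your edit; high: repaints more freely
    guidance_scale=7.5,
).images[0]
result.save("refined.png")
```

Feed the output back in as the next init image, make another small edit, and repeat.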

1.png 2.png 3.png 5.png


3.- Personality/character matters

Describing a character's personality will change their facial expression and/or body posture.
As usual, add traits one at a time. Adding multiple traits that affect facial expression can result in malformed or blurred eyes and mouths.
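Scripted, the experiment is just swapping one attitude word while keeping the seed fixed (a sketch with a placeholder model and prompts):

```python
import torch
from diffusers import StableDiffusionPipeline

pipe = StableDiffusionPipeline.from_pretrained(
    "runwayml/stable-diffusion-v1-5", torch_dtype=torch.float16
).to("cuda")

# One personality trait at a time; everything else stays identical.
for mood in ["shy", "confident", "menacing"]:
    generator = torch.Generator("cuda").manual_seed(7)  # same seed each time
    image = pipe(f"portrait of a {mood} woman, forest background",
                 generator=generator).images[0]
    image.save(f"portrait_{mood}.png")
```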

These three pictures simply have different descriptions for the character's attitude:

1698223127826_0.png 1698223252088_0.png 1698223307858_0.png

4.- Use those negative prompts

The AI is good at understanding what you want, but you also need to tell it what you don't want.

This can help remove commonly undesirable things (tan lines, transparent clothes...), but it can also be a lifesaver when trying to design petite characters, or women without cartoonishly large breasts.

As an example, when designing The Huntress, I had to add 'kid' and 'child' as negative prompts, as well as adding some silly positive prompts such as 'smallish perky adult breasts', 'beautiful adult face:1.3', and 'a petite adult:1.5'.

This is a problem I've found that isn't really an issue with the AI itself, but with the images it was trained on: most adult content features ridiculously oversized breasts and genitalia unless it's loli, so when you ask the AI for small breasts or petite women, it falls back on what it knows and starts producing that kind of content, which isn't my goal.
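In script form, a negative prompt is just a second prompt parameter. A sketch; the strings here are similar in spirit to the Huntress ones above, not the exact ones:

```python
import torch
from diffusers import StableDiffusionPipeline

pipe = StableDiffusionPipeline.from_pretrained(
    "runwayml/stable-diffusion-v1-5", torch_dtype=torch.float16
).to("cuda")

image = pipe(
    prompt="a petite adult woman, beautiful adult face, fantasy huntress outfit",
    # Tell it what you DON'T want, too:
    negative_prompt="child, kid, loli, oversized breasts, tan lines, "
                    "transparent clothes, deformed hands",
).images[0]
image.save("huntress_test.png")
```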

36.png

5.- Collages

The AI will often produce results that contain some detail you like alongside others that aren't as good, or that are straight-up errors.
If you're producing very similar-looking pictures (which you should be if you're aiming for a specific idea instead of just using random pictures for your game), you can very easily cut and paste the parts that you want.

Got a face that you like but you preferred the clothes from an earlier picture? Just mix them.
You can combine this with tip 2 and feed the result back to the AI to get more results with all of the parts that you liked. Same as with the quick edits, the cuts don't need to be perfect; just enough for the AI to get the idea of what you're aiming for.
This was a game changer when I started doing it. My most recent pictures are often a composite image made from 10 to 20 different ones.
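The cut-and-paste itself can be a few lines of PIL. The coordinates here are made up; you'd eyeball them in any image editor:

```python
from PIL import Image

# Two near-identical generations: one has the clothes you like,
# the other has the better face.
base = Image.open("good_clothes.png")
donor = Image.open("good_face.png")

# Rough box around the face in the donor picture (made-up coordinates).
face = donor.crop((180, 40, 340, 210))
base.paste(face, (180, 40))
base.save("collage.png")

# The seams don't need to be clean: run collage.png back through
# img2img at low strength (tip 2) and the AI will blend them.
```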

1700811623540_2.png 1700811623540_2X.png

6.- Weight matters

Emphasizing specific prompts can make all the difference in the world.
Giving tons of weight to a prompt is likely to make it bleed into the rest of the elements, which is a problem. However, it's useful when you're aiming for a particularly noticeable effect.
You can always use a combination of this and #5 to advance towards the desired result.
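For reference, A1111-style UIs (the kind this thread assumes) express weight with the (term:number) syntax, the same one used in the 'beautiful adult face:1.3' prompts above. A quick illustration with made-up prompts:

```python
# A1111 emphasis syntax:
#   (term)       -> attention x1.1
#   ((term))     -> attention x1.21
#   (term:1.5)   -> explicit weight
#   [term]       -> de-emphasis (roughly /1.1)

ok = "a woman in a swamp, (glowing violet eyes:1.4), flowers"

# Push the weight too far and it bleeds: expect violet skin,
# violet water, violet everything.
too_heavy = "a woman in a swamp, (glowing violet eyes:2.0), flowers"
```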

1701146717507_0.png 1701146876494_2.png 1701146717507_0X.png


7.- Make sure to post-edit the final picture

There will always be details that need to be fixed, and this is one of the main problems with AI art games: the developer doesn't bother with any editing whatsoever; they just add the first image that the AI gives them and call it a day.

Editing can take some time, but it makes a big difference.
Even if the editing is noticeable when you zoom in, it's not nearly as obvious as if the image were left unedited.
These edits should mostly be done with the pencil tool, painting pixel by pixel. Airbrush or nudge tools will create very ugly results.

28.png 28.png
2.png 2.png

8.- If you think AI art is an effort-free way to get pictures, you're wrong.

AI art generators are a tool, just like an image editor or a drawing application (both of which you'll also be using). What they produce depends on the effort you put in and your skill at manipulating the prompts and variables.
Achieving what you want takes lots of patience, editing, and trial and error.

A single picture can easily take over a thousand generations, of which I'll probably save and edit one in ten before getting one that I'm happy with.

The goal of AI image generation is to get the result that you want, not to get a random picture with 3-legged mutants and hope that nobody notices it.
If you're aiming for effort-free results, then you might as well simply use stolen porn and call it a day.

aitests.png
 

Satori6

Game Developer
I'll give a brief example of all of the points mentioned above put into practice.
We'll review how to go from this (left) to this (right):

1.png 1707211272598_1XXXXXXXXXXXX.png

I wanted an enslaved elf woman walking down a dirt road with her hands tied behind her back, being pulled by a chain. She's only wearing leather pants. It's night time, and you can see some mountains and trees in the distance.

As we've discussed, it's important to start simple. When you're starting, ask yourself what the most important element of the picture is, and focus on that.
In my case it's always the character, so that's what I start with.

Depending on the complexity of the picture, I may use some random picture found online, make a rough sketch, or a bit of both.

Keeping in mind the idea of starting simple, it's important to focus on the core elements: you don't need to start with an elf - pointy ears are easily added. She doesn't even need to be naked, or wearing the clothes that you want. The key aspect here is that her hands are tied behind her back.

I did a quick search for pictures and found some random TikTok model posing with her hands behind her back.
After a few attempts, I got this:

1.png

Once you get a base picture to work with, you can start adding the details.
Same as before - consider what's important. Out of all of the elements, not wearing a bra is probably the main one.

As mentioned, a quick edit is enough - don't waste time on careful editing until you're working on the finer details. Add the ugliest breasts you can draw - the AI will understand them when you pair them with the correct prompt.

breasts.png

A few generations later I had some AI-generated breasts, which were then added to the picture.
I also started adding details to nudge the AI towards the elements we'd be adding later.

As the chain will be the most noticeable of them, that's the one I made stand out. The background doesn't need any detail: you can add a dark blur and tell the AI that it's the sky, a forest, a mountain, or the ocean, and it will interpret it as such.

chain.png

I also included the elven ears prompt around this point. As mentioned, no editing was needed for those - I only edit details when I want them in a particular place, shape, or position (like that chain). There's no need to edit in pointy ears - you just tell the AI to include them.

elf.png

By this point I had included a dirt road, but not the mountains, sky, or trees.
This is mostly because when doing collages, it's easier to work with fewer elements. I wanted to make some modifications first before adding a background.

I had been using the same face for all edits, but I ended up liking a different one better.
The chain on the hands was removed, and I got rid of those details on the pants, while adding some coloring for a ripped-pants effect.

face.png

Looking good. Let's add a mountain to the left.

mountain.png

Some clouds too. Let's give it that desolate Middle-earth vibe and extend those mountains.

clouds.png

I got rid of the ripped-pants coloring because I decided against using it after all.
Here I found a nose and mouth pair that I liked better, so I switched them right away, because I'm rather disorganized and it's a pain to go back through dozens of pictures to find that mouth I wanted.

Some details may be more noticeable than others, but you can see blurry parts in most pictures where I'm adding or removing details. If you leave a couple of wrinkles on the clothes unchecked, the AI may end up adding extra clothing layers or accessories after many generations.

While you don't need to spend more than a few seconds on those edits, it's important to keep those details as you want them.

mouth.png

As some of you may have noticed, she lost half of her left arm a while ago. I completely missed this detail.
It happens, and it's one of the most annoying parts of AI generation.

While the white/silver pants fit the elven vibe, I thought they looked out of place in such a dirty environment, and given her captive status.
I tried adding some dirt to them, but didn't like the results, so in the end I decided to go with brown leather instead.

pants.png

I did many tests with light brown pants, but didn't like them that much. I also toyed around with different, more fantasy-like faces, but decided against them.

That's a normal part of the process: going through dozens of images and changes only to decide that you don't like them that much and going back. That's why I like to save and edit between one in ten and one in twenty of the pictures - so that I always have a checkpoint to fall back on if I don't like the path I took.

In the end I went for dark brown pants, different breasts, and a bigger mountain with more trees.

changes.png

This was looking very close to the desired result.
When I get to that point, I usually start mass-generating almost identical pictures so that I can take the parts I like the most from each.

I'll also do some generations where I give the AI a bit more freedom to see what it comes up with. Most are trash, but sometimes it adds things that I like and end up incorporating into the final picture. The armband and necklace, for example.
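Scripted, this phase is a seed loop at low img2img strength, plus a few higher-strength runs for the "freedom" generations. A sketch: model ID, file names, and strength values are placeholders.

```python
import torch
from diffusers import StableDiffusionImg2ImgPipeline
from PIL import Image

pipe = StableDiffusionImg2ImgPipeline.from_pretrained(
    "runwayml/stable-diffusion-v1-5", torch_dtype=torch.float16
).to("cuda")

base = Image.open("almost_done.png").convert("RGB")
prompt = "an elf woman, hands tied behind her back, dirt road at night"

# Mass-generate near-identical variants: low strength, many seeds.
for seed in range(20):
    g = torch.Generator("cuda").manual_seed(seed)
    pipe(prompt, image=base, strength=0.3,
         generator=g).images[0].save(f"variant_{seed}.png")

# A few runs with more freedom: most will be trash, but one may add
# a detail worth keeping (an armband, a necklace...).
for seed in range(5):
    g = torch.Generator("cuda").manual_seed(100 + seed)
    pipe(prompt, image=base, strength=0.65,
         generator=g).images[0].save(f"free_{seed}.png")
```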

arm.png

If you look closely, there are around 15 different pictures mixed in there. This is a frankenpicture: eyes, hair, mouth, ears, breasts, pants, collar, trees, road, mountains... each element comes from a different picture, but once they're mixed together, most people won't be able to tell unless you ask them to look closely.

It was here that I also noticed and restored that missing arm.

After a few more tweaks, I was happy with the result and moved to the final post-editing phase.

1707211272598_1XXXXXXXXXXXX.png

This was an easy one, in the sense that I was able to get the desired results quickly due to the idea not being too complex.

It took around 300 generations. How long that takes depends on your hardware, so mileage may vary; for me, it was around 5 hours of rendering.
Most early edits are simple and quick. Later ones get more detailed and time-consuming. I probably spent 30 minutes on minor edits, and another 30 on the final one.

So about 6 hours for a pic. Sometimes I get simple ones done in a couple of hours; other times I spend up to 30 hours working on one.

It requires some work, but not as much as it'd take to draw it from scratch when you don't know how to draw, and the electricity bill and eventual graphics card replacement cost much less than what an artist would charge for custom art (I got quotes of $150 per picture, and that was on the cheap end).
 
anonintheshell
Hi, Satori6
Thank you for this thread. Not a lot of AI threads here, sadly.

Could you please share the software and the names of the models that can create decent results?
I'm trying to play around with AI text generation, and I'd like to try creating some illustrations using the context of the output.
 

Satori6

Game Developer
The most commonly used software is A1111, but there are dozens of Stable Diffusion UIs. I started with that one, then the installation broke and I moved to Makeayo, mostly because it seemed to be the easiest to set up for an AMD card. There are way more tutorials for Nvidia cards.

As for models, you can try different ones on Civitai to see what you like. Originally I wanted to use Colossus, but my PC can't handle it.
 

matthewB

Newbie
Hey, anonintheshell (y)

If you don't own an Nvidia RTX 4090 (or any other Nvidia card with >8 GB of VRAM), I recommend giving a try, instead of

* with just 8 GB (or even 6 GB) of VRAM, it's much faster (30-75%, depending on your GPU)
* it uses less VRAM and (so far, for me) doesn't crash with out-of-memory errors
(700 MB to 1.4 GB lower peak GPU memory, depending on your GPU)
* it has a lot of stuff already built in, like:
Unet Patcher, Self-Attention Guidance, Kohya High Res Fix, FreeU, StyleAlign, Hypertile. Plus a lot of samplers and upscalers.
* switching to another model doesn't require a restart of the tool, since switching the model in the UI DOES work
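If you script with diffusers instead of a UI, the same low-VRAM idea can be approximated with a couple of calls (a sketch of the general technique, not these tools' actual internals):

```python
import torch
from diffusers import StableDiffusionPipeline

pipe = StableDiffusionPipeline.from_pretrained(
    "runwayml/stable-diffusion-v1-5",
    torch_dtype=torch.float16,       # halves memory vs float32
)

# Trade a little speed for a lot of VRAM headroom:
pipe.enable_attention_slicing()      # compute attention in chunks
pipe.enable_model_cpu_offload()      # park idle submodules in system RAM
                                     # (needs accelerate; don't also .to("cuda"))

image = pipe("a test render").images[0]
image.save("test.png")
```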
 

osanaiko

Engaged Member
Modder
Any advice on getting the same or at least fairly consistent characters?
Seriously though:

There are a few different techniques; we'll go in order of complexity:

- use well known celebrity or character names in prompts. Also try "female version of X" with male sources. Because many of the models had a lot of such images in the *cough* totally not copyright breaching data sourcing *cough*, they can often reproduce such faces quite accurately.

- for anime/cartoon/illustration style output, try to find a consistent set of prompt phrases that gives the look you want, then include it in all your remaining prompts. Note that the adjectives used might not be directly descriptive, but somehow trigger a certain pathway(?) in the diffusion. Something like "blonde short hair, round face, sardonic, haunted". Seems to work well with some LORAs. (I use this technique)

- plugins like ReActor are designed to "redirect" the face of the output figure to look like a specific input picture. I haven't used this much myself as I don't do realistic output much.

- LORAs are extra weight sets applied alongside models at generation time, additionally trained to output a certain subset of images based on a themed set of 100-200 images that the creator gathers. A LORA strongly directs the diffusion model's output to follow that style when its activation string is included in the prompt. They are generally sized between 100 MB and 2 GB. You can find LORAs that have already been trained and shared on Civitai; there's a short sketch of loading one after this list. (I use LORAs for my work)

- Finetuned models are base models that have had additional training to focus on a more specific subset of training data. They don't give you much more effectiveness than LORAs do (as far as I understand) and take longer to train. And they are as big as the full model from which they were created. I haven't used these much.
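As promised in the LORA point above, here's roughly what loading one looks like in diffusers (a sketch; the file name and activation string are hypothetical):

```python
import torch
from diffusers import StableDiffusionPipeline

pipe = StableDiffusionPipeline.from_pretrained(
    "runwayml/stable-diffusion-v1-5", torch_dtype=torch.float16
).to("cuda")

# Hypothetical LORA file downloaded from Civitai.
pipe.load_lora_weights("./loras/my_character.safetensors")

# Most LORAs only kick in when their activation string is in the prompt;
# the scale controls how strongly the extra weights are applied.
image = pipe(
    "mychar, blonde short hair, round face, sardonic, haunted",
    cross_attention_kwargs={"scale": 0.8},
).images[0]
image.save("consistent_character.png")
```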


Now, one thing to be aware of: while you might be able to get reasonably consistent output, it is absolutely NOT going to be "perfectly consistent" if you look closely.
Hair details change. Eye makeup changes. Skin blemishes change. Clothing details change. It is frustrating!

And that's not even considering the true Achilles heel of diffusion image generation: it's not human, there is no logic, and there's no understanding of physical reality or anatomy. You will get deformed hands and bodies. You will get impossible objects.

It's like this because image diffusion works by a complex mathematical process: the code takes "this big bunch of semi-random pixels surrounding the pixels I am concentrating on at the moment" and "this list of words", and via a massive pre-calculated data array (the model) statistically guesses how to change the target pixels. Then it repeats that over and over, across the whole picture, 10-40 times. You end up with *locally* reasonable pixels, but not necessarily *global* consistency.
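Stripped to its bones, that process looks something like this in diffusers (a heavily simplified sketch: real pipelines add classifier-free guidance, safety checks, and proper image post-processing; the model ID and prompt are placeholders):

```python
import torch
from diffusers import StableDiffusionPipeline

pipe = StableDiffusionPipeline.from_pretrained(
    "runwayml/stable-diffusion-v1-5", torch_dtype=torch.float16
).to("cuda")

# "this list of words" -> text embeddings
tokens = pipe.tokenizer("a woman in a swamp", padding="max_length",
                        max_length=pipe.tokenizer.model_max_length,
                        return_tensors="pt")
text_embeddings = pipe.text_encoder(tokens.input_ids.to("cuda"))[0]

# "this big bunch of semi-random pixels" (in latent space)
latents = torch.randn(1, 4, 64, 64, device="cuda", dtype=torch.float16)
pipe.scheduler.set_timesteps(30)            # the "10-40 times"
latents = latents * pipe.scheduler.init_noise_sigma

for t in pipe.scheduler.timesteps:
    with torch.no_grad():
        # The UNet statistically guesses the noise in every local patch...
        noise_pred = pipe.unet(
            pipe.scheduler.scale_model_input(latents, t), t,
            encoder_hidden_states=text_embeddings,
        ).sample
    # ...and the scheduler nudges the latents toward that guess.
    latents = pipe.scheduler.step(noise_pred, t, latents).prev_sample

# Decode latents back to pixels. Locally plausible everywhere;
# globally consistent nowhere in particular.
image = pipe.vae.decode(latents / pipe.vae.config.scaling_factor).sample
```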

The ways past this are:

- generate millions of images until you hit the jackpot with the several dozen outputs you need that are both reasonably close to your character and have no noticeable flaws.

- generate 10,000s of images until you get ones that are close, then use inpaint or inpaint sketch to try to fix the problem areas

- generate 100s of images. Pick the parts you want from several. Edit them together in Photoshop-like software. Then do img2img again to smooth the edit edges. Tada!*1

- learn to be an artist, maybe with AI super powers to speed up your workflow.


*1: this works for me, to some extent, for illustration-style images. Still takes fucking forever; just less forever than the other techniques.
 