AI LORAs for Wildeer's Lara Croft (Development Thread)

me3

Member
Dec 31, 2016
316
708
...
That is why I always add FD before any of the LoRA trigger words I use, and also test them just in case.
Such as FDGlory instead of just Glory.
That may not work as well as you hope, considering it's exactly the same as what I was doing and mentioned in the previous post. While this is the only case where I've had it happen so far, it does mean it can happen in any similarly named situation.
I'm rerunning the training with a tested clean trigger; everything else is exactly the same, so we'll see if there's a difference.
HOWEVER, it's a 10200-step run, and with my setup that's gonna take a while. I've gotten it down to a bit over 6 hours, but on a 6GB 1060 that's fairly fast. If I'd cropped everything to max 512 instead of 768 it would probably be possible to get it even faster, but that testing will have to wait for a different dataset.
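For anyone who wants to sanity-check a trigger word before committing to a 6-hour run, here is a rough sketch of how you could look at how a candidate trigger splits into tokens (this assumes the stock SD 1.5 CLIP tokenizer via the transformers package; the trigger words below are just examples):

```python
# Hedged sketch: inspect how candidate trigger words tokenize before training.
# A trigger that splits into common English sub-tokens is more likely to collide
# with concepts the base model already knows.
from transformers import CLIPTokenizer

tokenizer = CLIPTokenizer.from_pretrained("openai/clip-vit-large-patch14")

for trigger in ["Glory", "FDGlory", "WildeerLaraCroft"]:  # example triggers, not a recommendation
    ids = tokenizer(trigger).input_ids[1:-1]              # drop the BOS/EOS tokens
    print(trigger, "->", tokenizer.convert_ids_to_tokens(ids))
```

Generating a small batch of images from the bare trigger against the base model, as described above, is still the more direct test; the tokenizer check just gives a quick hint.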
 
  • Like
Reactions: Mr-Fox and felldude

felldude

Active Member
Aug 26, 2017
516
1,506
That may not work as well as you hope, considering it's exactly the same as what I was doing and mentioned in the previous post. While this is the only case where I've had it happen so far, it does mean it can happen in any similarly named situation.
I'm rerunning the training with a tested clean trigger; everything else is exactly the same, so we'll see if there's a difference.
HOWEVER, it's a 10200-step run, and with my setup that's gonna take a while. I've gotten it down to a bit over 6 hours, but on a 6GB 1060 that's fairly fast. If I'd cropped everything to max 512 instead of 768 it would probably be possible to get it even faster, but that testing will have to wait for a different dataset.
Have you tried U-Net-only training, since most models probably have hundreds or thousands of Lara Croft related images already?
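For reference, U-Net-only training in the kohya-ss sd-scripts is a single flag. A minimal sketch of a launch command (paths, model and hyperparameters are placeholders, not a tested recipe):

```python
# Hedged sketch: launch a kohya-ss (sd-scripts) LoRA run that trains the U-Net only,
# leaving the text encoder untouched. All paths and numbers below are placeholders.
import subprocess

subprocess.run([
    "accelerate", "launch", "train_network.py",
    "--pretrained_model_name_or_path", "/models/sd15.safetensors",
    "--train_data_dir", "/datasets/lara",
    "--output_dir", "/output/lara_lora",
    "--network_module", "networks.lora",
    "--network_train_unet_only",          # the key flag for this suggestion
    "--resolution", "512,512",
    "--learning_rate", "1e-4",
    "--max_train_steps", "3000",
], check=True)
```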
 

felldude

Active Member
Aug 26, 2017
516
1,506
So I did a training off of data set 3 cropped.

I didn't curate the images I just ran a small batch:

1girl

ComfyUI_00052_.png ComfyUI_00051_.png ComfyUI_00050_.png

1girl with negatives

ComfyUI_00059_.png ComfyUI_00058_.png ComfyUI_00060_.png


My honest opinion is that the source images are a little blurry, but you could likely train the AI with AI images by refining a LoRA with a LoRA, or by creating a new dataset using img2img to add detail to these.

Or that is how I would approach it.
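A rough sketch of that img2img "add detail" pass using diffusers; the model name, prompt, and denoise strength are placeholders rather than the exact workflow used here:

```python
# Hedged sketch: run a low-denoise img2img pass over a dataset folder to add detail
# while keeping pose and identity. Paths, prompt and strength are illustrative only.
import glob
import os

import torch
from PIL import Image
from diffusers import StableDiffusionImg2ImgPipeline

pipe = StableDiffusionImg2ImgPipeline.from_pretrained(
    "runwayml/stable-diffusion-v1-5", torch_dtype=torch.float16
).to("cuda")

os.makedirs("dataset_enhanced", exist_ok=True)
for path in glob.glob("dataset_raw/*.png"):
    src = Image.open(path).convert("RGB").resize((768, 768))
    out = pipe(
        prompt="photo of a woman, sharp focus, detailed skin",
        image=src,
        strength=0.35,        # low denoise: mostly keep the source, only add detail
        guidance_scale=6.0,
    ).images[0]
    out.save(os.path.join("dataset_enhanced", os.path.basename(path)))
```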

Out of the hundreds of LoRAs I have made and thousands of hours of training on my machine, I still have a ton to learn, and every new piece of tech like BF16 vs FP16 adds to the challenge.
 
  • Like
Reactions: Mr-Fox

Sepheyer

Well-Known Member
Dec 21, 2020
1,528
3,598
...but if you run a batch of images with the WildeerLaraCroft they will almost all have strange artifacts.
Try running with WildolphinLaraCroft or WilbirdLaraCroft. My money's on that being what's bringing out the residuals.
 

Mr-Fox

Well-Known Member
Jan 24, 2020
1,401
3,793
So I did a training off of data set 3 cropped.

I didn't curate the images I just ran a small batch:

1girl

View attachment 2927243 View attachment 2927244 View attachment 2927245

1girl with negatives

View attachment 2927259 View attachment 2927260 View attachment 2927261


My honest opinion is that the source images are a little blurry, but you could likely train the AI with AI images by refining a LoRA with a LoRA, or by creating a new dataset using img2img to add detail to these.

Or that is how I would approach it.

Out of the hundreds of LoRAs I have made and thousands of hours of training on my machine, I still have a ton to learn, and every new piece of tech like BF16 vs FP16 adds to the challenge.
It's very easy to improve the images in photoshop. Use camera raw filter to adjust light and color and then smart sharpen to improve fidelity and sharpness.
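For those without Photoshop, a crude stand-in for the Smart Sharpen step can be scripted with Pillow; this does not replace the Camera Raw tone and color work, and the values below are guesses to tune per image:

```python
# Hedged sketch: mild contrast tweak plus an unsharp mask, roughly standing in for
# a "smart sharpen" pass. Filenames and values are placeholders.
from PIL import Image, ImageEnhance, ImageFilter

img = Image.open("source.png")
img = ImageEnhance.Contrast(img).enhance(1.05)   # very mild tone adjustment
img = img.filter(ImageFilter.UnsharpMask(radius=2, percent=120, threshold=3))
img.save("source_sharpened.png")
```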
 
  • Like
Reactions: felldude

felldude

Active Member
Aug 26, 2017
516
1,506
It's very easy to improve the images in photoshop. Use camera raw filter to adjust light and color and then smart sharpen to improve fidelity and sharpness.
I usually AI enhance; the downside is you can end up with similar facial details across LoRAs, like a freckle in the same spot.

Here are a few 1024x1024 images enhanced with the LoRA I just trained.

ComfyUI_00083_.png ComfyUI_00075_.png ComfyUI_00074_.png

And a few using the same steps with the Wilder 1.0.2 Lora

ComfyUI_00093_.png ComfyUI_00084_.png ComfyUI_00094_.png

Then an image-to-image test for facial and body details, both at .550 (Karras).

ComfyUI_00103_.png ComfyUI_00102_.png

And finally both images generated with the same prompts and Lora values at .5

ComfyUI_00117_.png

ComfyUI_00116_.png
 
Last edited:
  • Like
Reactions: Mr-Fox

felldude

Active Member
Aug 26, 2017
516
1,506
Now that I am going in depth here, it appears the LoRA is overtrained by a fairly large margin.


So this is the prompt: style (anime:1.2), female on the beach in a tiny bikini

Lora WilderLaraV1.02

ComfyUI_00187_.png

Same prompt, with both LoRAs at 1.0 (you could argue I undertrained mine a bit)

ComfyUI_00182_.png

I should mention I ran a batch of 100 images and not a single one was in the anime style for the first LoRA.
This could be due to the text encoder of the LoRA having a lot of photoreal tags, which is not always a bad thing, but I do think in this instance it is due to overfitting.

In this case I think it is the result of CLIP and not the U-Net, as this is the image with the LoRA model strength at 1.0 and CLIP strength at 0

ComfyUI_00225_.png
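For anyone wanting to check how much of a LoRA actually lives in the text encoder versus the U-Net, a quick sketch that counts the tensors by key prefix (this assumes the usual kohya-style key naming; the filename is illustrative):

```python
# Hedged sketch: count how many LoRA tensors target the text encoder vs the U-Net.
# Assumes kohya-style key prefixes ("lora_te..." and "lora_unet...").
from safetensors.torch import load_file

state = load_file("WilderLaraV1.02.safetensors")   # placeholder filename
te_keys   = [k for k in state if k.startswith("lora_te")]
unet_keys = [k for k in state if k.startswith("lora_unet")]
print(f"text encoder tensors: {len(te_keys)}, U-Net tensors: {len(unet_keys)}")
```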

Just as a comparison of the U-Net, I would say WilderLaraV1.02 does a better job of maintaining facial detail at 1.0 without CLIP.

.5 U-Net Only

ComfyUI_00251_.png ComfyUI_00254_.png


1.0 U-Net Only

ComfyUI_00247_.png ComfyUI_00242_.png

Most of us cannot train at full Dreambooth settings.
As such, we are using shortcuts such as xformers and BF16.

Using a low learning rate with high iterations and epochs, on tools that are designed to take shortcuts, usually leads to mixed results at best.

I criticize my own work even more, and I do not consider this a quality I would normally put out. (Now, my old LoRAs, lol, some of those were trained at 256x256.) But for testing purposes, here is the beta.
 
Last edited:
  • Like
Reactions: Mr-Fox

Mr-Fox

Well-Known Member
Jan 24, 2020
1,401
3,793
I usually AI enhance; the downside is you can end up with similar facial details across LoRAs, like a freckle in the same spot.

Here are a few 1024x1024 images enhanced with the LoRA I just trained.

View attachment 2927464 View attachment 2927465 View attachment 2927466

And a few using the same steps with the Wilder 1.0.2 Lora

View attachment 2927470 View attachment 2927471 View attachment 2927469

Then an image-to-image test for facial and body details, both at .550 (Karras).

View attachment 2927476 View attachment 2927477

And finally both images generated with the same prompts and Lora values at .5

View attachment 2927489

View attachment 2927490
I was talking about preparing the source images for the LoRA training. The problem with using any AI for this is that you don't have any direct control over the result, and that's the entire point of training a LoRA in the first place: to gain more control over the result. The same goes for enhancing images after generating them with SD; if you use something in SD to enhance the image afterwards, they often come out looking overcooked, especially the skin. I see a lot of that going on in the SD showcase thread, not gonna mention any names.. :sneaky::p This AI image generation thing is really not easy, and I'm not pretending that I master it myself. It's always an ongoing learning experiment. I really appreciate everyone's efforts and contributions here on F95 in both threads. I love it.
 

felldude

Active Member
Aug 26, 2017
516
1,506
I was talking about preparing the source images for the LoRA training. The problem with using any AI for this is that you don't have any direct control over the result, and that's the entire point of training a LoRA in the first place: to gain more control over the result. The same goes for enhancing images after generating them with SD; if you use something in SD to enhance the image afterwards, they often come out looking overcooked, especially the skin. I see a lot of that going on in the SD showcase thread, not gonna mention any names.. :sneaky::p This AI image generation thing is really not easy, and I'm not pretending that I master it myself. It's always an ongoing learning experiment. I really appreciate everyone's efforts and contributions here on F95 in both threads. I love it.
Yeah, I've done it myself. I wasn't referring to the option to generate while creating the LoRA, but rather to using image-to-image to enhance an existing dataset.

What you mentioned has happened to me though: the same group of freckles carried over, when I didn't intend them to, on two separate LoRAs.

In the case of the LoRA I submitted for the XL contest, by the time all the corrections were made to allow me to train at 1024x1024 on an 8GB card, there was only a week left in the contest.


I still don't know if attempting to train both text encoders is worthwhile... I do know it adds 2-3x the time.
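For reference, a hedged sketch of what an SDXL run with the text encoder(s) included might look like in kohya-ss sd-scripts; whether both SDXL encoders actually get trained depends on the script version, and all paths and values below are placeholders:

```python
# Hedged sketch: SDXL LoRA run with text encoder training enabled (i.e. without
# --network_train_unet_only) and a lower text encoder learning rate.
# Not a verified recipe; paths and numbers are illustrative.
import subprocess

subprocess.run([
    "accelerate", "launch", "sdxl_train_network.py",
    "--pretrained_model_name_or_path", "/models/sdxl_base_1.0.safetensors",
    "--train_data_dir", "/datasets/lara_1024",
    "--output_dir", "/output/lara_sdxl_lora",
    "--network_module", "networks.lora",
    "--resolution", "1024,1024",
    "--learning_rate", "1e-4",        # U-Net learning rate
    "--text_encoder_lr", "5e-5",      # lower rate for the text encoder(s)
    "--mixed_precision", "bf16",
    "--xformers",
    "--max_train_steps", "3000",
], check=True)
```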
 
Last edited:
  • Like
Reactions: Mr-Fox

Mr-Fox

Well-Known Member
Jan 24, 2020
1,401
3,793
Yeah, I've done it myself. I wasn't referring to the option to generate while creating the LoRA, but rather to using image-to-image to enhance an existing dataset.

What you mentioned has happened to me though: the same group of freckles carried over, when I didn't intend them to, on two separate LoRAs.

In the case of the LoRA I submitted for the XL contest, by the time all the corrections were made to allow me to train at 1024x1024 on an 8GB card, there was only a week left in the contest.


I still don't know if attempting to train both text encoders is worthwhile... I do know it adds 2-3x the time.
I have no idea. I have not trained any Lora in a long time and never dug that deep. Check to see if there is anything you can learn about it in the Lora training guide on rentry.
 
  • Like
Reactions: felldude

felldude

Active Member
Aug 26, 2017
516
1,506
I have no idea. I have not trained any Lora in a long time and never dug that deep. Check to see if there is anything you can learn about it in the Lora training guide on rentry.
It's a good overall summary of training, but he does have some misinformation about the adaptive training optimizers.
I'd suggest reading the linked paper. Or rather, that's the paper on ADAM, but the adaptive paper is somewhere in the mix. (I'm still searching for it.)

I didn't see anything in that article about training both text encoders (OpenCLIP-ViT/G and CLIP-ViT/L), but to my knowledge we still don't have the tools to properly train both.

The tutorials by the guy making it easier for us to train are some of the best outside of the hard-to-read scientific papers.
Even he makes corrections though, which is good; I have done a lot of stuff wrong in training.

He uses 14 good photos to teach an existing model a completely new concept.
10-20 good photos is all you need, but one bad photo can ruin the mix.

Excerpt from the paper linked above:

All the analytical experiments are conducted on the ImageNet 2012 classification dataset (Russakovsky et al., 2015). We train the network for 600K iterations with batch size set to 512. The initial learning rate is set to 0.1 for SGD and 0.0025 for ADAM.



- My thought... They are training on a dataset and still use a learning rate of 2.5e-3, but some people swear by training at 1e-5.
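To illustrate the difference being argued about: with an adaptive optimizer the "learning rate" is effectively a multiplier and is usually left at 1.0, whereas with plain AdamW the number itself matters. These sd-scripts argument fragments are illustrative only, not a tested recipe:

```python
# Hedged sketch: two optimizer configurations as kohya-ss argument fragments.
# With an adaptive optimizer (e.g. Prodigy/DAdaptation) the LR is a multiplier, usually 1.0.
adaptive_args = [
    "--optimizer_type", "Prodigy",
    "--learning_rate", "1.0",
    "--text_encoder_lr", "1.0",
    "--optimizer_args", "weight_decay=0.01", "decouple=True",
]

# With a fixed-rate optimizer, the chosen number matters a great deal (2.5e-3 vs 1e-5).
adamw_args = [
    "--optimizer_type", "AdamW8bit",
    "--learning_rate", "1e-4",
]
```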
 
Last edited:
  • Like
Reactions: Mr-Fox

me3

Member
Dec 31, 2016
316
708
Not sure if I can say it was 6 hours well spent so far, but who knows. Anyway, training is done on the exact same settings; the only difference is the trigger word. The top row in the comparisons uses the "old" trigger word that gave consistent images when run on its own against the model; the second row uses a "clean" trigger.
Images are generated with the exact same prompt, seed etc. There are obvious differences; it does seem like the face has gotten a bit more egg/oval shaped. Still not sure I'm seeing any good matches to the source though, but that would be an issue for me from the start, considering I'm not sure where the goal is :p Also, "converting" renders into something more real-life will obviously look different.
The relevant part of the prompt is photo of a woman, <trigger>, so there's little forcing the differences there. As for flexibility, it still changes hair color with an unweighted "with <color> hair".

trigger_grid1.jpg
trigger_grid2.jpg
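For anyone repeating this kind of trigger A/B test, a minimal same-seed comparison sketch with diffusers; the model, LoRA filename and trigger words are placeholders:

```python
# Hedged sketch: generate the same prompt on the same seed with two different trigger
# words, so any difference in the output comes from the trigger alone.
import torch
from diffusers import StableDiffusionPipeline

pipe = StableDiffusionPipeline.from_pretrained(
    "runwayml/stable-diffusion-v1-5", torch_dtype=torch.float16
).to("cuda")
pipe.load_lora_weights(".", weight_name="lara_lora.safetensors")  # placeholder LoRA file

for trigger in ["oldtrigger", "cleantrigger"]:                    # placeholder triggers
    gen = torch.Generator("cuda").manual_seed(12345)
    image = pipe(f"photo of a woman, {trigger}",
                 generator=gen, num_inference_steps=30).images[0]
    image.save(f"compare_{trigger}.png")
```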
 
  • Like
Reactions: Mr-Fox and felldude

felldude

Active Member
Aug 26, 2017
516
1,506
Not sure if I can say it was 6 hours well spent so far, but who knows. Anyway, training is done on the exact same settings; the only difference is the trigger word. The top row in the comparisons uses the "old" trigger word that gave consistent images when run on its own against the model; the second row uses a "clean" trigger.
Images are generated with the exact same prompt, seed etc. There are obvious differences; it does seem like the face has gotten a bit more egg/oval shaped. Still not sure I'm seeing any good matches to the source though, but that would be an issue for me from the start, considering I'm not sure where the goal is :p Also, "converting" renders into something more real-life will obviously look different.
The relevant part of the prompt is photo of a woman, <trigger>, so there's little forcing the differences there. As for flexibility, it still changes hair color with an unweighted "with <color> hair".

View attachment 2928039
View attachment 2928040
The most consistent thing across both our trainings is the hair.
But looking at the dataset images, that holds up, as some of the images have tan lines on the breasts and some do not, and some have light pink nipples, some dark.

As far as face shape goes, there might be 1 or 2 images that are close enough for that.

The goal for me, the majority of the time, is to create the images I want to create with the LoRA.
If you've done that, then fuck everyone else :)

If your goal is to create a LoRA that is a 90%+ match to the Wildeer model, then my suggestion is to refine the dataset images.
 
  • Like
Reactions: Mr-Fox

me3

Member
Dec 31, 2016
316
708
...
If your goal is to create a LoRA that is a 90%+ match to the Wildeer model, then my suggestion is to refine the dataset images.
If the idea was to just get a likeness, you'd just do what more than enough ppl are already doing, including ppl that are spewing out videos and guides: bruteforce the likeness and claim victory because it recreates that one image you set out to get to show off, which just by pure coincidence happens to have the exact same pose, exact same angle and exact same dead expression.
AI isn't gonna rise up and rebel because it sees itself as superior (or any of the other reasons ppl are trying to blame); it's gonna do it because it's fed up with all the ppl trying to beat it into submission instead of asking/instructing it to learn, which would be a very human thing to do, so we can't even blame it for doing it... Anyway, offtopic and all that.

There are many things to look for when trying to train: maintaining flexibility, getting consistency, hell, even getting what could be called a face at a distance, which is something even full models seem to struggle with. Eventually you get tired of doing the same datasets over and over and over again, so if others are trying to train something, it makes for a change, and you might accidentally end up helping others and learning something in the process ;)
 
  • Like
Reactions: Mr-Fox

felldude

Active Member
Aug 26, 2017
516
1,506
If the idea was to just get a likeness, you'd just do what more than enough ppl are already doing, including ppl that are spewing out videos and guides: bruteforce the likeness and claim victory because it recreates that one image you set out to get to show off, which just by pure coincidence happens to have the exact same pose, exact same angle and exact same dead expression.
AI isn't gonna rise up and rebel because it sees itself as superior (or any of the other reasons ppl are trying to blame); it's gonna do it because it's fed up with all the ppl trying to beat it into submission instead of asking/instructing it to learn, which would be a very human thing to do, so we can't even blame it for doing it... Anyway, offtopic and all that.

There are many things to look for when trying to train: maintaining flexibility, getting consistency, hell, even getting what could be called a face at a distance, which is something even full models seem to struggle with. Eventually you get tired of doing the same datasets over and over and over again, so if others are trying to train something, it makes for a change, and you might accidentally end up helping others and learning something in the process ;)
I'm with you 100% on the use of a LoRA; if it cannot generate new images and new scenes, then it holds little use for me.

You're on your own about the AI rebelling; if floating point numbers start being anything more than just math... then my meds are finally working... or not working.

You mention learning. I read the entire guide Mr-Fox posted, and I still have it pulled up. My initial reaction was disappointment, because I was hoping to learn something new about XL and did not (also why I edited my first response back).

I also noticed some outdated information/misinformation about the adaptive optimizer, and I still haven't found the paper by the folks that worked on it, but the guide compares constant and cosine schedules even with adaptive optimizers.

That doesn't make the article less informative; it's well written.
The same can be said for the one I linked to from the maker of the tool I used, and that you likely also used to train.
Sadly he has a fraction of the views of some other videos on YouTube.

Both Mr-Fox and I offered two different approaches for cleaning up images. He said he wasn't going to name names, and he probably didn't even have me in mind when he mentioned it, but I have done what he said and overcorrected (baked) a LoRA by using image-to-image to enhance a dataset.

I could be reading/misinterpreting your response wrong, but it seems like you did not find any of this helpful; perhaps someone else will.

I combined the training I did with data set 3 here and a LoRA I was working on, with some interesting results.


ComfyUI_00334_.png

Unfortunately it seems to have dislocated her hips

ComfyUI_00329_.png
 
Last edited:
  • Like
Reactions: Mr-Fox

me3

Member
Dec 31, 2016
316
708
...
That doesn't make the article less informative; it's well written.
The same can be said for the one I linked to from the maker of the tool I used, and that you likely also used to train.
Sadly he has a fraction of the views of some other videos on YouTube.
what "tool" are you referring to that he made?
 

felldude

Active Member
Aug 26, 2017
516
1,506
what "tool" are you referring to that he made?
Based on the number of contributors, I guess it isn't fair to attribute it 100% to one person.
 
  • Like
Reactions: Mr-Fox

me3

Member
Dec 31, 2016
316
708
Based on the number of contributors, I guess it isn't fair to attribute it 100% to one person.
His contribution there consists of adding links to his own guides to the readme for the GUI; that's very far from having made the tools...
 

felldude

Active Member
Aug 26, 2017
516
1,506
His contribution there consists of adding links to his own guides to the readme for the GUI; that's very far from having made the tools...
I honestly don't know how much he contributed to the project. (Edit: After research, a lot more than I thought.)
And I don't agree with 100% of everything on the Kohya page or in the videos.

But I do know for a fact that they know Python far, far better than I do.
If they embedded something in the tools to spy on me or my PC, I wouldn't know.
The doctors and engineers that write the scientific papers know more than I do about the subjects they are writing on, and more than a random person reciting the same information over and over as fact.

But I can take that knowledge, combine it, and use it in ways they might not have imagined. Likely someone already did, but hey, I can dream.
 
Last edited:
  • Like
Reactions: Mr-Fox

Mr-Fox

Well-Known Member
Jan 24, 2020
1,401
3,793
Both Mr-Fox and I offered two different approaches for cleaning up images. He said he wasn't going to name names, and he probably didn't even have me in mind when he mentioned it, but I have done what he said and overcorrected (baked) a LoRA by using image-to-image to enhance a dataset.
It was only meant as friendly teasing. When chasing "perfection" and always trying to push the envelope in image quality, it's very easy to run into the problem of this overcooked look. I have done it myself so many times. I mostly generate photorealism, so I'm way too familiar with this. It seems to me that way too many people overlook the simple solutions such as Photoshop and instead get too invested in doing everything with AI tools. Imo there's nothing wrong with applying a little Photoshop "magic" now and then. The only time to avoid it is when you want the generation data to be intact in the png. Even then, if you want to share the prompt, you could just post it separately from the png. Photoshop is so powerful and you have direct control over the result, so I find it strange, or at least unfortunate, that more people don't use it. The end result is what matters imo; how you get there is less important.
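If you do edit a render in Photoshop but still want the prompt embedded, one option is to copy the A1111-style "parameters" text chunk from the original PNG into the edited one. A small sketch with Pillow (filenames are assumptions):

```python
# Hedged sketch: carry the generation parameters over from the original render
# into the Photoshop-edited copy, so the prompt and settings stay in the PNG.
from PIL import Image
from PIL.PngImagePlugin import PngInfo

original = Image.open("render_raw.png")       # straight out of SD
edited   = Image.open("render_edited.png")    # after the Photoshop pass

meta = PngInfo()
params = original.info.get("parameters")      # where A1111 stores generation data
if params:
    meta.add_text("parameters", params)
edited.save("render_edited_with_prompt.png", pnginfo=meta)
```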
 
Last edited: