Hey, thanks for the support. I don't think my video card has enough juice to train a LoRA... or I'm not doing it properly.
I've watched a few walkthrough videos, but I can't seem to figure it out.
It's not perfect as you don't have total control, but I've used it a few times. It costs 500 'Buzz' each time (even for failures), so either get busy liking some other people's images, or let me know your ID there and I'll tip you 1000 so you can have a play with it.
There are not that many LCM checkpoints available yet compared to "normal" SD1.5 and SDXL, though there are a few.
The point of LCM (Latent Consistency Model) is to be able to run fewer steps and a lower cfg scale to cut down on generation time while still getting high quality.
The rule of thumb is 6-12 steps and cfg scale 1-4. 10 steps and cfg 1-2 seems to be good with most models.
I ran a few checkpoint comparison tests with the plot script, and I borrowed the prompt of the great Devilkkw's delicious Cryptid Babe.
SD1.5 LCM 1024x1280 (notice the tendency for conjoined twins):
SD1.5 LCM 640x960 (notice the absence of conjoined twins):
There are also XL LCM models. As most know, you can use a higher resolution with XL models.
The rule of thumb is that the width and height should add up to 2048. You can try different ratios; one that I have found to work well for me is 896x1152.
(Thanks to the eminent Synalon for providing this list).
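If anyone wants to generate their own candidate resolutions, here's a minimal Python sketch of the "sides sum to 2048" rule of thumb. The 64-pixel step and the 2:1 ratio cap are my own assumptions for keeping the list sane, not anything official:

```python
def sdxl_resolutions(total=2048, step=64, max_ratio=2.0):
    """Candidate (width, height) pairs whose sides sum to `total`.

    Sides move in multiples of `step` (SD models like multiples of 64),
    and extreme aspect ratios beyond `max_ratio` are skipped.
    """
    pairs = []
    for w in range(step, total, step):
        h = total - w
        if h < step:
            continue
        if max(w, h) / min(w, h) > max_ratio:
            continue
        pairs.append((w, h))
    return pairs

# Both 896x1152 and 1024x1024 show up in the generated list.
resolutions = sdxl_resolutions()
```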
SDXL LCM 896x1152:
A tip: never use the standard VAE that was released with the first SDXL model, it's slow as hell.
I recommend the FenrisXL VAE instead, it's faster. SDXL LCM is still much slower in general though compared to normal SD1.5 or SD1.5 LCM, at least with an older GPU like the 1070 card I have.
I hope the excellent Devilkkw doesn't mind I keep posting with his prompt, had too much fun to stop.
The other day the eminent Synalon and I experimented with and explored variation seed, variation strength and, more importantly, variation resolution.
This is what the tip text says about it:
This means that you can generate an image in landscape without getting eldritch monsters by setting the main resolution in landscape ratio and variation resolution in portrait ratio. See examples below.
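For the curious, here's a simplified sketch of what variation seed/strength does under the hood: the noise from the main seed gets blended with noise from the variation seed via spherical interpolation. This is a hedged reimplementation from memory, not the actual webui code, and variation resolution additionally generates that second noise at a different size before resizing it:

```python
import numpy as np

def slerp(t, low, high):
    """Spherical interpolation between two noise tensors.

    t=0 returns `low` (the main seed's noise), t=1 returns `high`
    (the variation seed's noise); variation strength picks t.
    """
    low_n = low / np.linalg.norm(low)
    high_n = high / np.linalg.norm(high)
    omega = np.arccos(np.clip(np.dot(low_n.ravel(), high_n.ravel()), -1.0, 1.0))
    so = np.sin(omega)
    if so == 0:  # vectors are parallel; fall back to a linear mix
        return (1.0 - t) * low + t * high
    return (np.sin((1.0 - t) * omega) / so) * low + (np.sin(t * omega) / so) * high

base = np.random.default_rng(1234).standard_normal((4, 64, 96))  # main seed noise
var = np.random.default_rng(5678).standard_normal((4, 64, 96))   # variation seed noise
mixed = slerp(0.3, base, var)  # variation strength 0.3
```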
The way you describe variation resolution sounds similar to how you can use width/height and target width/target height in the SDXL text encoder in ComfyUI. If that's the case it's very useful, since among many things you can "zoom" in/out of your generation and decide what gets "cropped out" and how it fits on your "canvas".
Regarding LCM, you don't have to have a model trained for it; there are LCM weight loras, for both SD1.5 and XL, which let you use any model and create images at fewer steps and lower cfg. You use them like any other lora and they work pretty well.
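For the diffusers crowd, that idea looks roughly like this. A sketch from memory: the Hugging Face repo IDs and the 8-step/1.5-cfg values in the usage comment are illustrative assumptions, and it needs `diffusers` and `torch` installed, so the imports are kept inside the helper:

```python
def make_lcm_pipeline(base_model="runwayml/stable-diffusion-v1-5",
                      lcm_lora="latent-consistency/lcm-lora-sdv1-5"):
    """Load any SD1.5 checkpoint and turn it into an LCM pipeline
    by swapping the scheduler and applying the LCM-LoRA weights."""
    from diffusers import StableDiffusionPipeline, LCMScheduler
    pipe = StableDiffusionPipeline.from_pretrained(base_model)
    pipe.scheduler = LCMScheduler.from_config(pipe.scheduler.config)  # LCM sampler
    pipe.load_lora_weights(lcm_lora)  # used like any other lora
    return pipe

# Usage (downloads weights, so not executed here):
# pipe = make_lcm_pipeline()
# image = pipe("a portrait", num_inference_steps=8, guidance_scale=1.5).images[0]
```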
There's a Lora for SDXL Turbo too
And one that combines both LCM and Turbo
Edit:
Since I had to go look for the VAE Mr-Fox mentioned (compulsive need to try new things), and it wasn't that easy to find right away with so many model versions listed, here it is:
My bad. I have updated the post with the link. Should not be an issue now that we both have linked to it.
Btw, the FenrisXL model itself is also excellent. Now this can't possibly need its own link..
That's some mighty fine Wendigo Erotica. A little tip, or just an observation: the LCM sampler will not give a good result if you are not using it with an LCM checkpoint. Also, if you use a resolution of 1024 or above with SD1.5, you are more likely to get conjoined twins. I would recommend using 960x640 and then either hiresfix or upscaling in img2img with the SD Upscale script. I know for a fact that you are already aware; this is only a reminder, and for anyone else that might not be aware.
Oh, I'd never seen LCM models, and never tried them. Is it possible to port a standard .safetensors model to LCM? What's the benefit?
A good reminder, you say, Mr-Fox: SD1.5 models work great at low res, and pushing them higher is really a pain in the ass; many models give double and weird results at 768, so generating and then upscaling seems a good solution.
But I have to ask a question: I've merged my model many times with merge block weight in A1111 to push the resolution higher, but in A1111 the max resolution reaches 896x1152, while in ComfyUI I reach 1024x1280. Why such a big difference?
I also checked the sampling method code, and it seems to work differently in ComfyUI and A1111. But if the sampler is the same, why?
Such beautiful results. I'm glad you used my prompt for the samples. And good tests; variation seed is a bit underestimated. Keep testing and sharing, I'm really interested in it.
Too general. What are your PC specs? What are you using to train? Which video driver version?
Also read a few posts ahead; another user gave you a suggestion on a possible alternative way to train.
I have no idea.
The benefit is, like I said in my post, being able to use far fewer steps and a lower cfg scale to cut down on generation time while still getting high quality.
As I don't use the spaghetti UI, I can't help you with ComfyUI.
With SD1.5 it's best to keep it under 1024 in either direction, so I use 640x960 for portrait ratio and simply flip it when I do landscape. I use this resolution while searching for a good seed.
When I have found it, I re-use that seed and enable hiresfix with 2x upscale and a fairly low denoising to get a sharp image. Then I might upscale it further with the SD Upscale script in img2img.
I'm going to test training SDXL on a pornographic concept by using color association to ease the formation of the neural network. I don't know anything about that, but I assume it creates associations, so it should work.
Essentially, I'll separate the image into two identical images, then color specific regions. Then I'll prompt what the colors are associated with. The AI knows what the colors are, so it will associate them with the concept to learn.
The most important aspect is the amount of VRAM. The card needs an absolute minimum of 4GB for generating images, and the chip itself can't be slow as snails either. There are many settings and small things you can do if you suffer from low VRAM.
You can add the argument "--lowvram" to the "webui-user.bat" file. In the UI itself you can set various settings to their "lightest" mode. Then keep the resolution and the number of steps low when generating, at least to begin with, until you know what you can achieve with your card. Start with 512x512, or in portrait ratio (2:3) you can go below 512, such as 344x512. Then use the SD Upscale script in img2img to make the image larger; since it uses "tiling", you can upscale by 4x, which gets you to 1376x2048. Keep to an easier style or genre such as anime or manga. With easier I only mean in terms of hardware requirements.
If your card can't hack it, then Google Colab or "Stable pod" type services might be the option for you. That's online, server-based image generation. On some sites you can rent a high-end card by the hour, which means there is nothing stopping you from training your own models etc., as long as you are willing to pay the hourly fee.
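As a sanity check of the arithmetic above, a tiny sketch of the generate-small-then-tile-upscale plan (the ~1024 side cap is just this thread's SD1.5 rule of thumb, not a hard limit):

```python
def low_vram_plan(base=(344, 512), upscale=4):
    """Plan a low-VRAM workflow: generate small, then tile-upscale."""
    w, h = base
    # SD1.5 tends to produce doubled subjects above ~1024 per side
    assert max(w, h) <= 1024, "keep the base generation under ~1024 per side"
    return {"generate": (w, h), "final": (w * upscale, h * upscale)}

plan = low_vram_plan()
# 344x512 with a 4x tiled upscale lands at 1376x2048
```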
This is also a challenge for anyone who wants to participate.
The point of the challenge is to be more creative with the prompt and come up with new, innovative solutions within specified
limitations and without the usual toys. The basic idea is to emulate the challenge people face with an old, weak GPU.
This is why we keep the resolution low and avoid using a bunch of extensions in txt2img.
It's meant to be a learning exercise first and foremost, not a competition.
No one will lynch you if you take small liberties, but it's more fun if everyone tries to stick to the "script".
The limitations:
- In txt2img.
Use low res 344x512 or 512x512.
No controlnet or after detailer etc and no roop or reactor.
Postprocessing is allowed.
Face restore is allowed if you really want to use it.
Keep the prompt simple and under 90 tokens, and use no more than 2 loras or embeddings in total, preferably none.
You can choose any genre and concept, nude or SFW.
- In img2img.
You are free to use inpaint as much as you wish, and after detailer in the interest of fixing hands or deformed details etc.
Maybe I'm wrong, but I think it's less memory demanding when you already have an image to work with.
Keep the prompt in after detailer somewhat simple also.
The same limit of loras and/or embeddings (2) applies for after detailer as in txt2img.
No controlnet, roop or reactor.
Use the SD Upscale script with any upscaler you want at 2-4x to finalize the image.
Post both the image from txt2img and the final image from img2img so we can see the prompt and process.
Give a short description outlining the process and the general concept.
Also share any thoughts or reflections about things you might have discovered and learned.
The challenge will continue as long as someone is still interested.
Remember to have fun.
-------------------------------------------------------------------------------------------------------------------------------------------
In txt2img:
I will expand the prompt a little from before and see what I can achieve within these limits. I avoid using any extensions that add to the memory demand, such as controlnet or after detailer, in txt2img. I only use GFPGAN postprocessing, as I don't think it is very demanding.
In img2img:
I use after detailer for fixes and enhancing.
Lately I have experimented with using an alternative ckpt for after detailer with very interesting results.
I had to fix a tiny detail on the thumb's fingernail with Photoshop for the first image.
A little "cheating" has never hurt anyone has it?..
Then I turn off GFPGAN postprocessing and all models in after detailer, with the exception of eyes, before upscaling.
I upscale with SD Upscale Script with UltraSharp at 4x to finalize my image.
Upscaled 2x using 4xNMKDSuperscale to 720x1072
Tiny bit of extra GFPGAN (0.01)
No other post-processing.
The VRAM usage could almost undoubtedly be reduced further using '--lowvram'
Could I do better with a bit more time? Probably! But yeah, like some other things in life, it's not how big it is, it's what you do with it that counts.
Yes. Excellent. I love it.
The "rules" are not carved in stone, they're more like guidelines. The interesting part is to see what you guys can come up with without relying on memory-demanding extensions, keeping the resolution low and trying to be creative and inventive. I had the idea when trying to give advice to lobotomist, the guy with an Intel card. What would it be like? And what could we achieve with those kinds of limitations?
No. I made the post for you. I generated example images so you can see what you might be able to do if you have at least 4GB of VRAM on your Intel card. If your card can't do SD, then there are online sites that let you use their computers, where you can rent a high-end graphics card. "Inspired" by this, I even started a challenge to create images with your scenario in mind.
Even the cheapest Intel card that nobody buys has 4GB... You couldn't even take a second to google the VRAM on Intel cards before writing a huge wall of text? Thanks, I guess..
I have 8GB on my A750, which is pretty capable of doing Stable Diffusion; my question was mostly because I don't know if all those tools like ComfyUI are Nvidia-only.
Oh, also, the other most common Intel card, the A770, has 16GB of VRAM.
Why so pissy when people are just trying to be helpful? It's not anyone's job to help you, but we do it anyway.
Don't expect anyone to fall over themselves to answer you in the future if this is how you respond.