There are a bunch of different models on Civitai, and some of them you can run locally on a decent RTX rig (a 3080 or better). VRAM is super critical for running local models.
Z-Image Turbo (ZIT) has a bunch of great checkpoint models for this, ranging from realistic to anime. There are also LoRAs you can add to some of these that fix specific things like penis size, etc.
There are also a bunch of SDXL-based models like Pony, Illustrious, etc. that have a ton of amazing LoRAs to get what you want.
Search the site, and you will probably want ComfyUI or a similar tool to run the image generation. There are a ton of great YouTube videos that walk you through all of this.
It does take some time to get good at prompting the way each checkpoint model needs, but look at other people's samples and build off the ones you like. The better your GPU and the more VRAM you have, the faster you can generate and the better the images will be. This includes videos with Wan or Qwen. I think there may be other, better i2v (image-to-video) models that are lighter and not censored, so you can easily generate videos like that one.
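
If you want a rough feel for why VRAM matters so much, here is a quick back-of-the-envelope sketch in Python. It only estimates the memory for the model weights (parameter count times bytes per parameter) and adds a flat headroom for everything else. The function names, the 2 GiB headroom, and the parameter counts in the usage lines are my own illustrative assumptions, not numbers from any tool or model card, so check the actual model page before trusting them.

```python
def weights_gib(params_billions: float, bytes_per_param: float = 2.0) -> float:
    """Approximate memory for the model weights alone, in GiB.

    bytes_per_param: 2.0 for fp16/bf16, 1.0 for 8-bit, 4.0 for fp32.
    """
    return params_billions * 1e9 * bytes_per_param / 1024**3


def fits(params_billions: float, vram_gib: float,
         bytes_per_param: float = 2.0, headroom_gib: float = 2.0) -> bool:
    """Rough check: do the weights plus a flat headroom fit in VRAM?

    headroom_gib is an assumed allowance for activations, text encoder,
    VAE, and so on. Real usage depends on resolution and the tool's
    offloading settings, so treat the result as a hint only.
    """
    return weights_gib(params_billions, bytes_per_param) + headroom_gib <= vram_gib


if __name__ == "__main__":
    # Hypothetical model sizes (in billions of parameters) on a 10 GiB card.
    for size in (2.6, 6.0):
        print(size, "B at fp16:", round(weights_gib(size), 2), "GiB,",
              "fits" if fits(size, 10) else "does not fit")
```

The takeaway is that a lower-precision version of a model (fewer bytes per parameter) can fit on a card where the full-precision one will not, which is why quantized checkpoints are popular on smaller GPUs.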