Well, you do also have to give it a prompt. But the image you give it is a prompt in itself, and the AI will pick up quite a lot from it.
You can just try it yourself: there are free bots on the Unstable Diffusion and Stable Diffusion Discords that will generate images for you. Unstable Diffusion is a community for NSFW generations. Or you can install it on your own computer; at minimum you need a graphics card along the lines of a GTX 1660.
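If you go the local install route, feeding it an image plus a prompt looks roughly like this with the Hugging Face `diffusers` library. Treat it as a sketch, not the one true setup: the model ID, the `strength` and `guidance_scale` values, and the helper names `snap_size` and `img2img` are just my assumptions for illustration, and older `diffusers` versions called the `image` argument `init_image`.

```python
def snap_size(width, height, target=512, multiple=8):
    """Scale a size so its longer side is `target`, rounded down to a multiple of `multiple`."""
    longest = max(width, height)
    new_w = width * target // longest
    new_h = height * target // longest
    # The pipeline wants dimensions divisible by 8, so round each side down.
    new_w = max(multiple, new_w // multiple * multiple)
    new_h = max(multiple, new_h // multiple * multiple)
    return new_w, new_h


def img2img(init_path, prompt, out_path="out.png", strength=0.6,
            model_id="stable-diffusion-v1-5/stable-diffusion-v1-5"):
    """Run one image-to-image pass. Needs torch, diffusers, Pillow and a CUDA GPU."""
    # Heavy imports live inside the function so snap_size works without them.
    import torch
    from diffusers import StableDiffusionImg2ImgPipeline
    from PIL import Image

    # float16 halves memory use; if you get black images, try torch.float32 instead.
    pipe = StableDiffusionImg2ImgPipeline.from_pretrained(model_id, torch_dtype=torch.float16)
    pipe = pipe.to("cuda")
    pipe.enable_attention_slicing()  # trades a bit of speed for lower VRAM use

    init = Image.open(init_path).convert("RGB")
    init = init.resize(snap_size(*init.size))

    # strength near 0 keeps the input image, near 1 mostly ignores it (0.6 is a guess).
    result = pipe(prompt=prompt, image=init, strength=strength, guidance_scale=7.5)
    result.images[0].save(out_path)
    return out_path
```

You'd call it with something like `img2img("render.png", "photo of a woman in a kitchen, natural light")`, where both the file name and the prompt are made up. A lower `strength` sticks closer to your input image, a higher one lets the prompt take over.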
Currently, Stable Diffusion is trained on rather small images, which makes it struggle with fine details like skin pores, textures, etc. So that's one of the downsides. The other downside is that you can't really pose the same character across multiple images, because each new generation will be a new random person. That is, unless you use the name of a known actor in your prompt, or train the model on a particular face yourself.
The real advantage, other than the speed of the process, is that the images don't look like obvious 3D renders. The lighting looks real, photographic. You can get close to realistic photographic renders with 3D, but it's extremely hard. This, on the other hand, is easy as fuck.