I find SDXL to be better at composition, but that seems to be the only thing it's better at. The exception may be when the scene concept is pornographic; then it doesn't compose a good image either.
So I usually start with SDXL to create a basic image, then switch to 1.5 to add details and lewd stuff. Doing an img2img pass on nudity in SDXL is also of no use.
SDXL is almost useless, basically. ControlNet XL and intensive pornographic training should make it a lot better, though.
It takes 3 or 4 seconds to generate and upscale a 512x512 image and feed it into SDXL for noise, but I have done it both ways.
You can do much higher native-resolution renders in XL; I've done 1600x1280 with no issue... unless you try to feed that back into an SD checkpoint. Then it goes from 1.5 sec/it on XL to 70 sec/it on SD... good ol' FP32 compared to FP16 and BF16.
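For a rough sense of what that per-iteration gap means over a whole render, here is a minimal arithmetic sketch. The 1.5 and 70 sec/it figures are the ones quoted above; the step count of 20 and the helper names are assumptions picked for illustration, not settings from this thread.

```python
# Per-iteration timings quoted in the post above (seconds per iteration).
XL_SEC_PER_IT = 1.5
SD_SEC_PER_IT = 70.0


def total_seconds(sec_per_it: float, steps: int) -> float:
    """Total sampling time for a run of `steps` iterations."""
    return sec_per_it * steps


def slowdown(slow_sec_per_it: float, fast_sec_per_it: float) -> float:
    """How many times slower one per-iteration rate is than the other."""
    return slow_sec_per_it / fast_sec_per_it


if __name__ == "__main__":
    steps = 20  # assumed step count, for illustration only
    print(f"XL: {total_seconds(XL_SEC_PER_IT, steps):.0f} s")
    print(f"SD: {total_seconds(SD_SEC_PER_IT, steps):.0f} s")
    print(f"slowdown: {slowdown(SD_SEC_PER_IT, XL_SEC_PER_IT):.1f}x")
```

At those rates the SD pass at 1600x1280 would be roughly 47 times slower per iteration, so under the assumed 20 steps a 30-second render turns into one of more than 23 minutes.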
But if anything explicit is in the image, XL is out without training, as you said, and I am about 4 GB of VRAM short of being able to train XL.
So it adds about 30 seconds for me to loop in XLSD. One of these images was run with it and the other without.