Each combination is fully rendered? Yikes. That sounds like it could get out of hand fast even with the very careful/conservative outfit inclusion. Have you considered rendering it as multiple layers?
For example: let's say you have the naked (/bald? if you're planning hats/hairstyles etc.) MC as the base layer (Picture Index X). Then you could have /just/ his chastity cage (or the groin area in general, since getting objects on their own in 3D could be tough) as Index X+1. Then his underwear as Index X+2, shirt as Index X+3, then gag/face-accessory/etc. as Index X+4, and so on.
It would look the same to the viewer (the layers would all animate/run on top of one another in the animation cycle) but might make it easier to add/adjust clothes in a more modular way. (E.g. if a patron requests rubber shorts, you just have to render "waist with rubber shorts" and not worry about every top/accessory/etc. combination.)
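To put rough numbers on why this matters, here's a quick back-of-the-envelope sketch in Python. The slot names and option counts are made up for illustration (I don't know your actual wardrobe), and each count includes a "wearing nothing in this slot" option.

```python
from math import prod

# Hypothetical wardrobe: slot -> number of options, INCLUDING "nothing worn".
slots = {"groin": 2, "underwear": 3, "shirt": 3, "face": 2}

def full_render_count(slots):
    # Every outfit combination is its own complete render.
    return prod(slots.values())

def layered_render_count(slots):
    # One base render, plus one render per actual item
    # (the "nothing worn" option in each slot needs no render).
    return 1 + sum(n - 1 for n in slots.values())

before_full = full_render_count(slots)
before_layered = layered_render_count(slots)

# A patron requests rubber shorts: one more underwear option.
slots_with_shorts = dict(slots, underwear=slots["underwear"] + 1)
after_full = full_render_count(slots_with_shorts)
after_layered = layered_render_count(slots_with_shorts)

print(before_full, after_full)        # 36 48
print(before_layered, after_layered)  # 7 8
```

With these made-up numbers, one new pair of shorts costs 12 extra full renders but only 1 extra layer, and that's per animation frame, so the gap widens with every item you add.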
This should be trivial if whatever program you're using to animate/output supports masking. If it doesn't, masking the output in Photoshop is also easy (just import the frames, drag a box around the waist or wherever, and re-save), but admittedly adds a tedious extra step to everything.
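In case it helps to see the idea outside of any particular program, here's a minimal pure-Python sketch of "mask a box, then stack the layers". Everything in it is an assumption for illustration: frames are just grids of RGBA tuples, the box coordinates are invented, and the function names (`mask_to_box`, `stack`) are mine, not from any tool you're using. It also treats transparency as all-or-nothing, which a real compositor wouldn't.

```python
TRANSPARENT = (0, 0, 0, 0)

def mask_to_box(frame, box):
    """Keep only pixels inside box = (left, top, right, bottom);
    right and bottom are exclusive. Everything else becomes transparent."""
    left, top, right, bottom = box
    return [
        [px if (left <= x < right and top <= y < bottom) else TRANSPARENT
         for x, px in enumerate(row)]
        for y, row in enumerate(frame)
    ]

def stack(base, layers):
    """Draw each layer over the base in order; fully transparent
    pixels let whatever is underneath show through."""
    out = [list(row) for row in base]
    for layer in layers:
        for y, row in enumerate(layer):
            for x, px in enumerate(row):
                if px[3] > 0:
                    out[y][x] = px
    return out

# Tiny 4x6 stand-in frames (hypothetical colours).
SKIN = (220, 180, 150, 255)
RUBBER = (20, 20, 20, 255)
base = [[SKIN] * 4 for _ in range(6)]           # naked base layer
shorts_render = [[RUBBER] * 4 for _ in range(6)]  # stand-in for a render with shorts on

# Drag a "box" around the waist (rows 3 and 4) and keep only that.
shorts_layer = mask_to_box(shorts_render, (0, 3, 4, 5))
result = stack(base, [shorts_layer])
```

The order of the list passed to `stack` plays the role of the picture index: later layers draw over earlier ones, so underwear would go before shirt, and so on.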
To clarify visually:
Sorry to throw a suggestion that boils down to "maybe redo your entire image system?" at you. I just figure it's something best considered sooner rather than later, especially if you're concerned about file size as the game gets more content. I actually hope I've misunderstood you, since the number of combinations already sounds agonizing even with the current amount of attire.
Anyway, thanks for your involved responses!