It's an extremely interesting concept with a butt-load of potential, especially if you can get some of the erotic literature & erotic audio community involved. I see a few hurdles (/opportunities) that you might want to consider:
1 - Content. There are billions of people out there, each with varying tastes and kink-tolerance. This is great for the modular approach--you could have episodes that cater to specific proclivities, you could have vanilla airport-romance-style episodes as a more 'entry-level' thing, and you can even throw out content for more extreme fetishes to hit that lucrative deep-niche market. But that still raises the question: if you're releasing it all in one program/download, there's the risk that content that's too specific might scare away first-timers (especially since your game requires a partner, who may not be as into digital wankery as your actual downloaders), or that content that's too vanilla might bore a primary audience who are old-hat veterans of video-game-smut (and likely, in flashback parlance, have seen some shit (and kinda liked it)).
2 - Format. Running everything off one program on one system has some limitations (e.g. you can't have stereo sound, since the couple has to split earbuds, you can't have scenarios involving a third party, etc.), which isn't the end of the world, but it does limit immersion. That said, running it from a program (instead of, say, just two separate audio files) does have potential for some wonderful features, like integrating Lovense toys, or syncing two instances to work in tandem on two devices (so both parties can have that sweet sweet stereo). However, since all the scenarios are basically hard-coded into the game, it also doesn't allow any customization or community-created content, which could be a massive missed opportunity (especially given how active communities like r/GoneWildAudio seem to be).
3 - Talent pool. I'm not sure what your plans are on this front, but there's a lot of quality 'amateur' talent out there, both for actors and writers, and it would be very, very easy to pigeonhole yourself by sticking to a small pool. Hell, I imagine there would be a benefit to having multiple talent choices for the same scenario--e.g., for your demo, you have a woman voicing both parts, but a woman might find it more immersive to have a man talking her through, or either partner might prefer a woman with a different tone/voice. Again, lots of potential out there; it's just a question of how you approach it.
4 - Content format (but not really format or content as above). By this I mean "Roughly outlining a scenario and giving action commands to a couple". I recognize that this is the core concept, but it's something that could probably pretty easily be played with, especially if you find a way to give it stereo support: you could introduce blindfolds and noise-cancellation to make a super immersive storytelling experience (with the other partner getting secret commands to interact with the blindfolded one), or even have both partners experiencing completely different narratives whose physical interactions happen to line up. Heck, you could even just have sessions that are all story (leaving them with more freedom to interact as they please) or all direction (giving it more of a devil-on-your-shoulder-telling-you-what-to-do vibe). Boundless (and bounded?) potential.
The TL;DR would be to consider opening that fucker up to the sea of talent out there, either by commissioning writers and voice talent to go hog-wild, or by knocking down a few barriers to let these communities play in your sandbox.
All in all, I'd say this is a great concept with a mountain of potential, and I'm curious to see where it goes.