Using self-hosted LLMs for erotic text adventures

Affogado

Newbie
Game Developer
Jun 12, 2021
91
149
For around a week I've been experimenting with using locally hosted LLMs to play erotic text adventure games. If you have no idea what this means: basically, I install a large language model (think ChatGPT, but uncensored and not quite as robust) on my own hard drive instead of using one hosted on a server somewhere.

I'm using LM Studio, but it'd probably be similar with any comparable app. Installation is simple: just download it and run the installer. It'll throw a shortcut on your desktop and put the software somewhere in your user directory (on Windows), but if you want to move it somewhere else, that won't break it. At least it didn't for me!

[Screenshot: the LM Studio main window after installation]

(Please don't ask me for tech support, I don't know what I'm doing.)

Anyway, you also need a model to actually do the work. You can download one right through the app, using the purple magnifying-glass icon on the left. The app has some basic help FAQs that describe things way better than I can, so feel free to browse them. You can also search for models on Hugging Face (where they're hosted for download). I've had good results with Cydonia-22B-v1.1-GGUF and Blue-Orchid-2x7b_GGUF.

[Screenshot: the LM Studio model search screen]

On the search screen you have the search results on the left, including different versions of the model you're looking for, and the details and download link on the right. In particular, it'll tell you if it thinks the model is too big for your GPU, if a partial GPU offload is possible, or if it should be fine. If it says it's too big, it'll probably crash if you try to run it; if it says partial (like above), it'll work if you reduce the GPU offload in the settings far enough, but it might run slowly.
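Just to show the shape of that "will it fit?" check, here's a made-up little Python sketch. To be clear: the thresholds (10% VRAM headroom, 2x for partial offload) are my own guesses for illustration, not what LM Studio actually computes.

```python
# Rough sketch of the fit check the app shows you. The headroom and
# partial-offload thresholds here are assumptions, not LM Studio's real logic.

def gpu_fit(model_gb: float, vram_gb: float) -> str:
    """Classify a model file size against available VRAM (rule of thumb)."""
    usable = vram_gb * 0.9  # assumption: leave ~10% headroom for the OS/driver
    if model_gb <= usable:
        return "full GPU offload"
    if model_gb <= usable * 2:
        return "partial GPU offload"  # lower the offload slider until it loads
    return "too big"  # likely to crash or be unusably slow

print(gpu_fit(12.0, 24.0))  # full GPU offload
print(gpu_fit(13.0, 12.0))  # partial GPU offload
```

The real numbers depend on quantization, context size, and what else is using your VRAM, so treat the app's own verdict as the authority.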

Again, I don't really know what the fuck I'm doing, so experiment with different models or search on reddit for recommended uncensored models or any other technical questions. This isn't really a tutorial, it's a trip report, so thus ends the technical aspect of this post!

Playing a Game

Mostly I see people using these LLMs for erotic chats: they give a prompt asking the LLM to take the role of a sexy babysitter or Rouge the Bat (or Rogue the Mutant!) or whatever the fuck gets them off. There's nothing wrong with that, but I wanted something closer to a text adventure game, so I refined a prompt and came up with this:

Code:
You are presenting a text adventure in the second person, using vivid language, 
detailed description, strongly characterized NPCs, and interesting dialog to depict the 
events of the story in response to the {{User}} input. When sexual events occur, describe 
them in explicit sensory detail.

{{User}} will take on the role of the protagonist.
You will not write their dialog or action.
You can add in any other styling details too, and give the protagonist a name. It absolutely WILL still describe what you're doing and saying, but telling it not to makes it happen slightly less. I put this in the "System Prompt" box in the right sidebar. The problem with these models is that they only look back at the last couple thousand tokens (roughly, words) and forget everything before that, so anything you want them to remember for the whole chat/game should go in the System Prompt.

This is also where you put any instructions for the specific scenario you want to play out.

Setting the Scenario

Code:
Carl has just joined the school wrestling team to discover himself the only male on the team.

There's a new coach this year whose unconventional approach is heavily based in BDSM, using its elements to maintain discipline, 
punish infractions, toughen the students, and create strong bonds between them. She is aroused by dominating her team, though 
she presents it as simple strictness and unconventional teaching methods.
You can absolutely make this more complex, but the longer it is, the more tokens it uses up, and the less short-term memory the LLM has left. Since it's going to absolutely forget shit, anything that's important should go here. Be concise.
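If you want a feel for how much of the budget your scenario eats, a back-of-envelope estimate is enough. The ~4 characters per token figure below is a common rule of thumb for English text, not the model's real tokenizer, so treat the number as a rough guess.

```python
# Back-of-envelope token counter. ~4 chars/token is a rule of thumb for
# English prose; the model's actual tokenizer will differ somewhat.

def estimate_tokens(text: str) -> int:
    return max(1, len(text) // 4)

scenario = (
    "Carl has just joined the school wrestling team to discover himself "
    "the only male on the team."
)
print(estimate_tokens(scenario), "tokens (rough guess)")
```

Whatever that says, remember it's subtracted from the same pool the model uses to remember the recent chat, so shorter really is better.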

I like to include details about the characters, so the computer keeps the cast straight. You could include dozens if you want, but I find that small casts and smaller scoped scenarios work better. For our example tutorial, I'll use 5 characters and just draw them from pop culture, cartoons, comics, games, etc.

Code:
Characters:
Coach Carol Danvers, 28, butch, a domme, wants to build a strong team
Laura Kinney, 18, team captain, dominant, likes it rough, tough as hell. She has a girlfriend, Kitty Pryde, who is not on the team
Lara Croft, 18, British, athletic, tough, likes bondage
Buffy Summers, 18, the Slayer, agile, repressed masochistic tendencies
Ellie Williams, 18, lesbian, dating Nico Minoru who is not on the team, submissive
The model I use (Cydonia) does a decent job of recognizing these characters and keeping them in character. (Yes, I know how it works; it's just pattern matching and not actually any kind of emulation, but the training data on well-known fictional characters is pretty good.) You can absolutely make up original characters, or just trust the LLM to throw random characters at you by not defining any, but using established characters has been good for testing because it lets me define a lot of character traits with just a name, saving tokens.
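Since the rules, scenario, and roster all have to live in the System Prompt together, I find it helps to think of it as one assembled blob. Here's a sketch of stitching the pieces into that shape; the helper name and the exact layout are just my own convention, not anything the model requires.

```python
# Sketch: stitch the standing rules, scenario, and cast roster into one
# system prompt, so everything the model must remember travels together.

BASE_RULES = "You are presenting a text adventure in the second person..."
SCENARIO = "Carl has just joined the school wrestling team..."

def make_system_prompt(rules: str, scenario: str, cast: dict[str, str]) -> str:
    roster = "\n".join(f"{name}: {traits}" for name, traits in cast.items())
    return f"{rules}\n\n{scenario}\n\nCharacters:\n{roster}"

cast = {
    "Coach Carol Danvers": "28, butch, a domme, wants to build a strong team",
    "Laura Kinney": "18, team captain, dominant, tough as hell",
}
print(make_system_prompt(BASE_RULES, SCENARIO, cast))
```

One line per character, traits as a comma list, keeps the roster cheap in tokens while still giving the model something to pattern-match on.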

How to Play

The other important detail is your first message to the LLM, because it sets the style you can expect. You have to set the scene, the tone, etc.

[Screenshot: the LLM's opening response]
We're lucky and it gives us something good right away. We can work with this.
[Screenshot: a response we might want to change]
Now here's something we might want to change. Maybe we want more of a slow burn; maybe we don't want Carol admitting that she's essentially going to be domming the whole class. We can either edit the computer's response - just cut the line where she mentions using BDSM - or regenerate until we get something that fits the mood we want to set and the pacing we want.
[Screenshot: the regenerated response]
I like this result better. It's more subtle, but still hints at what's to come. You might also get a response where the Coach starts dominating you right away. Or just edit the LLM's response to make that happen.

Sometimes it sucks

For the most part I like to take a hands-off approach and accept whatever twists the game throws at me, but let's remember what an LLM is. This isn't a game. We've told it that it is, so it tries to give us the next likely text in this context. The characters don't think - they aren't even distinct subroutines or game objects. They're just elements in the text that the model is trying to continue riffing on based on what we tell it.

Sometimes the pace is too fast or slow, or it adds in unnecessary commentary like "Oh! This sounds like an exciting story! Let's see what happens next!" or other goofy shit. Sometimes the responses ramble on and fucking on, and it presumes actions or dialog I don't want the player to take... so I need to rein it in. I edit the responses or regenerate them until I get one I like.

Anyway, I have no idea what I'm doing, so feel free to suggest better models or prompt templates. Cydonia-22B is just about the limit of what my GPU can handle at a speed that's bearable.

I have no idea if this is interesting to anyone else but me - but if it is, download LM Studio (or a different framework) and try it yourself. Tell me how it went. Or let me know if you want to see other scenarios I've played with, like:

* Locked down during a sexy pandemic
* Psi Op with a protective squad of rugged yet sexy space marines
* Sexy House Party (of the damned)
* Accidentally assigned to a sexy all-female colony ship oh noooooo
* Your own. Personal. Holodeck. (also it is sexy)
* The Sexy Doctor Mindfuck has invented the bimbofication gun!

Okay some of those are just me riffing right now, but they'd probably work just fine.
 

abyss50055

New Member
Feb 19, 2018
7
6
Welcome to the AI rabbit hole! That's a nice and easy-to-understand introduction. I'm pretty new to this whole subject myself and I've never used LM Studio. It seems to be quite easy to use and therefore a good choice for anyone who wants to try out AI RP / chatting for the first time.
I'm mainly using the SillyTavern frontend ( ) right now. It's probably more complicated to set up, but it offers some really nice features. The quality of the RP / conversation greatly depends on the information the user sends to the model. There are a myriad of guides on how to do this effectively, and I'm still in the process of figuring it out. Here's one guide that I found quite helpful: ( )

I really enjoy RPing with LLMs, since they allow us to create intricate stories about really niche subjects.
 
  • Like
Reactions: Affogado

Affogado

Newbie
Game Developer
Jun 12, 2021
91
149
I realized that I'll never have enough time to implement all the game ideas I have in Twine or Ren'Py, so "running" them as text scenarios helps me to evaluate whether they'd be worth my time developing into an actual game.
 
  • Like
Reactions: abyss50055

desmosome

Conversation Conqueror
Sep 5, 2018
6,296
14,440
I don't have a strong enough computer for local, so I just use the various online sites. After Yodayo imploded, the current best free site is JanitorAI. AFAIK there are local models with tens of thousands, or even 100k+, tokens of context out there? That's what I'm most curious about. How much memory do the best local models have compared to the usual 4k-8k free token models online? And how well does such a model actually utilize that expanded context window?

Honestly, LLMs are not that great at long progressive narrative. They go from 0 to 100 all the time. Well, it's more that there's no subtlety, no gradual change. And of course their memory is limited to the recent chat history. But as you play with them more, you get better at guiding them so they actually can produce a somewhat decent narrative. Still, short stories are about what they can handle, mostly due to the fast escalation.

But still, I find myself going back to LLM adventures from time to time because you can craft a short to moderate length porn vignette exactly as you envision it.
 

abyss50055

New Member
Feb 19, 2018
7
6
Couldn't agree more: the "memory" / context size is very important. I have no idea where the upper limit for powerful local models is. There are a lot of providers on OpenRouter ( ) offering models with >100k context size, so that should give you an idea. I'm not an expert, but I do know that context size can't just be expanded to enormous numbers for every model.

Lack of subtlety can be really annoying, but it's certainly not unavoidable. Models act / write very differently, and some of them tend to be quite horny, but even most of the horny ones can usually be reined in (to a degree) with characters specifically written for a slow-paced story, system prompts, and adjusted settings / samplers. I don't know to what degree this applies to JanitorAI, though. I've only tried Janitor for a short time, and the available settings were rather limited at the time.

If you want to try out some different free smaller models, with access to the settings, then I'd recommend - the only disadvantage is that you have to bring your own characters. is one place where you can find tons of characters to download.
 

Affogado

Newbie
Game Developer
Jun 12, 2021
91
149
What I've found works for longer stories is breaking them up into chapters or episodes, each with their own situation/context/theme. They're discrete chats within the same general framework; in LM Studio I just update the system prompt with whatever's important from the last episode and the general focus of the current one. Brings a sense of continuity.
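The chapter trick is mechanical enough to sketch. Each episode is a fresh chat whose system prompt is the standing rules plus a hand-written recap; the function name and layout below are just my own shorthand for how I assemble it.

```python
# Sketch of the chapter/episode approach: fresh chat per episode, system
# prompt rebuilt from the standing rules plus a recap of what matters.

BASE_RULES = "You are presenting a text adventure in the second person..."

def episode_prompt(rules: str, recap: str, focus: str) -> str:
    return (
        f"{rules}\n\n"
        f"Previously: {recap}\n\n"
        f"This episode: {focus}"
    )

print(episode_prompt(
    BASE_RULES,
    "Carl survived tryouts; Coach Danvers has singled him out.",
    "The first away meet, and an overnight hotel stay.",
))
```

The recap does the heavy lifting: it's the only thing carrying continuity across chats, so it should be short and limited to what the new episode actually needs.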
 

desmosome

Conversation Conqueror
Sep 5, 2018
6,296
14,440
Oh, I've become kind of an expert at pushing the online LLM bots to their limits in terms of long-form narrative. Yodayo had many features, but what you really need is a memory bank system. It's named different things on different sites, but it's basically a window where you can enter things that become part of the bot's permanent tokens as you play.

You can do a lot of things with this, from adding your own spin on the scenario of existing bots, to keeping track of key events so the bot has a general understanding of the story arc so far. Of course, tokens are always a limited resource, and you can't just keep expanding this permanent memory, since it will eat into the bot's context window. So you need to be concise and delete stuff that's no longer relevant or important. This limit is why the bots are best suited to producing a single scene, a short story, or a medium-length progression with some time skips. Long, continuous narratives with detailed character arcs and such are a bit much. It can be done, but at some point the bot really loses its grip on things and it's not worth it to keep going.
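Roughly, a memory bank like that could be sketched as a list of notes with a token budget, trimmed oldest-first when it grows too big. The class and the 4-chars-per-token estimate below are illustrative assumptions, not how any particular site implements it.

```python
# Sketch of a memory bank: permanent notes that ride along in the prompt,
# trimmed oldest-first once they'd eat too much of the context window.
# The chars//4 token estimate is a rough rule of thumb, not a real tokenizer.

class MemoryBank:
    def __init__(self, token_budget: int = 500):
        self.token_budget = token_budget
        self.notes: list[str] = []

    def add(self, note: str) -> None:
        self.notes.append(note)
        # drop the oldest notes while over budget, but always keep at least one
        while self._tokens() > self.token_budget and len(self.notes) > 1:
            self.notes.pop(0)

    def _tokens(self) -> int:
        return sum(len(n) // 4 for n in self.notes)

    def render(self) -> str:
        return "Key events so far:\n" + "\n".join(f"- {n}" for n in self.notes)

bank = MemoryBank(token_budget=20)
bank.add("Carl joined the team.")
bank.add("Coach revealed her methods at the first practice.")
print(bank.render())
```

In practice you'd curate by hand rather than just dropping the oldest entry, but the trade-off is the same: every note kept is context the bot can't spend on the recent chat.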

The bots work best when you and the bot are in sync. It can produce amazing things that align exactly to what you are imagining. That sync is hard to maintain past a certain point. The bot will lose the thread and it becomes increasingly tedious to keep it on track. That's when you probably want to bring the story to a close.

That's just me as a story based player though. Not everyone plays like that. Some just want a character to talk. Some people let the AI lead, mostly reacting rather than guiding it down a narrative.