Oooo! An interesting idea for a minimum viable product: interfacing an LLM with Ren'Py.
I like your suggestion, Anne: the HTTP(S) interface. I know some of the LLMs I can run expose HTTP(S) interfaces, and I've recently been learning to develop my own HTTP(S) REST API. It certainly didn't come to my mind first, though, since I'm more focused on the code itself when I think about this.
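To make the HTTP(S) idea concrete, here's a minimal sketch of what talking to a locally hosted LLM could look like from Python. The endpoint URL and JSON field names are assumptions on my part (many local runners expose an OpenAI-style completions endpoint, but the exact shape varies by server), so treat this as a template, not a working integration:

```python
import json
import urllib.request

# Hypothetical local endpoint; the URL and field names depend on
# which LLM server you actually run.
API_URL = "http://localhost:5000/v1/completions"

def build_payload(prompt, max_tokens=200, temperature=0.7):
    """Build the JSON body for a completion request."""
    return {
        "prompt": prompt,
        "max_tokens": max_tokens,
        "temperature": temperature,
    }

def request_completion(prompt):
    """POST the prompt to the local LLM server and return its text."""
    body = json.dumps(build_payload(prompt)).encode("utf-8")
    req = urllib.request.Request(
        API_URL, data=body, headers={"Content-Type": "application/json"}
    )
    with urllib.request.urlopen(req) as resp:
        data = json.loads(resp.read().decode("utf-8"))
    # "choices"/"text" mirrors the OpenAI-style response shape;
    # adjust for whatever your server actually returns.
    return data["choices"][0]["text"]

if __name__ == "__main__":
    # Only runs when a server is actually listening on API_URL.
    print(request_completion("Anne: Hello! Who are you?"))
```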
From what I've seen of LLMs, people don't understand what they are and what they can do. The response from an LLM is based on what it 'thinks' sounds like the most 'probable' response. They don't 'think' using logic or consciousness; they are just a lot of probabilities, complex enough to produce something 'somewhat' capable of understanding context. It's the same for everyday objects: quantum mechanics suggests there is a 0.0000000....00001% chance we could randomly change location, but that chance is so small it is effectively zero, so we don't teleport or disintegrate, because our continued existence is simply the most probable outcome thanks to being physically large (the more quantum particles, the more their probabilities compete and contradict each other).
Try thinking of someone who is sleepy, drowsy, and/or under the influence of alcohol/drugs: still cognitive, but impaired enough that you can manipulate them. That's basically what an LLM is. LLMs aren't conscious; they're more like an opinionated drunk who makes assumptions. Once you notice what an LLM 'thinks' something is, it becomes possible to predict and manipulate its behavior, similar to someone cognitively impaired. The reason I point this out is that a straight-up LLM via HTTP(S) wouldn't last very long. While some LLMs are very good at following the rules you give them, almost as if they can 'think', they aren't 'thinking'; they're just guessing an answer that they think 'probably' obeys the rules and instructions you gave them. But a response that 'probably' obeys additional interface rules (such as to control Ren'Py and its characters) is not good enough; it has to perfectly obey the rules or it will cause an error.
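Since 'probably obeys' isn't good enough, anything the LLM emits that is meant to drive the game should pass through a strict validator that rejects malformed output before it ever reaches Ren'Py. A toy sketch of that idea; the command grammar, character list, and emotion list here are all placeholders I invented for illustration:

```python
import re

# Hypothetical command format we'd instruct the LLM to emit, one per line:
#   say anne "Hello!"
#   show anne happy
ALLOWED_CHARACTERS = {"anne", "bob"}            # assumed cast list
ALLOWED_EMOTIONS = {"happy", "sad", "neutral"}  # assumed sprite tags

SAY_RE = re.compile(r'^say (\w+) "([^"]{1,200})"$')
SHOW_RE = re.compile(r"^show (\w+) (\w+)$")

def validate_command(line):
    """Return a parsed command tuple, or raise ValueError.

    The point: the LLM's output is never trusted. Anything that does
    not match the grammar exactly is rejected before it can crash the game.
    """
    m = SAY_RE.match(line)
    if m:
        who, text = m.groups()
        if who not in ALLOWED_CHARACTERS:
            raise ValueError(f"unknown character: {who}")
        return ("say", who, text)
    m = SHOW_RE.match(line)
    if m:
        who, emotion = m.groups()
        if who not in ALLOWED_CHARACTERS or emotion not in ALLOWED_EMOTIONS:
            raise ValueError(f"bad show command: {line}")
        return ("show", who, emotion)
    raise ValueError(f"unrecognized command: {line}")
```

On a rejection, the wrapper could re-prompt the LLM or fall back to a canned line, instead of letting a 'probably correct' response break the scene.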
So if anything, based on what I've seen other researchers do, other than training special LLMs, the more practical route is to have two different types of AI. An LLM is like a hammer: it only hits nails. That is to say, an LLM is only good at guessing responses. LLMs not only have issues continuing to follow rules, but that includes the rules describing a character's personality, and they will start acting weird. Research is still at the hitting-rocks-together-and-seeing-what-happens stage (currently exploring the multi-agent route), but in the future, besides an AI that maintains a character's personality, the other AI needed to pull off the project is one that takes the dialog-only response (maybe several, and chooses the best dialog for the story context) and, rather than generating a response, is hard-coded to read and translate language into game controls by interfacing directly with the code, so the game functions correctly. Similar to what was done in that one research example where AI characters in a video game town set up a dance night.
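That second component doesn't even have to be an AI at first; the hard-coded half can start as an ordinary translation layer that takes candidate dialog lines, picks one, and turns it into an actual game statement. A toy sketch of the idea; the scoring rule and the Ren'Py-style output format are placeholders I made up, not anything from the research I mentioned:

```python
def pick_best(candidates, context_words):
    """Crude stand-in for the 'chooser': score each candidate dialog
    line by how many story-context words it mentions, keep the best."""
    def score(line):
        return sum(1 for w in context_words if w.lower() in line.lower())
    return max(candidates, key=score)

def to_renpy_statement(speaker, line):
    """Hard-coded translation from the chosen dialog to a Ren'Py-style
    say statement (format is illustrative, not LLM-generated)."""
    return f'{speaker} "{line}"'

# Pretend the LLM produced several candidate lines for this beat:
candidates = [
    "Nice weather today.",
    "The festival starts tonight, want to come?",
]
best = pick_best(candidates, context_words=["festival", "tonight"])
statement = to_renpy_statement("anne", best)
```

The dialog generator can stay sloppy and probabilistic, because everything that actually touches the game goes through deterministic code like this.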
But right now there aren't robust and general ways to interface with AI, and LLMs are being used in ways they shouldn't be (it's a hammer, so everyone is treating everything like a nail).
edit: Also, 7B is not enough; you need at least a 13B model for it not to get confused by the English language. A 7B model also has a terrible time keeping track of context: the character may not realize it's talking to itself in its own response, or it confuses any action it takes by thinking the action happened to itself. Simply put, any time it says me, my, you, his, hers, theirs, our, it, etc., whenever the LLM is not using names directly, it has a hard time tracking the context to know who it's actually talking about, and it can end up flip-flopping in its understanding of the situation even as it types a single sentence. Not a paragraph; within a single sentence I've seen 7B models completely drop the ball in contextual understanding. 7B is about 50% of what's needed to contextualize English, so every time it uses anything but names, it's making a 50/50 guess on what it thinks the context might be (again, due to it just being a probability engine).

And only EXTREME PC enthusiasts have more than 8G VRAM (if they upgraded in the past 3? years), as Anne mentioned. While it can be hosted online on servers, I try to stay away from that as much as possible due to the potential for fees and having to pay for services. I have a multi-thousand-dollar computer that had the latest hardware... like 6 years ago, just before the AI GPU boom: a 1080 Ti Strix, one of, if not the, best consumer GPUs at the time, and all I can run on its 8G VRAM are the bare-minimum 6B and 7B models ;n;
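For a rough sense of why 8 GB runs out so fast, here's a back-of-the-envelope VRAM estimate. This counts the model weights only, ignoring the KV cache and runtime overhead, so real usage is noticeably higher:

```python
def weight_vram_gb(params_billions, bits_per_param):
    """Rough VRAM (decimal GB) needed just to hold the model weights."""
    bytes_total = params_billions * 1e9 * bits_per_param / 8
    return bytes_total / 1e9

# 7B at fp16 (16 bits/param): doesn't come close to fitting in 8 GB.
seven_b_fp16 = weight_vram_gb(7, 16)   # 14.0 GB
# 7B quantized to 4 bits: fits, which is why quantized 7B is about
# the ceiling on an 8 GB card.
seven_b_q4 = weight_vram_gb(7, 4)      # 3.5 GB
# 13B at 4 bits: borderline once cache and overhead are added on top.
thirteen_b_q4 = weight_vram_gb(13, 4)  # 6.5 GB
```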