Ren'Py to a local LLM

Alvidas

Newbie
Jul 2, 2018
35
15
I wonder if anyone has tried to connect Ren'Py to a locally running LLM, or whether it can be done at all?
 

anne O'nymous

I'm not grumpy, I'm just coded that way.
Modder
Donor
Respected User
Jun 10, 2017
10,957
16,191
Well, it mostly depends on the software you want to interface with. But if it has an HTTP(S) interface, or more generally a socket interface (whether network or UNIX), it's perfectly possible to use it.

Just keep in mind that it will not be suitable for a game, because you've no guarantee that the player will have a computer powerful enough for the LLM software.
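For example, the whole client side can be a few lines of Python. This is only a rough sketch, assuming a llama.cpp-style server listening on localhost:8080 with a /completion endpoint; the URL and the JSON fields depend on the software you actually run:

Python:
import json
import urllib.request

def ask_llm(prompt, url="http://127.0.0.1:8080/completion"):
    # POST the prompt as JSON; the field names follow llama.cpp's server
    # example, other back ends will want different ones.
    payload = json.dumps({"prompt": prompt, "n_predict": 128}).encode("utf-8")
    req = urllib.request.Request(
        url, data=payload, headers={"Content-Type": "application/json"}
    )
    with urllib.request.urlopen(req, timeout=30) as resp:
        return json.loads(resp.read())["content"]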
 

Alvidas

Newbie
Jul 2, 2018
35
15
I was thinking about a low-end 7B model that can run via a web UI on a regular machine; that requires about 6 GB of VRAM. Has anyone tried it? Are any examples available?
 

anne O'nymous

I'm not grumpy, I'm just coded that way.
Modder
Donor
Respected User
Jun 10, 2017
10,957
16,191
That requires about 6 GB of VRAM.
You have high hopes regarding what computer the average player has and what it can do. Expect half of them to have at most 2 GB of free RAM, and a single slow hard drive with a high level of fragmentation.


Has anyone tried it? Are any examples available?
You said that you're thinking about an LLM with a web UI...

If you still need to ask those two questions after my "if it has an HTTP(S) interface, or more generally a socket interface (whether network or UNIX), it's perfectly possible to use it", are you sure you have the knowledge needed to use an LLM in a game?
Not that it's a shame not to have this knowledge, but if you can't see that my answer already tells you everything you need to know, you'll have a hard time operating the LLM and interfacing it with your game.
 

Alvidas

Newbie
Jul 2, 2018
35
15
Life is a learning process. Yes, you mentioned HTTP, and that it can theoretically be done, but I want to know if anyone has actually done it and whether there are any examples, because I didn't find any. I learn from examples. Or at least I'm trying to…
 

anne O'nymous

I'm not grumpy, I'm just coded that way.
Modder
Donor
Respected User
Jun 10, 2017
10,957
16,191
Life is a learning process.
And one starts by learning how to walk before trying to become a 100m Olympic champion...

So start by learning how to operate sockets, then how to handle HTTP queries through them. Later, once you know that, you can move on to interfacing an LLM server with Ren'Py.
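To make the first step concrete, here is a toy sketch of an HTTP GET written by hand over a raw TCP socket. It's purely a learning exercise (in a real project you'd use an HTTP library), and it assumes something is listening on 127.0.0.1:8080:

Python:
import socket

def raw_http_get(host="127.0.0.1", port=8080, path="/"):
    # Open a TCP connection and speak minimal HTTP/1.1 by hand.
    with socket.create_connection((host, port), timeout=10) as s:
        request = (
            "GET {} HTTP/1.1\r\n"
            "Host: {}\r\n"
            "Connection: close\r\n\r\n"
        ).format(path, host)
        s.sendall(request.encode("ascii"))
        chunks = []
        while True:  # read until the server closes the connection
            data = s.recv(4096)
            if not data:
                break
            chunks.append(data)
    return b"".join(chunks).decode("utf-8", errors="replace")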
 

Saki_Sliz

Well-Known Member
May 3, 2018
1,403
1,011
Oooo! An interesting idea for a minimum viable product: interfacing an LLM with Ren'Py.

I like your suggestion, Anne: the HTTP(S) interface. I know some of the LLMs I can run have HTTP(S) interfaces, and I've recently been learning to develop my own HTTP(S) REST API. But it certainly didn't come to my mind first, since I'm more focused on the code itself when I think about these things.

From what I've seen of LLMs, people don't understand what they are and what they can do. The response from an LLM is based on what it 'thinks' sounds like the most probable response. They don't 'think' using logic or consciousness; they are just a lot of probabilities, complex enough to get something somewhat capable of understanding context. It's the same for everyday objects: quantum mechanics suggests there is a 0.0000000....00001% chance we could randomly change location, but that chance is so small it is effectively zero, so we don't teleport or disintegrate; our continued existence is simply the most probable outcome, thanks to us being physically large (the more quantum particles, the more their probabilities compete with and contradict each other).

Try thinking of someone who is sleepy, drowsy, and/or under the influence of alcohol or drugs: still cognitive, but impaired enough that you can manipulate them. That's basically what an LLM is. Because LLMs aren't conscious, but more like an opinionated drunk who makes assumptions, once you notice what an LLM 'thinks' something is, it becomes possible to predict and manipulate its behavior, much like with someone cognitively impaired. The reason I point this out is that a straight-up LLM via HTTP(S) wouldn't last very long. While some LLMs are very good at following the rules you give them, almost as if they can 'think', they aren't thinking; they are just guessing an answer that 'probably' obeys the rules and instructions you gave them. But responses that 'probably' obey additional interface rules (such as to control Ren'Py and its characters) are not good enough; the output has to perfectly obey the rules or it will cause an error.

So if anything, based on what I've seen other researchers do, other than training special LLMs, the more practical route is to have two different types of AI. An LLM is like a hammer: it only hits nails. That is to say, an LLM is only good at guessing responses. LLMs not only have issues continuing to follow rules; that includes the rules describing a character's personality, and they will start acting weird. While research is still at the 'hitting rocks together to see what happens' stage (currently exploring the multi-agent route), in the future, besides an AI that maintains a character's personality, the other AI needed to pull off the project is one that takes the dialogue-only response (maybe several, choosing the best dialogue for the story context) and, rather than generating a response, is hard-coded to read and translate language into game controls, interfacing directly with the code to get the game to function correctly, similar to the research example where AI characters in a video-game town set up a dance night.

But right now there aren't robust, general ways to interface with AI, and LLMs are being used in ways they shouldn't be (it's a hammer, so everyone is treating everything like a nail).

Edit: Also, 7B is not enough; you need at least a 13B model for it not to get confused by the English language. The AI also has a terrible time keeping track of context: the character may not realize it's talking to itself in its own response, or it confuses an action it takes as having happened to itself. Simply put, any time it says me, my, you, his, hers, theirs, our, it, etc., whenever the LLM is not using names directly, it has a hard time keeping track of who it's actually talking about, and it can flip back and forth in its understanding of the situation even as it types a single sentence. Not a paragraph; in a single sentence I've seen 7B models completely drop the ball in contextual understanding. 7B is about 50% of what is needed to contextualize English, so every time it uses anything but names, it's making a 50/50 guess about what it thinks the context might be (again, it's just a probability engine). And only EXTREME PC enthusiasts have more than 8 GB of VRAM (if they upgraded in the past ~3 years), as Anne mentioned. While it can be hosted online on servers, I try to stay away from that as much as possible due to the potential fees and having to pay for services. I have a multi-thousand-dollar computer that had the latest hardware... like 6 years ago, just before the AI GPU boom: a 1080 Ti Strix, one of, if not the, best consumer GPU at the time, and all I can run are the bare-minimum 6B and 7B models on its 8 GB of VRAM ;n;
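To illustrate the "perfectly obey the rules" point: if you did let an LLM drive the engine, you'd want a hard validation layer between the model and the game. A rough sketch of the idea (the command format and action names here are entirely made up):

Python:
import json

# Whitelist of actions the game actually implements (illustrative names).
ALLOWED_ACTIONS = {"say", "smile", "frown", "leave"}

def parse_llm_command(raw_text):
    # Expect something like {"action": "say", "text": "Hi!"} from the model.
    try:
        cmd = json.loads(raw_text)
        if isinstance(cmd, dict) and cmd.get("action") in ALLOWED_ACTIONS:
            return cmd
    except ValueError:
        pass
    # The model "probably" obeyed the format; when it didn't, fall back to
    # a safe default instead of crashing the game.
    return {"action": "say", "text": "..."}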
 

anne O'nymous

I'm not grumpy, I'm just coded that way.
Modder
Donor
Respected User
Jun 10, 2017
10,957
16,191
I like your suggestion, Anne: the HTTP(S) interface. I know some of the LLMs I can run have HTTP(S) interfaces, and I've recently been learning to develop my own HTTP(S) REST API. But it certainly didn't come to my mind first, since I'm more focused on the code itself when I think about these things.
Well, I would rather go for a direct socket interface without the whole HTTP layer, but I have to adapt to the times. Lightweight free LLMs nowadays are more likely to have a small HTTP server as a front end than to rely on their own protocol.


From what I've seen of LLMs, people don't understand what they are and what they can do. The response from an LLM is based on what it 'thinks' sounds like the most probable response. They don't 'think' using logic or consciousness; they are just a lot of probabilities, complex enough to get something somewhat capable of understanding context.
Yeah, they are the link between decision trees and effective AIs: less strict than the former, but also less "imaginative" than the latter.


But responses that 'probably' obey additional interface rules (such as to control Ren'Py and its characters) are not good enough; the output has to perfectly obey the rules or it will cause an error.
I disagree with you here; on the contrary, it's perfect. Due to their limits, they will always be more suitable than current AIs, and probably next-gen ones too.

The example is extreme, but if the MC pursues his mother and hasn't kissed her yet, you know that an LLM will answer either "not yet" or "let's kiss". The range is predictable, but it stays a range; which answer she gives will depend on the other factors.
An AI, on the other hand, would have a wider range of answers, but could also go wild and possibly pick "get out of the house, I don't want to see you anymore" or "okay, you can fuck my ass". Both would be mood-killing, because they don't effectively fit the context.
 

Saki_Sliz

Well-Known Member
May 3, 2018
1,403
1,011
I disagree with you here; on the contrary, it's perfect. Due to their limits, they will always be more suitable than current AIs, and probably next-gen ones too.

The example is extreme, but if the MC pursues his mother and hasn't kissed her yet, you know that an LLM will answer either "not yet" or "let's kiss". The range is predictable, but it stays a range; which answer she gives will depend on the other factors.
An AI, on the other hand, would have a wider range of answers, but could also go wild and possibly pick "get out of the house, I don't want to see you anymore" or "okay, you can fuck my ass". Both would be mood-killing, because they don't effectively fit the context.
I think with my examples about the AI's inability to obey interface rules, I was thinking more generally, i.e. having the AI control the game, not just generate responses.

I was reflecting on my experience of seeing many users develop and share chat bots, often giving the bots rules, such as simulating RPG game mechanics, or instructing the AI to not only talk but also give an inner monologue following specific markdown rules... only for the AI to forget it's trying to simulate game mechanics. In my example I assumed a novice dev would do as I said: use the LLM as a hammer and try to solve too many things with it, such as trying to get the LLM to control the game.

But in your example, where the AI just generates responses when given contextual clues (maybe a list of stats, maybe a log of previous or typical conversations with the MC), that would be one way of implementing the LLM: probably the most realistic, most practical, and a good minimum viable product implementation.

The next most complex adaptation would be for the system to identify which key conversations to store in memory (i.e. a reflection mechanic, something still being researched for GPT-4), so that conversations can still feel connected between major events. The challenge is identifying such important information without breaking the personality prediction model, i.e. the common issue with trying to expand the AI's token count. If you continue to act like a perv and the AI starts responding positively (such as saying yes to kissing), it will soon become automatically receptive and positive rather than matching the character's personality and context information, because following the story/roleplay/context gets a higher probability than matching the character model (the AI can't think in terms of models). So you have to limit the AI to just context data to get responses that better match context and personality. In fact, it shouldn't even get to choose; at the current state of AI, I would say you should tell the AI what to say, and have it just generate nicer words.

I wanted to show a 'working' example of this... but the AI I'm currently running was trained to respond in a roleplay way, so trying to set up context and instructions just makes it roleplay as a fucking computer!
But this did show something interesting: there are different ways of setting up characters for LLMs, some closer to JSON or other code-like formats, ways that are more concise and less specific than what I did. The point is that I wanted to generate 'dialogue', and that is not what I got. But again, the AI I'm running is for roleplaying, so it behaves OK when it is roleplaying and not being instructed; simpler prompts may be better. I know that when generating AI art, keywords work better. For example, the AI I use for art likes to associate 'goth' or 'emo' with 'purple', and any time it tries to make a character goth, it just makes them wear a lot of purple or purple make-up. Basically, I don't know a good way to get an AI to behave well yet :p
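For what it's worth, here is a rough sketch of the "tell the AI what context it gets" approach I mean, where the game, not the model, decides which stats and past lines go into the prompt (every name and field here is made up for illustration):

Python:
def build_prompt(character, stats, recent_lines, player_line):
    # Only the last few exchanges go in, to respect the token budget.
    context = "\n".join(recent_lines[-6:])
    return (
        "You are {name}. Personality: {personality}.\n"
        "Affection toward the MC: {affection}/100.\n"
        "Recent conversation:\n{context}\n"
        "MC: {player_line}\n"
        "{name}:"
    ).format(
        name=character["name"],
        personality=character["personality"],
        affection=stats["affection"],
        context=context,
        player_line=player_line,
    )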
 

Alvidas

Newbie
Jul 2, 2018
35
15
We are still at the experimental level. The jump from 7B to 13B does make a big difference; models are less likely to hallucinate, but the hardware demand for that is 12 GB of VRAM. I shudder to think about bigger models requiring much more VRAM. 2x3090 for a 70B? Steps have been made toward running models on the CPU and offloading layers to the GPU with llama.cpp, at the cost of speed. We are not there yet, but things are moving in the right direction.

If I understand correctly, the biggest issue right now with having two separate systems is continuity: once you sever the connection and the conversation ends, the AI forgets everything. Loading variables only for key events? How would you load previous conversations in order to maintain coherence, when smaller models accept a limited number of tokens before the dreaded “out of memory” message? Even ChatGPT limits that, and they run it on much bigger hardware.
 

Saki_Sliz

Well-Known Member
May 3, 2018
1,403
1,011
How would you load previous conversations in order to maintain coherence, when smaller models accept a limited number of tokens before the dreaded “out of memory” message? Even ChatGPT limits that, and they run it on much bigger hardware.
Saving and loading 'conversations' is sort of what I meant by "research is still hitting rocks together": most of the information we can pass to the AI is in the form of text, which can be both a good thing and a bad thing.

In my failed little experiment, what I was attempting to do was describe the scene using 'structure'. The idea being: if you know the different ways you want the story to go and flow (for instance, you've already made the art assets and you just need the AI to generate text to match), then you can parse all the game structures or events into pre-made prompts to encourage certain results. This also means the structure of the prompt can include past contextual data specific to the scenario, greatly reducing the amount of text needed. While in my test I didn't have a catalog of examples, if your game is structured to follow a particular flow, then you, the programmer, can know which conversations to save, keep track of, and load as needed into programmatically generated prompts.

The reason I describe 'conversations' as 'rocks' (i.e. cavemen smashing rocks) is that conversations are just the data type currently easy to get at, not the best or final one. I believe one of the features being experimented with for GPT-4 is the ability to summarize conversations, then save and remember the summarization. The next step is to unload and reload these back into the token buffer, but that requires a yet-to-be-determined method of tagging the 'summaries' with association tags (similar to how human memory works); and it's unclear whether the process (being limited to just text data) really improves anything, since a catalog of tags costs about the same number of tokens as just loading the summary.
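As a rough sketch of that "summary plus recent turns" scheme (the 4-characters-per-token figure is only a crude rule of thumb, and the summary itself would come from whatever summarization step you use):

Python:
def build_memory(summary, turns, budget_tokens=1024):
    # Keep the long-term summary, then pack in as many recent turns as fit.
    kept = []
    used = len(summary) // 4  # crude token estimate: ~4 characters per token
    for turn in reversed(turns):  # newest first, so recent context survives
        cost = len(turn) // 4
        if used + cost > budget_tokens:
            break
        kept.append(turn)
        used += cost
    return summary + "\n" + "\n".join(reversed(kept))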

What I've wanted to experiment with for a while (the framework is still a prototype) is breaking the AI learning process into two parts: make the first model using 'generalized neurons', where these generalized neurons are trained to respond to niche inputs, such as a neuron representing a mood or emotion; the second model would then connect general neurons to each other to get behaviors. Generalized neurons are still a way off; they aren't something we can design yet, but we are able to discover them, since some AI models end up creating internal generalized neurons. I've been trying to cheat this using a more traditional programmatic approach, as well as keeping my data simple (focusing on adjective training).

However, what I am doing is more akin to specialty engineering: slow, expensive, and offering no immediate benefit to anyone. Meanwhile, existing LLMs are here, a starting foundation, and 'good enough', and 'good enough' always wins in the end (the history of every technology device follows the S-curve that describes this phenomenon). Maybe we'll solve things the way we always do: wait for better hardware and for software to become more efficient. But beyond that it's out of my scope.
 

DiviDreamer

Member
Aug 29, 2020
271
247
I've done that; it's actually very simple.
The fastest model I can find is 'Phi-2 3b q4 k m'; it's small and takes a small amount of RAM (and is kinda stupid).
I used urllib3 (but urllib2 and the native fetch work too). A small request to the web UI can take up to half a second, and a request to generate from a prompt can take up to five seconds.

With a good PC and GPU you can fit the latest and more complicated models like X-Win-MLewd 7B, or even some 13B ones that are actually able to hold a deep conversation and remember more of what you told them before. So there is no problem making a nice VN on an ERP LLM; things like image recognition and generation will work too, but only with a top PC and a nice GPU with at least 16 GB.
I can't check further, as I'm coding on a potato PC and Phi-2 is the only model I can afford.

Add: Oh, and the generated image is sent to Ren'Py in b64 (or rather, requested), so in Ren'Py you can decode it, set it as a Ren'Py displayable, and use it in-game without even saving it to a file.
The software I used for testing: Ren'Py 8.21, llama.cpp, Oobabooga, Cobalt, various LLM models, and Notepad++.
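The b64-to-displayable step can be sketched like this (a rough sketch, assuming the back end returns a base64-encoded PNG; im.Data builds a displayable from in-memory bytes, and the fake filename only tells Ren'Py which decoder to use):

Python:
init python:
    import base64

    def show_generated(b64_png, tag="generated"):
        raw = base64.b64decode(b64_png)
        # No file on disk: im.Data wraps the raw bytes directly.
        renpy.show(tag, what=im.Data(raw, "generated.png"))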
 

mobi_us

New Member
Dec 18, 2022
1
0
I've done that; it's actually very simple. [...]
How did you do that? Can you elaborate on it?
 

DiviDreamer

Member
Aug 29, 2020
271
247
Not much to say; it's still working too slowly. Right now I'm playing with image generation and it's really dragging; text is a bit faster, but to get sane dialogue you need a big, heavy model, 13B+.
For now there is nothing to show; all my source files are currently a swirling vortex of entropy.
 

DiviDreamer

Member
Aug 29, 2020
271
247
If you're interested, I will provide tools and sources when I get it to work more or less fine.
I've done image generation inside Ren'Py memory without dropping to a file (though maybe I should save a file for caching purposes).
Right now I'm looking for FAST low-resolution text-to-image models, 256x256 or preferably 128x128 (512x512 is ehh, OK).
This is for in-game icon creation; if you know a good and fast model for this, I'm ready to test it.