What is your video card?
I am running a chat AI locally on my 4090. It's quite awesome, and I find it works even better than crushon.ai, at least once you get the hang of it. But you do need a decent video card. I run a 20b text model, which is a great model, but a 4090 is just enough for it imo. If you have a lower-end video card you might have to use a 13b text model or maybe even a 7b. Generally, the smaller the model, the worse the output gets. And if you pick a model larger than your video card can handle, it gets slower.
But in any case, it is free and it is private.
Anyway, here is the video I stumbled on that helped me set it up:
[video link not preserved]
A few things I learned that I would add to the video guide for roleplay chats (this is for after you've followed the video and installed the web UI):
1.
There are many models ranging from 7b to 70b. If you go for AWQ models (which are the fastest if you have the VRAM for them), my rough rule of thumb is that a 7b needs about 7 GB of VRAM, a 70b about 70 GB, and so on. So while I can run a 33b model on my 4090, it comes at the cost of speed. 20b seems just right for a 4090.
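Here is a rough sketch of that rule of thumb in Python. It assumes about 1 GB of VRAM per billion parameters, which is only the approximation from this post and not an exact figure. The function names and the list of sizes are made up for illustration.

```python
# Rough rule of thumb from the post: ~1 GB of VRAM per billion parameters.
# This ratio is an approximation, not a measured value.

def fits_in_vram(model_size_b, vram_gb, gb_per_billion=1.0):
    """Return True if a model with `model_size_b` billion parameters
    should fit comfortably in `vram_gb` GB of VRAM."""
    return model_size_b * gb_per_billion <= vram_gb


def largest_fitting(sizes_b, vram_gb):
    """Return the largest model size (in billions) that fits, or None."""
    fitting = [size for size in sizes_b if fits_in_vram(size, vram_gb)]
    return max(fitting) if fitting else None


# Illustrative list of model sizes mentioned in the post.
COMMON_SIZES_B = [7, 13, 20, 33, 70]

# A 4090 has 24 GB of VRAM.
print(largest_fitting(COMMON_SIZES_B, 24))  # -> 20
```

A model that does not pass this check may still load, but as noted above you would expect it to run noticeably slower.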
There are also CPU models you could try, but I haven't played around much with those.
For my model I use
TheBloke_Emerhyst-20B-AWQ
- I found it to be the best one of the 20 or so I tried out. I load it with the AutoAWQ loader with max_seq_len 4096.
But this is on a 4090, which has 24 GB of VRAM. It is not as fast as I can read, but its speed is just about tolerable for me. Any lower and I would lose immersion. But perhaps you don't mind a bigger delay if it means better results, so just try it out I guess.
The best 13b model that I found for roleplay is this one:
TheBloke/MythoMax-L2-Kimiko-v2-13B-AWQ (AutoAWQ loader)
As for 7b models, I found most of them just not intelligent enough for my roleplays, but here are two you could try:
TheBloke_llama2_7b_chat_uncensored-AWQ (ExLlamav2_HF loader)
TheBloke_Wizard-Vicuna-7B-Uncensored-AWQ (AutoAWQ loader)
The max sequence length can be found on model pages.
Note that "uncensored" doesn't necessarily mean these are the only models capable of NSFW, and it doesn't necessarily mean they are good at NSFW either. But you could probably do wackier stuff with them. For my needs the Emerhyst one is plenty NSFW.
2.
Extensions: I also enable the long_replies extension in the Session tab. It seems to work well if I set it to, say, 500 for my roleplays. I also have an automatic1111 web UI for AI art and tried the sd_api_pictures extension, which can be fun, but I found it interferes with my AI's replies too much. Depending on the character you want to roleplay with, though, it could be a nice addition. But you would also need to install automatic1111 and run it with --api on the command line (edit the automatic1111 start batch file and set "set COMMANDLINE_ARGS=--api").
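If you want to check that automatic1111 is really listening with --api before blaming the extension, a minimal Python sketch like the one below builds a request for its txt2img endpoint. The host, port 7860, and step count are assumptions (the port can differ if the text web UI already took 7860), and the helper name is made up for illustration.

```python
import json
import urllib.request


def build_txt2img_request(prompt, host="127.0.0.1", port=7860, steps=20):
    """Build (but do not send) a POST request for automatic1111's
    /sdapi/v1/txt2img endpoint. Host, port and steps are assumed defaults."""
    url = f"http://{host}:{port}/sdapi/v1/txt2img"
    payload = {"prompt": prompt, "steps": steps}
    return urllib.request.Request(
        url,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


req = build_txt2img_request("a castle at sunset")
print(req.full_url)

# To actually send it (only works while automatic1111 runs with --api):
# with urllib.request.urlopen(req) as resp:
#     images = json.loads(resp.read())["images"]  # base64-encoded images
```

If the commented-out call fails with a connection error, the API is most likely not enabled or is on a different port.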
3.
Parameter preset: Midnight Enigma works best for me. I tend to choose that, then play around with temperature if I get weird results. If I get really stuck in a convo on something strange, I sometimes try other presets, but Midnight Enigma seems to give me the best replies.
- I also set max_new_tokens to 2024 (my model is a 4096 model, and for this model in particular this seems to give me the best results, but you may need to set this differently depending on your model). From what I have read, you want it as high as possible until the model starts producing too many strange outputs; at that point you are too high.
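One thing to keep in mind with that setting, sketched below: as far as I understand it, the tokens reserved for the reply come out of the same context window as the character template and chat history. The function name is made up, and the exact truncation behavior of the web UI is an assumption here.

```python
def prompt_budget(context_len, max_new_tokens):
    """Tokens left for the character template plus chat history, assuming
    the reply's max_new_tokens are reserved out of the context window."""
    if max_new_tokens >= context_len:
        raise ValueError("max_new_tokens must be smaller than the context length")
    return context_len - max_new_tokens


# A 4096-token model with max_new_tokens set to 2024:
print(prompt_budget(4096, 2024))  # -> 2072
```

So a higher max_new_tokens allows longer replies but leaves less room for the bot to remember earlier messages.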
4.
Creating a character: I find it is best to keep it as short as possible. What I do is describe the character's and the player's core personas with a fairly long description, then add a list of traits below. For example:
{{char}}'s persona: {{char}} is a blah blah blah and a blah blah blah, etc.
{{user}}'s persona: {{user}} is a blah blah blah and a blah blah blah, etc.
{{char}} likes xxxx
{{user}} is a xxxx
etc etc
In the persona I put everything that I don't intend to change throughout the roleplay: core traits, likes, and dislikes. The list of traits, on the other hand, contains facts and details that I do expect to change, so as the roleplay develops I can remove or add those as I see fit.
Do note that the larger you make the character template, the more it eats into the bot's memory. So try not to be too verbose or repetitive, and keep it simple.
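The layout above can be sketched in Python as a fixed persona block plus an editable trait list. The helper names are made up, and the 4-characters-per-token estimate is only a crude assumption, not how the model actually tokenizes.

```python
def build_character(char_persona, user_persona, traits):
    """Assemble the template: fixed personas first, changeable traits below."""
    lines = [
        "{{char}}'s persona: " + char_persona,
        "{{user}}'s persona: " + user_persona,
    ]
    lines.extend(traits)
    return "\n".join(lines)


def rough_token_count(text, chars_per_token=4):
    """Very rough size estimate; real tokenizers will give different numbers."""
    return max(1, len(text) // chars_per_token)


# Traits are the part you expect to edit as the roleplay develops.
traits = ["{{char}} likes xxxx", "{{user}} is a xxxx"]
template = build_character(
    "{{char}} is a blah blah blah.",
    "{{user}} is a blah blah blah.",
    traits,
)
print(template)
print(rough_token_count(template), "tokens (rough estimate)")
```

Every token the template uses is a token the chat history cannot use, which is why trimming the trait list as the story moves on helps.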
5.
Chat mode: in the chat window settings (below the bot), choose the chat-instruct mode for best results.
That should be enough to get you going. I am still figuring out a lot of stuff myself; I have only been playing with this for 2 days now.
Note that you can still get some weird results. For example, I had a character that kept trying to set me up with a guy, and even though I added "{{user}} is not gay" to the template, she kept doing it regardless. That was probably because I left it in the message history a few times: instead of immediately regenerating the message she output, I responded to it, which kept those tokens in the history. But I think once you get the hang of it, you start to learn which messages to regenerate and what kind of messages to type to get your desired results.
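To illustrate why regenerating matters, here is a toy Python sketch of a chat history. The list-of-tuples structure and function names are hypothetical and not how the web UI stores things internally; the point is only that a reply you answer stays in the context, while a regenerated reply is replaced.

```python
def reply(history, user_message):
    """Answering keeps the previous bot message in the history."""
    return history + [("user", user_message)]


def regenerate(history, new_bot_message):
    """Regenerating swaps out the last bot message, so the unwanted
    text never becomes part of the context for later replies."""
    if not history or history[-1][0] != "bot":
        raise ValueError("the last message must be a bot reply")
    return history[:-1] + [("bot", new_bot_message)]


history = [("user", "Hi!"), ("bot", "unwanted reply")]

kept = reply(history, "Let's change the subject.")
replaced = regenerate(history, "better reply")

print(any(text == "unwanted reply" for _, text in kept))      # True
print(any(text == "unwanted reply" for _, text in replaced))  # False
```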