I started thanks to melantha's small guide or a couple you can find here. I began with LM Studio + Silly Tavern as back end and front end (so yes, you can use both), but some time ago I switched to Kobold + Silly Tavern, because with LM Studio the connection kept breaking periodically and generation would stop after that. In other words, Kobold, LM Studio and Ollama are the back ends that run the model, while people usually chat and keep their char cards in Silly Tavern.
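Just to make the back end / front end split concrete, here's a minimal sketch of what a frontend is doing under the hood when it talks to KoboldCpp: it posts the prompt to the backend's HTTP API and reads the generated text back. This assumes KoboldCpp's default address (http://localhost:5001) and the standard Kobold generate endpoint; the prompt text and sampler values are just example placeholders, so adjust for your own setup.

```python
# Minimal sketch: talking to a local KoboldCpp backend the way a frontend would.
# Assumes KoboldCpp is already running with a model loaded at its default address.
import requests

payload = {
    "prompt": "You are Alice, a cheerful tavern keeper.\nUser: Hello there!\nAlice:",
    "max_length": 200,           # tokens to generate for the reply
    "max_context_length": 4096,  # context window the backend should work with
    "temperature": 0.7,          # sampling temperature
    "stop_sequence": ["User:"],  # stop before the model starts speaking for you
}

resp = requests.post("http://localhost:5001/api/v1/generate", json=payload, timeout=300)
resp.raise_for_status()
print(resp.json()["results"][0]["text"])
```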
The models for me are usually [link] (based on [link]) or sometimes [link]. I tried a few more like Gemmasutra, Stheno and Nemomix Unleashed but wasn't particularly impressed, mostly due to blatant logical lapses (a cheating GF/wife nonchalantly telling you in the first messages what a nice time she's had with her affair partner, quickly starting to mix up names and sexes) or just the language style. BTW, I actually started online on Yodayo, but then moved on to Janitor and local LLMs. Yodayo is monetizing more aggressively now as far as I understand, with every message costing some internal currency of theirs, a bit of which you can get daily or just buy. You can also find their own model [link], which is one of the options at Yodayo, and use it locally. It's a simple one, but still.
I import char cards from Janitor AI via web links, or from Chub AI by downloading the JSONs and then importing them into Silly Tavern. I set temp to 0.7 to curb the LLM's creative nonsense; that's the value Janitor AI uses by default if you use it online. There is also a thing called Lorebooks on Chub AI that you can import into Silly Tavern. A lorebook is kind of an additional layer of events that can kick in depending on the appearance of key words in your chat. It might fine-tune your chats or liven them up some. Search e.g. for the Cuckworld, Slow corruption, Humiliation methods or Manipulation methods lorebooks and some more, if interested.
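To give a rough idea of how a lorebook works mechanically (a simplified sketch of the general idea, not Silly Tavern's actual file format or matching logic; the entries and function here are made up for illustration): entries are basically keyword-to-text pairs, and when a keyword turns up in the recent chat, the entry's text gets quietly injected into the prompt.

```python
# Simplified illustration of keyword-triggered lorebook entries.
# Real lorebooks have more fields (scan depth, priority, insertion position, etc.).
lorebook = {
    ("manor", "estate"): "The old manor has been abandoned since the fire ten years ago.",
    ("ring",): "The silver ring was a gift from {{char}}'s late mother; she never takes it off.",
}

def active_entries(recent_chat: str) -> list[str]:
    """Return lore entries whose keywords appear in the recent chat text."""
    text = recent_chat.lower()
    return [entry for keys, entry in lorebook.items()
            if any(k in text for k in keys)]

# Matching entries get prepended to the prompt the frontend sends to the backend.
chat_tail = "She twisted the ring on her finger as we walked toward the manor."
print("\n".join(active_entries(chat_tail)))
```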
I don't do much RP now though, as I've come to think that:
1) Chars are essentially non-entities; they'll do whatever you want them to do, they're too reactive. But that's kind of natural, as chats are designed to respond to your inputs and don't know what you want.
2) Once you hit your context limit of about 4-12k tokens locally, the first point gets even worse, along with decreased performance and the LLM forgetting events and people that lie outside those last 4-12k tokens of the chat, which together deteriorate the experience considerably (the sketch below shows roughly what that forgetting looks like).
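A crude sketch of that dropping behaviour, with a word-count token estimate instead of a real tokenizer and made-up function names for illustration: the frontend only sends the newest messages that still fit the budget, and everything older simply never reaches the model again.

```python
# Rough illustration of why old events vanish: only the newest messages that fit
# in the context budget get sent to the model. Token counts are approximated here
# (~0.75 words per token is a common rule of thumb); a real frontend uses the
# model's own tokenizer.
def approx_tokens(text: str) -> int:
    return max(1, int(len(text.split()) / 0.75))

def fit_to_context(messages: list[str], budget_tokens: int = 4096) -> list[str]:
    """Keep the most recent messages that fit the budget; older ones are dropped."""
    kept, used = [], 0
    for msg in reversed(messages):
        cost = approx_tokens(msg)
        if used + cost > budget_tokens:
            break
        kept.append(msg)
        used += cost
    return list(reversed(kept))

# With a 4k budget, a long chat loses its opening scenes entirely.
history = [f"Message {i}: " + "some earlier scene " * 40 for i in range(200)]
window = fit_to_context(history, budget_tokens=4096)
print(f"{len(history)} messages total, only the last {len(window)} still visible to the model")
```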
A way around this is to use some powerful online models with huge token limits like ChatGPT, Claude Sonnet or perhaps the new Deepseek v3; I don't know how the latter performs with NSFW stuff though. But recently Pango_12 translated a couple of novels and shared them here, so I guess it has to be fine in terms of censorship. And use really well written char cards with setting, lore and character descriptions worth a few thousand tokens (not that there are many of those, if any, from what I've seen). Or wait for LLMs & char cards to mature some more...
P.S. I'd appreciate a link to the quantized Magnum V4 model on Hugging Face that was mentioned here, though, so I can try it out.