Which one of these two models would be better for NSFW RP?
Or are there any better ones? Tbh, I just want the 'best' thing my RTX 2080 Ti can handle; the only models I've tried so far are Pygmalion 7B/13B.
TLDR: My favorites so far are Wizard-Vicuna-13B-Uncensored or Chronos-Hermes-13B with the Shortwave parameter preset, switching to Godlike occasionally if it seems to get into a rut.
I've had decent results with WizardLM 13B Uncensored as you mentioned, though my go-to is Wizard-Vicuna-13B-Uncensored (a combination of Wizard and Vicuna). The uncensored WizardLM and Wizard-Vicuna tend to be a bit more... friendly but encyclopedic? Like, knowledgeable, but when pushed into a corner case of their knowledge they tend to recite facts as the narrator and then say stuff like "if you'd like any more information about this, feel free to ask." But overall I get good sexy conversations, roleplay, and descriptions, and it's my preferred model for general-purpose stuff (which is invariably NSFW).
A different one I like is TheBloke/chronos-hermes-13B-GPTQ. It likes to write longer, storylike prose responses (so you often want to hit 'continue' to let it keep writing if it gets cut off partway), but I've found it can sometimes get out of control and boringly verbose once you reach the 2048-token context limit and it starts dropping older messages. Careful prompt management can often keep it from getting goofy.
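By "careful prompt management" I basically mean trimming: keep the system prompt pinned and drop the oldest turns first. This little sketch (my own approximation; the tokens-per-word estimate is made up, since the real tokenizer varies by model) is the behavior I aim for with the frontend's context settings:

```python
def trim_history(system_prompt, turns, budget=2048, reserve=300):
    """Drop oldest turns so the prompt plus reply headroom fits in `budget` tokens."""
    est = lambda s: int(len(s.split()) * 1.3)  # crude tokens-per-word guess
    used = est(system_prompt) + reserve        # always keep the system prompt
    kept = []
    for turn in reversed(turns):               # walk newest-to-oldest
        if used + est(turn) > budget:
            break                              # everything older gets dropped
        used += est(turn)
        kept.append(turn)
    return [system_prompt] + kept[::-1]        # back to chronological order
```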
Overall, I prefer any 13B version for quality on my size of card, even if the 7B models are significantly faster. They're less... childish? simplistic? than the 7B models. I'd like to run something larger, but a ~30B just won't happen on my 3060 Ti with 8GB VRAM (I can barely load a GGML version into my 32GB of system RAM, but it takes several minutes to get a response). I have an A40 at work (48GB VRAM!) I want to try, but I wouldn't be caught dead trying NSFW stuff on it.
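For anyone wondering why 13B squeaks by but ~30B doesn't, here's my back-of-envelope math (weights only; the KV cache and activations eat another GB or two on top of this):

```python
def weight_gb(params_billion, bits=4):
    """Very rough size of the quantized weights alone, in GiB."""
    return params_billion * 1e9 * bits / 8 / 1024**3

for size in (7, 13, 30):
    print(f"{size}B @ 4-bit ~= {weight_gb(size):.1f} GB")
# 7B ~= 3.3 GB, 13B ~= 6.1 GB (fits in 8 GB), 30B ~= 14.0 GB (nope)
```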
I've also tried using the SuperHOT-modified models that allow up to 8192 tokens of context (instead of the typical 2048), but once you hit the context limit and old messages start getting discarded, every single request has to re-evaluate all 8192 tokens and it gets sloooooow. I'm not sure how anyone would fix that, outside of multiple video cards per machine.
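If you want to see the slowdown for yourself, a crude timing loop like this (llama-cpp-python; the model path is a placeholder and the filler prompt's token count is only approximate) shows time-to-first-token ballooning with prompt length:

```python
import time
from llama_cpp import Llama

# Placeholder path; any GGML model with a SuperHOT-extended context works here.
llm = Llama(model_path="models/wizard-vicuna-13b-superhot.ggmlv3.q4_0.bin",
            n_ctx=8192)

for n in (512, 2048, 8000):
    prompt = "word " * n           # filler; actual token count is approximate
    t0 = time.time()
    llm(prompt, max_tokens=1)      # forces a full prompt evaluation
    print(f"~{n}-token prompt: {time.time() - t0:.1f}s to first token")
```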
The biggest factor (other than 30B >> 13B >> 7B) is having a good prompt that targets the kind of story and interaction you want, which often matters more than the specific model. Next, messing with the parameters (presets are usually the way to go) can massage it into the kind of behavior you want; some models like certain presets better, e.g. Wizard-Vicuna goes well with Shortwave or Godlike for NSFW imo. After using KoboldAI, KoboldCpp, and text-generation-webui (oobabooga) independently for a while, I finally settled on running textgen in the background (with the --api option), picking my model through its UI, and then launching SillyTavern and pointing it at the local textgen API address.
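Under the hood, the API SillyTavern talks to is plain HTTP. This is roughly what a raw request looked like on my install (the legacy blocking endpoint on port 5000; newer textgen builds expose different routes, so treat the path and fields as assumptions):

```python
import requests

resp = requests.post(
    "http://127.0.0.1:5000/api/v1/generate",  # textgen's --api blocking route
    json={
        "prompt": "### Instruction:\nContinue the scene.\n### Response:\n",
        "max_new_tokens": 200,
        "temperature": 0.7,   # preset names like Shortwave/Godlike mostly
        "top_p": 0.9,         # boil down to sampler knobs like these two
    },
)
print(resp.json()["results"][0]["text"])
```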
One really nice part about switching to SillyTavern is there are a lot of fun NSFW scenarios on Chub (admittedly of varying quality), so I don't have to come up with prompts and character cards from scratch, just tweak them to my liking, and SillyTavern has a 'download' button on the character panel that lets you just paste in the Chub URL, easy peasy. Unfortunately, these character cards often don't say which AI they're intended to target (it could be OpenAI, Claude, NovelAI, etc. rather than a local LLaMA model), so sometimes no amount of fussing with the model, parameters, and prompt gets satisfactory performance, but I usually have a fun time with just about any of the top-rated NSFW characters on there.
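As I understand the card format (an assumption on my part, not something the card pages spell out), the character JSON is just base64 stuffed into a PNG tEXt chunk named 'chara', so you can peek at a downloaded card yourself with Pillow ('card.png' is a placeholder filename):

```python
import base64, json
from PIL import Image

img = Image.open("card.png")                  # a card image saved from Chub
card = json.loads(base64.b64decode(img.text["chara"]))
card = card.get("data", card)                 # v2 cards nest fields under "data"
print(card.get("name"), "::", (card.get("description") or "")[:80])
```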
I know this is waaaay more info than you were looking for, but I've been enjoying this space a lot and hope it helps somebody out.