So I tried some 8K SuperHOT merges out... and either the hoops I needed to jump through to get them working didn't work, or merging SuperHOT's 8K context into existing models just isn't good yet. They're less coherent overall, and it gets worse as the context fills up, though nowhere near the "AI screaming in pain" you get from stretching a 2K context model. Give it another month. KoboldAI isn't ready for it; Oobabooga is, though Ooba's UI is unbearable copycat shite and not meant for stories.
I'd like to give an honest shoutout to SillyTavern, which is quite good for chatting; it requires a backend like Ooba or KoboldAI. It's a fun side trip to try out models in a more conversational context. With character cards! It's like hopping into a curated adventure with a character, and some cards are places or groups rather than people. This reads like an ad, sure, but I genuinely enjoyed the change of pace.
SillyTavern:
Character cards:
As for 4-bit: if you've got 24GB of VRAM, you can fit a 4-bit 33B model with 2K context, provided you don't run anything else on the card. Airoboros blows Erebus out of the water, no competition. Airoboros 33B is consistently competent, and I find myself adding flavour to its output instead of just deleting sentences. Every other output I'll correct a weirdly worded line, or clarify something to steer a character, maintain their image, and stop them from monologuing.
I'd never go back to any of the base models in KoboldAI's AI picker; they're quite dated and too much of a chore by comparison. There's only one reason not to use 4-bit GPTQ, and that's to use GGML models designed for CPU and Apple silicon (or training, or merging, but let's be real, if you were doing that you'd be on a different site). For creative writing, the larger models write as if an editor were involved and the writer was focused, whereas some of the smaller models feel like a first draft written on a bus, with every page written in an entirely different mindset. You can still get sparks of truly wonderful writing from the smaller ones, though.
I've heard good things about Airoboros 13B as well if you're a little more VRAM-limited. I suspect it will fit on a 12GB card, overflow a bit but stay usable on 10GB, and probably not run well on 8GB. Try out a bunch of models; they each have their own flavour you may or may not like.
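Those VRAM guesses can be sanity-checked with some napkin math. Below is a rough sketch, not a measurement: the layer counts and hidden sizes are the published LLaMA configs, but the flat overhead factor for quantization scales and activations is my own assumption.

```python
# Napkin-math VRAM estimator for 4-bit quantized LLaMA-style models.
# Overhead factor is a rough guess covering GPTQ scales/zeros and activations.

def estimate_vram_gib(n_params_b, n_layers, d_model, ctx=2048,
                      bits=4, kv_bytes=2, overhead=1.15):
    """Very rough estimate: quantized weights + fp16 KV cache,
    padded by a flat overhead factor."""
    weights = n_params_b * 1e9 * bits / 8          # quantized weights, bytes
    kv = 2 * n_layers * ctx * d_model * kv_bytes   # K and V caches, all layers
    return (weights + kv) * overhead / 2**30

# LLaMA 33B: 60 layers, d_model 6656 -> should squeeze into 24GB
print(f"33B @ 2K ctx: ~{estimate_vram_gib(33, 60, 6656):.1f} GiB")
# LLaMA 13B: 40 layers, d_model 5120 -> fine on 12GB, tight on 10GB
print(f"13B @ 2K ctx: ~{estimate_vram_gib(13, 40, 5120):.1f} GiB")
```

The numbers line up with experience: the 33B lands around 21 GiB, which is why nothing else can be running on a 24GB card, and the 13B lands just under 9 GiB, which is why 8GB cards struggle.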
Airoboros 33B:
Airoboros 13B:
Between larger and smaller models your mileage may vary, but you either take a smaller, faster model, resubmit a bunch, and edit down to keep what you like, or take a bigger, slower model and resubmit less. Don't get me wrong, there are still some issues, but by my standards Erebus 6.7B is a 5-7/10 (now that the rose-tinted honeymoon glasses are off), while Airoboros 33B is a solid 8/10. Hard to believe how much has changed, for me and for the software, in half a year; running a 33B used to take something like 100GB of VRAM.
With everyone going hardcore SFW and locking down their models, to the detriment of their flexibility, I'm so glad this stuff exists. Even distinctly non-smut writing prompts often get the "As an AI language model..." bullshit if there's a hint of violence, gore, or sex involved. Or your output just gets rejected and deleted.