Thanks for the help, but 12k or 16k context size makes no difference. I guess I'm done.
Different models have different context sizes and capabilities. One of my favorite models has only a 4k context, which greatly limits its use; I simply cannot push that model to 12/16k. I've got a decent system running a solid 3060 card, and 8k is my comfort zone.
Also, the game itself will inject tokens, which eat context. I would suggest breaking the game into short 'chapters' or segments designed to be started fresh, i.e. close and clear past context and tokens. The first entry of a new chapter would be written by the {{user}}: a quick recap of their past play. That should inject the important tokens after a reset without carrying past a model's capability.
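Here's a rough sketch of what that reset flow could look like in Python. Every name here is hypothetical and just for illustration; this is not SillyTavern's actual API, just the shape of the idea.

```python
# Sketch of a chapter reset: wipe the old history entirely and seed
# the new chapter with a short, player-written recap. All names are
# hypothetical; this is not any real frontend's API.

def start_new_chapter(recap_text: str) -> list[dict]:
    """Begin a fresh chapter: past context is discarded, and the only
    carried-over state is the player's recap of prior play."""
    history = []  # close and clear past context and tokens
    # The first entry of the new chapter is written by the user:
    # a quick recap injects the important tokens after the reset.
    history.append({"role": "user", "content": recap_text})
    return history

# Example: the player summarizes the last chapter in their own words.
history = start_new_chapter(
    "Recap: I rescued the merchant, earned 50 gold, and made an "
    "enemy of the thieves' guild. Now entering the port city."
)
```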
Design chapters around an expected context length and require a restart between chapters. I expect the game to have heavy tokens. I expect to inject 4k tokens to START play; then the average player is going to have short back-and-forth chat where each reply is under 100 tokens, so you'll get roughly 40 input/output replies before a reset is required. I'm a writer and most of my chats are closer to the 300-400 token range (and that's cheap because I limit myself).
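The back-of-the-envelope math is easy to check. A quick sketch, assuming an 8k window, 4k of startup tokens, and ~100 tokens per reply (all numbers from the estimate above):

```python
# Rough context budget: how many replies fit before a reset?
context_size = 8_000    # model context window (tokens)
startup_cost = 4_000    # tokens injected just to START play
per_reply = 100         # short back-and-forth, each reply under 100

replies = (context_size - startup_cost) // per_reply
print(replies)  # 40 -- matching the "40, give or take" estimate
```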
Simple chat words are cheap; actions and scene descriptions can get expensive token-wise.
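You can see the gap by running text through a tokenizer. A minimal sketch using Hugging Face's transformers library with the GPT-2 tokenizer as a generic stand-in; your actual model's tokenizer will give different counts, but the ratio tells the story:

```python
# Compare token cost of plain chat vs. descriptive action/scene text.
# GPT-2 tokenizer is a stand-in here; exact counts vary by model.
from transformers import AutoTokenizer

tok = AutoTokenizer.from_pretrained("gpt2")

chat = "Sure, let's go."
scene = ("*She slips through the rain-slicked alley, cloak snapping "
         "in the wind, one hand resting on the worn hilt of her dagger "
         "as lantern light gutters against the crumbling brick.*")

print(len(tok.encode(chat)))   # a handful of tokens
print(len(tok.encode(scene)))  # several times more for one 'turn'
```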
I still say this game is cutting edge and will require a high knowledge base from the player. I use SillyTavern, and it has a wonderful 'regenerate' button on AI replies; basically, if I'm unhappy with a reply I can delete it and try again. Some models handle regenerate well, with wildly differing outputs, and some produce carbon copies.
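Under the hood, regenerate is just resending the same prompt and resampling; how much variety you get depends heavily on sampling settings like temperature. A hedged sketch against an OpenAI-compatible local endpoint (the kind backends like koboldcpp expose; the URL, port, and model name below are assumptions, not a specific product's API):

```python
# Regenerate = resend the same prompt and sample again. Higher
# temperature tends toward "wildly differing" outputs; low temperature
# toward carbon copies. URL and model name are placeholder assumptions
# for a typical OpenAI-compatible local server.
import requests

URL = "http://localhost:5001/v1/chat/completions"  # assumed local server

def regenerate(history: list[dict], temperature: float = 1.0) -> str:
    resp = requests.post(URL, json={
        "model": "local-model",      # placeholder name
        "messages": history,         # same prompt every time
        "temperature": temperature,  # raise for more variation
        "max_tokens": 300,
    })
    return resp.json()["choices"][0]["message"]["content"]

history = [{"role": "user", "content": "Describe the tavern as I enter."}]
# Delete the reply you disliked, then just sample again:
print(regenerate(history, temperature=1.2))
```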
Currently running [model link] with 8k context. Great model but too predictable, though that may be helpful for a game.
Fave: [model link]. Wild, but only 4k context max.
Might suggest [model link], a 9B model for average systems; you can use the Q6 or Q5 quant, which is the best bang for the buck system-wise. It should be rather consistent and less wild with regenerate and replies. Probably needs some solid repetition penalty though.