VN Completed AUTISM TRANSLATIONS

Jan 2, 2020
61
8
I take back what I said about Stheno: swapping it in and out with a large model during response writing produces very good results. Sao10k has good data, and there's a lot of story seeded in there. One of the great things about Stheno is its unpredictability. But one of the worst things about Stheno is its unpredictability... I think this could be managed... Swapping in the big model helps restore proper attention to detail and some sanity to the context. It can be really bad though. Sometimes I'll just get a null response. Sometimes the response will use different formatting. Again, sao10k is talented and knows what he's doing. If he couldn't get it sorted out, that's a very bad sign. I do have some ideas for how to build a training dataset for a system like mine, but... Yeah, definitely going to look into doing this after August's translation. Maybe the hyper-specifically trained army of 8b's (or maybe even gemmas, but again not sure how effective training is on gemma) is not so implausible after all. Would be kinda sick. The other thing this would help with is obviously cost and speed. I do hit two minutes fairly regularly once the context starts filling up, and that is just too long to wait, even for me. I think ~60s is the limit for something like this.
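
The swap policy above can be sketched as a simple routing function: default to the small creative model, hand the turn to the big model periodically or once the context fills up. Model names and thresholds here are my own placeholders, not the actual config.

```python
# Hypothetical sketch of swapping a large model in during response writing.
# "stheno-8b" / "big-model" and the thresholds are illustrative assumptions.

def pick_model(turn: int, context_tokens: int,
               swap_every: int = 4, context_limit: int = 12000) -> str:
    """Return which model should write this response."""
    if context_tokens > context_limit:
        return "big-model"   # context filling up: big model restores sanity
    if turn % swap_every == 0:
        return "big-model"   # periodic attention-to-detail pass
    return "stheno-8b"       # default: cheap, creative, chaotic
```

The point of keeping the policy this dumb is that it is cheap to evaluate per turn; anything smarter starts to look like the model-selection problem again.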
 
Jan 2, 2020
Have been doing a lot more testing with local models. A promising combination for TA is:

Wizard-8x22b (gemma 27b can do it cheaper but fuck that license) as orchestrator and world state manager
L3 70b New Dawn on response writing and editing
Literally any recent non-meme-tuned small raw instruct model for all misc tasks (query writing, aggregation, maintenance, etc.), like wizardlm2-7b or qwen2-7b... Alright, I guess qwen-7b only hypothetically. I really like the Qwen models for some reason, and they do have good writing, but when they fuck up it's usually in a major way.
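
The three-tier setup above amounts to a routing table from task type to model. A minimal sketch, assuming hypothetical task names and using the model ids from the list (the exact strings are my placeholders):

```python
# Illustrative task -> model routing for the TA combination described above.
ROUTES = {
    "orchestration": "wizard-8x22b",
    "world_state":   "wizard-8x22b",
    "response":      "l3-70b-new-dawn",
    "editing":       "l3-70b-new-dawn",
}
MISC_DEFAULT = "wizardlm2-7b"   # query writing, aggregation, maintenance, ...

def route(task: str) -> str:
    """Every task not explicitly routed falls through to the small misc model."""
    return ROUTES.get(task, MISC_DEFAULT)
```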

For response writing, I've been experimenting with a very, very smart corpo model scoring and judging which models to use, but it's... difficult to explain the process in concrete terms. There are times when plugging in Stheno can work wonders, but you definitely don't want to leave it on for too long. Stheno is way too chaotic and _will_ wreck your shit; it's only a matter of time. It's kind of what I was describing with the in-context base model learning. A totally different route to a similar result - very high creativity but mental-patient behavior.

I just started messing with New Dawn. Somehow the release passed me by. Obviously, Midnight Miqu is still a contender for best RP model despite its age. You see what I mean about L3 being untrainable, though. Even the author himself admits:

> I suspect the first thing people will want to know is how this model stacks up against Midnight Miqu. I'd say it compares favorably, although they're more like cousins than siblings. I would say that Midnight Miqu still has an edge in terms of raw creative juice when it has a good squeeze, but New Dawn is smarter and understands nuances better. You can judge for yourself, but keep in mind that these are simple, one-shot prompts. As you get deeper into your own complex scenarios, I think you'll see more of New Dawn's worth.

Yeah, Midnight Miqu was a fine-tune of a leaked quant model that was blown up to f16 with padded weights. Most people (myself included) would then be running a re-quantized version of _that_. That it worked at all is a testament to how fucking good of a model Mistral Medium actually is / was.

But a proper tune of a model with full weights available, like Llama3, which has claims to being the "best open model", done several months after the base model released, by someone competent and experienced - and the best we can get is "well, it's kind of a lateral move" compared to miqu? OK fine, not a tune, technically, but a merge. But I've tried Smaug by itself too and honestly, I would still take something like Mistral Medium over that (I mean the actual Medium on the API, not miqu). Now, sonnet-3.5, I guess. All of these are priced roughly the same, with L3 being slightly cheaper. I mean, if you're going corpo cloud already, might as well go for the boss at this weight class. My guess is sonnet-3.5 is an 8x22b MoE. Wizard is not as smart or insightful, but I think that's down to its training, not its weight class. I mean, maybe Anthropic found some other weird weight class and it's some freak size we don't even know can be good, like 8x29b, but it's definitely on that order of cost, intelligence, and speed.

I did try L3 Euryale when I first started this, but it wasn't workable. Stheno is, although it needs careful management and scaffolding around it to keep it on some rails... Might not be worth it. Then again, most of that scaffolding is already in place; it would just need some extra special casing for the times Stheno is swapped in (long-term goal being custom tunes I do on my own hardware of L3 / qwen2-7b, special-cased to AUTISM outputs). L3 8b having a properly done abliterated version already as a starting point makes it an easy sell over qwen. I mean, can you fucking imagine? Software that just does what you tell it to do and doesn't moralize at you first, and you don't have to fucking hack your own tools to do the job you're trying to do?

There is also Magnum to try out - a qwen tune, so I'm interested. "Designed to replicate the prose quality of Claude"... I mean, I get it. Claude has the best writing, no doubt. But this approach (sao10k's Claude Opus synthetic dataset, which was used here) can only ever produce - at best - a slightly shittier Claude. I can get actual Claude already at about what I pay to run a 72b. I'm going to try it and not complain any more, though. Ultimately, I don't have a better suggestion. Building a dataset is not easy, and especially with the recent models you need insane volumes of data... I remain hopeful for some breakthrough making training more practical. In the meantime, again, I like qwen. Qwen pretending to be Claude - I'm not going to pass that up.

Edit: Tried Magnum for a bit, generally unimpressed. Inherited lots of Claudisms:
(screenshot: claude.PNG)
In addition to the horniness. Had much better results with New Dawn.

Edit 2: Working with New Dawn... It has its own problems... Hear me out. I have a few test personas that should not be difficult to portray. If I showed you the card, you'd say "that's boring shit, why would you even make a card like this?" But they have some very, very specific constraints. Constraints which are unambiguous, referred to throughout the context in various ways, and reiterated several times. For one example - this character cannot physically touch this other character. It's a clear, unambiguous instruction that should be easy enough to follow. I don't provide a reason - absolutely everything else is a normal, vanilla situation - I just stipulate that clause and reiterate it in different ways with examples. You'd be surprised at how difficult it is to get a model to follow this instruction in an RP context. It's a pretty good test (one of them, anyway) of how effectively a model generalizes to unusual situations. Typically, the tunes trained on a large volume of narrative data, or with aggressive training hyperparameters on a narrative dataset (e.g. llama-3-storywriter), find this task completely impossible. Once they lock into a pattern in the writing (which itself has a gravitational pull toward Llama3's base personality), they latch onto those things more and more. This happens to a certain extent with all models, but in the "story" tunes it's especially egregious.
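
A toy version of that constraint test can be automated. A real check needs a model to judge, but a keyword scan illustrates the shape of the harness; the verb list and function name are purely my own illustration:

```python
# Toy harness for the "cannot physically touch X" constraint test described
# above. A crude surface-level scan, not a substitute for a model judge.
def violates_no_touch(response: str, other: str) -> bool:
    """Flag responses where the persona physically touches the named character."""
    verbs = ("touches", "hugs", "grabs", "holds")
    low = response.lower()
    return any(f"{v} {other.lower()}" in low for v in verbs)
```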

What makes this problem even more insidious is New Dawn's actual inherent quality. It writes very well and is creative. It's almost exactly what I'm looking for. But when it fucks up, it's usually in a completely breaking way like that - it'll violate a core constraint instruction. Like I mentioned before - a single failure in this system can be a very big deal. We have hard persistence and - for better or worse - events that happen, happen. I'm trying to create an immersive simulation experience. That means no editing of persona responses. That concept doesn't even make sense in my system. The response you are seeing is an aggregated editor response. Even if you were to edit it, you wouldn't be editing any world updates which may have gone to the DB, updates to the user's psych profile, or memories created in the DB for a persona... Right now, I do allow a single regen that does an undo. That's as far as I'll go there. My point is that consistency is more important to my use case than most. Yes, I have a QC bot I can activate that checks the inputs going into the DB. Here's the problem, though: New Dawn is very smart and produces very high quality, readable, and convincing prose. It will easily fool a small model that's sanity-checking data. Just use a bigger model that can understand? Well then, that model would just replace New Dawn altogether, wouldn't it? It's kind of a critical flaw. I'm not giving up on it, just something I noticed.
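
The "single regen = one undo" rule sketches out naturally as a one-deep snapshot taken before each turn's state writes are committed. The class and field names are hypothetical; the real system writes to several stores (world DB, psych profile, memories), which a dict stands in for here:

```python
# Minimal sketch of one-level undo over a turn's world-state writes.
import copy

class TurnState:
    def __init__(self, world: dict):
        self.world = world
        self._snapshot = None            # at most one undo point exists

    def commit_turn(self, updates: dict):
        """Snapshot current state, then apply this turn's writes."""
        self._snapshot = copy.deepcopy(self.world)
        self.world.update(updates)

    def regen_undo(self) -> bool:
        """Roll back the last turn. Only one regen allowed; returns success."""
        if self._snapshot is None:
            return False
        self.world = self._snapshot
        self._snapshot = None
        return True
```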
 
Jan 2, 2020
Also, lol, but I rolled a new char in Elden Ring. Last run, I was doing a faith build, got to the aqueduct gargoyle fight and... Holy shit, I must have tried that fight 100 times at least. Worst fight in the game for a faith build (at least that far in). This time around, I rolled up another char, agi build. Got that sentinel's spear. Currently at +17 upgrade and just finished Nokron. Complete breeze. Yes, I know larval tears exist. Just got Siluria's Woe the other night, which is cool, but... Yeah, can't really stop thinking about models.

I do want to utilize the RPG books better - I have some awesome ideas there - but again, that's more of an August thing. I'll probably start on custodian persona work tomorrow. I have the response and editor in an acceptable - and kind of interesting - place. Stheno - even swapped in at random - is the perfect chaos agent for a system like this... At least so far. We'll see on longer contexts. Right now I haven't ever gone >50 messages... Longer term, the obvious answer is to integrate rogue with the visual novel. Nethack, powered by TURBOAUTISM. Or closer to something like Ultima 7, maybe, but with a more abstract interface. That's more of a "through the end of the year" thing. For now, goals are the same: finish the TA custodian thing, load in the Oyako Rankan script aggregated and annotated, do a play-by-forum TURBOAUTISM Oyako Rankan run (possibly by myself)... I extracted Albatross, but the script files themselves (as in, extracted from the archive) have encryption on them. I'll try to get that worked out on Monday.
 
Jan 2, 2020
Another idea I will absolutely implement right now: currently, the world state has its own time (to allow for fantasy, sci-fi, and historical settings), but default_world is just regular old earth world. While the user is away from the simulation, I can run a lightweight model on all personas and update their events and world status without the user's input. In other words, simulate their actions away from the user, in brief. This gives personas a life of their own by generating snippets of what they did (character-, world-, and lore-appropriate) and just lumping them into the vector DB so they'll get retrieved on memory recall and world lore search. Could run it like every four hours or something. They can talk about things they actually did, and it's not just schizophrenia. It's not pure slop, either. I mean, it's a bit slop, but remember my core ethos here - low context (I consider 16k low context), high RAG / writing samples. Should be good.
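
The offline tick described above could look roughly like this. `generate_snippet` is a stand-in for the lightweight-model call, and a plain list stands in for the vector DB; both names are hypothetical:

```python
# Sketch of the offline persona tick: generate a brief, lore-appropriate
# event per persona and lump it into the memory store for later recall.
def generate_snippet(persona: str, world_time: float) -> str:
    # Placeholder for a small-model generation call conditioned on
    # character, world state, and lore.
    return f"{persona} went about their day at t={int(world_time)}"

def offline_tick(personas, memory_db, world_time):
    """Run once every N hours while the user is away from the simulation."""
    for p in personas:
        snippet = generate_snippet(p, world_time)
        memory_db.append({"persona": p, "text": snippet, "time": world_time})
    return memory_db
```

Since the snippets land in the same store as real memories, recall and world-lore search pick them up with no special casing.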

Edit: Why not just go full dwarf fortress and simulate the character's backstory as a series of small-model generation steps that actually write events into the vector DB, which the character will then recall on demand? Can be as rich or as barren as you like. Enriched with story sample RAG. This gives a persona an actual personality with a history that you can explore... As an aside, I've more or less given up on dynamic model selection. I don't think I can make that work (well) without actually training a model just for this task. I mean, right now my selection process is 100% based on feels anyway. And cost, too. I can have the custodian switch to the models I tell it in-game, and I think I'll just stick to this approach, with sane defaults.
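
The backstory idea sketches out as an iterative loop: each generation step is conditioned on the events so far, and every event is written into the memory store. `gen_event` is a placeholder for the small-model call plus story-sample RAG:

```python
# Sketch of dwarf-fortress-style backstory generation: N small steps,
# each conditioned on prior events, all persisted for on-demand recall.
def gen_event(persona: str, history: list) -> str:
    # Placeholder for a small-model generation step enriched with RAG.
    return f"event {len(history) + 1} in the life of {persona}"

def generate_backstory(persona: str, steps: int, memory_db: list) -> list:
    history = []
    for _ in range(steps):
        ev = gen_event(persona, history)   # conditioned on events so far
        history.append(ev)
        memory_db.append({"persona": persona, "text": ev, "kind": "backstory"})
    return memory_db
```

`steps` is the rich-or-barren knob: a handful of steps gives a thin sketch, hundreds give a life story.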
 
Jan 2, 2020
Mostly a slow day today, barely got anything done, dealing with some depressing home shit. I've made some tweaks to how I approach RAG aggregation. Previously, I wasn't retaining anything - just doing another hit against the vector DB (or mongo) and refilling the context with a new query each time. I want this to be a bit more predictable. The approach I'm trying now is to instead retain the results of the last RAG fetch and consolidate them as part of the aggregation. This is coupled with - now hard - limits on context size. Basically, always keep e.g. 800 tokens' worth of web search results in the simulation context and just update it as new data comes in. Obviously, I think this will get ultra-diluted really fast and become useless for something like a web search result, and will probably need to be purged (on e.g. location or scene change), but I think this is a better approach. Will test this out, get it settled, and start on the custodian tomorrow - definitely not in the right mindset for that right now.
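
The retained-fetch approach amounts to a rolling, token-capped slot per RAG source. A minimal sketch, assuming a crude whitespace token count in place of a real tokenizer (class and method names are my own):

```python
# Rolling token-capped RAG slot: retain the last fetch, merge new results
# in ahead of old ones, enforce a hard token budget, purge on scene change.
def n_tokens(text: str) -> int:
    return len(text.split())     # crude stand-in for a real tokenizer

class RagSlot:
    def __init__(self, budget: int = 800):   # e.g. 800 tokens of web results
        self.budget = budget
        self.items = []                      # retained results, newest first

    def update(self, new_results):
        """Consolidate: new results take priority, stale tail gets dropped."""
        self.items = list(new_results) + self.items
        kept, used = [], 0
        for item in self.items:
            cost = n_tokens(item)
            if used + cost > self.budget:
                break            # hard limit: everything past here is cut
            kept.append(item)
            used += cost
        self.items = kept

    def purge(self):
        """Called on e.g. location or scene change."""
        self.items = []
```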