[AI] Uncensored text generation via Oobabooga

Firenn

Member
Apr 26, 2017
115
92
Where can I find a tutorial on how to configure and run Chronos Hermes 13B? What is needed to run it? Node.js, Microsoft Visual Studio, and what else?
 

wildfire42

New Member
Jan 7, 2023
5
14
TLDR: If you use models that have been converted to GGML, you can get around VRAM limits (and use non-NVIDIA cards, I've heard), but it'll use the CPU, which is slower. GPTQ models are NVIDIA-only and have to fit entirely within VRAM, but they are the fastest option if you have a good enough card.

I noticed that everyone keeps using GPTQ models in this thread: they require an NVIDIA card and, as I understand it, the entire model must fit in the GPU. If you get a GGML model instead, it can be split between the GPU and CPU/system memory. The CPU is much slower for running models, but if you want to run a model that is just a little too large (for instance, I have a 6GB 3060Ti and sometimes I need 7), the GGML model lets you put most of it on the GPU and let the rest overflow into system RAM, where the CPU handles it.

I also read that GGML models will work with AMD Radeon and, god help you, Intel discrete cards, without having to settle for the original huge unquantized model. Sometimes it gets really slow though, so even though you can get it to work it might not be very entertaining.

In text-generation-webui (which is the only chat UI I've really used, having been kind of disappointed with KoboldAI so far), the type of model determines which loader options the UI brings up.

Llama models can be overcommitted to the GPU, it seems, claiming to use more VRAM than you actually have. I think this entails swapping vectors back and forth from system RAM, so: big slowdown. Instead of suffering, mess with the "n-gpu-layers" slider, which limits how much of the model goes onto the GPU; you can see how much it's trying to assign in the application's console window when the model loads. Using the GPU without overloading it, while letting the rest spill over to the CPU, can let you squeeze in a model larger than you could fit with GPTQ.
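
For anyone who'd rather poke at this outside the webui: here's a minimal sketch of the same layer split using llama-cpp-python, which I believe is the library the webui's llama.cpp loader uses under the hood. The file name and the layer count below are just placeholders; the right n_gpu_layers for your card is whatever keeps the VRAM from overflowing.

Code:
# Sketch only: load a GGML model and offload part of it to the GPU.
# Requires llama-cpp-python built with GPU (cuBLAS) support.
from llama_cpp import Llama

llm = Llama(
    model_path="models/chronos-hermes-13b.ggmlv3.q4_0.bin",  # placeholder path
    n_ctx=2048,        # context window
    n_gpu_layers=32,   # layers offloaded to VRAM; the rest run on the CPU from system RAM
)

out = llm(
    "### Instruction:\nDescribe a quiet harbor town at night.\n\n### Response:\n",
    max_tokens=80,
)
print(out["choices"][0]["text"])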

Unquantized transformer models (ones that are neither GPTQ nor GGML, and that select the "Transformers" model loader when you pick them), like I started with originally, let you drag sliders to set GPU and system RAM limits, but it seems like those don't help and server.py will just crash if the model is too large. Be sure to select the "auto-devices" and "disk" options. "load-in-4bit" uses a separate library to quantize the model on the fly while loading, shrinking it dramatically at the cost of some response quality, and is an option for making e.g. 13B unquantized models work on tiny cards.
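
If it helps, here's roughly what that path looks like if you load an unquantized model yourself with the transformers library. This is only a sketch of the same idea (automatic device placement plus on-the-fly 4-bit via bitsandbytes), not what server.py literally does, and the repo name and memory caps are placeholders for whatever fits your card.

Code:
# Sketch: load an unquantized HF model with automatic device placement
# and on-the-fly 4-bit quantization (needs accelerate + bitsandbytes installed).
from transformers import AutoModelForCausalLM, AutoTokenizer

model_name = "TheBloke/Wizard-Vicuna-13B-Uncensored-HF"  # example repo
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(
    model_name,
    device_map="auto",                        # spread layers across GPU / CPU / disk
    load_in_4bit=True,                        # quantize while loading to save VRAM
    max_memory={0: "6GiB", "cpu": "24GiB"},   # per-device caps so it spills instead of crashing
)

inputs = tokenizer("Once upon a time", return_tensors="pt").to(model.device)
output = model.generate(**inputs, max_new_tokens=40)
print(tokenizer.decode(output[0], skip_special_tokens=True))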

I haven't really played with any other model types enough to have tips there, but just about all the popular models that have been converted are offered in GPTQ and GGML varieties.
 
  • Like
Reactions: imusiyus

ApatiaMax

Newbie
Sep 9, 2022
43
25
There is also a program called , and I've been talking with the developer for about a month; it's a really good application, quite fast even on the CPU. I can run it on my laptop with 8GB of RAM, and it has a home page listing many recommended and tested models.
So, IMO it's not bad at all.
It works offline, so it should be safe for NSFW chat/roleplay ;)
It also has a Discord channel; from time to time, I'm there too.
 

Deleted member 2282952

Developing I SCREAM
Game Developer
May 1, 2020
416
869
It's making quite the rounds among a few artists I follow.
Basically, it cloaks your image to sabotage AI scraping.


paper :
Lol, I love the core idea behind it, but I'm afraid it's too late for it to have a significant impact; that's not to say it won't have any impact at all.

If the data on the internet had always been cloaked, it would be pretty funny, because you'd first need to build an entirely new AI system JUST to interpret & uncloak it, and only afterward could you do all the crazy stuff with stealing human art and words.

I'm pretty sure there will be patterns in the masking algorithms that can be unmasked by reverse-engineering them, so it's not a big deal for the big corpos.
 

Deleted member 1121028

Well-Known Member
Dec 28, 2018
1,716
3,295
Lol, I love the core idea behind it, but I'm afraid it's too late for it to have a significant impact; that's not to say it won't have any impact at all.

If the data on the internet had always been cloaked, it would be pretty funny, because you'd first need to build an entirely new AI system JUST to interpret & uncloak it, and only afterward could you do all the crazy stuff with stealing human art and words.

I'm pretty sure there will be patterns in the masking algorithms that can be unmasked by reverse-engineering them, so it's not a big deal for the big corpos.
Even if you're right, and I think you are, they will just ignore such data for now, at least as long as it doesn't affect their gluttony or curb their appetite. Right now the cost would be too high for such a fringe sample (try reserving a few A100s and look at the price; Amazon doesn't make deals for anyone). Short term, I think it's not a bad solution for artists who want a quick opt-out.

And they have other fish to fry. The next big target for AI is clearly social networks (just because of their sheer size, and they need that data). You could see all of them (Meta/Twitter/Reddit...) modifying their APIs in the same month to slow down the scraping, mostly so they can re-sell the data instead of giving it away for free.

Anyway, I was reading while fishing (in the rain, mind you, ffs) and came across this gem: "Like a Midas with a thousand fingers, [market rationality] afflicts everything it touches, and nothing escapes it. What it has not eliminated, and what we believe to be intact, is, in the manner of a skilled taxidermist". Replace [it] with AI and you've got quite an apt metaphor lol.

 

F0xii

New Member
Jun 18, 2018
6
4
Which one of these two models would be better for NSFW RP?


or are there any better ones? Tbh, I just want the 'best' thing that my RTX 2080 Ti can handle; the only models I've tried so far are Pygmalion 7B/13B.
 

wildfire42

New Member
Jan 7, 2023
5
14
Which one of these two models would be better for NSFW RP?


or are there any better ones? Tbh, I just want the 'best' thing that my RTX 2080 Ti can handle; the only models I've tried so far are Pygmalion 7B/13B.
TLDR: My favorites so far are Wizard-Vicuna-13B-Uncensored or Chronos-Hermes-13B with the Shortwave parameter preset, switching to Godlike occasionally if it seems to get into a rut.


I've had decent results with WizardLM 13B Uncensored as you mentioned, though my go-to is (a combination of Wizard and Vicuna). The uncensored WizardLM and Wizard-Vicuna tend to be a bit more... friendly but encyclopedic? Like, knowledgeable, but when pushed into a corner case of their knowledge they tend to recite facts as the narrator and then say stuff like "if you'd like any more information about this, feel free to ask." But overall I get good sexy conversations, roleplay, and descriptions, and it's my preferred model for general-purpose stuff (which is invariably NSFW).

A different one I like is TheBloke/chronos-hermes-13B-GPTQ - it likes to write longer, storylike prose responses (so you often want to hit 'continue' to let it keep writing if it gets cut off partway) but I've found it can sometimes get out of control and boringly verbose when you reach the context limit of 2048 and it starts dropping older messages. Careful prompt management can often keep it from getting goofy.

Overall, I prefer any 13B version for quality on my size of card, even if the 7B models are significantly faster. They're less... childish? simplistic? than the 7B models. I'd like to run something larger, but a ~30B just won't happen with my 3060Ti with 8GB VRAM (I can just barely load a GGML version into my 32GB of system RAM, but it takes several minutes to get a response). I have an A40 at work (48GB VRAM!) I want to try, but I wouldn't be caught dead trying NSFW stuff on it.

I've also tried the SuperHOT-modified models that allow up to 8192 tokens of context (instead of the typical 2048), but once you hit the context limit and it has to discard old messages, it has to push all 8192 tokens through on every single request and it gets sloooooow. I'm not sure how anyone would fix that, outside of multiple video cards per machine (supposedly I have

The biggest factor (other than 30B >> 13B >> 7B) is having a good prompt that targets the kind of story and interaction you want, which often matters more than the specific model. Next, messing with the parameters (usually presets are the way to go) can massage it into the kind of behavior you want; some models like certain presets better, e.g. Wizard-Vicuna goes well with Shortwave or Godlike for NSFW imo. After using KoboldAI, KoboldCpp, and text-generation-webui (oobabooga) independently for a while, I finally settled on running textgen in the background (with the --api option) and picking my model through its UI, then launching SillyTavern and pointing it at the local textgen API address.
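
For the curious, talking to that API yourself is simple enough; here's a rough sketch against the blocking API you get with --api. The port, endpoint path, and field names have shifted between versions of text-generation-webui, so double-check against whatever build you're actually running.

Code:
# Sketch: send a prompt to a locally running text-generation-webui started with --api.
import requests

resp = requests.post(
    "http://127.0.0.1:5000/api/v1/generate",  # default blocking-API address (version-dependent)
    json={
        "prompt": "### Instruction:\nWrite a short scene on a rainy pier.\n\n### Response:\n",
        "max_new_tokens": 200,
        "temperature": 0.7,   # these are the same knobs the parameter presets adjust
        "top_p": 0.9,
    },
    timeout=300,
)
print(resp.json()["results"][0]["text"])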

One really nice part about switching to SillyTavern is that there are a lot of fun NSFW scenarios at (admittedly of varying quality), so I don't have to come up with prompts and character cards from scratch, just tweak them to my liking, and SillyTavern has a 'download' button on the character panel that lets you paste in the Chub URL, easy peasy. Unfortunately, these character cards often don't say which AI they're intended to target (it could be OpenAI, Claude, NovelAI, etc. rather than a local LLaMA model), so sometimes no amount of fussing with parameters, models, and prompt tweaks gets satisfactory results, but I usually have a fun time with just about any of the top-rated NSFW characters on there.

I know this is waaaay more info than you were looking for but I've been enjoying this space a lot and hope it helps somebody out.
 

Deleted member 1121028

Well-Known Member
Dec 28, 2018
1,716
3,295
A different one I like is TheBloke/chronos-hermes-13B-GPTQ - it likes to write longer, storylike prose responses (so you often want to hit 'continue' to let it keep writing if it gets cut off partway) but I've found it can sometimes get out of control and boringly verbose when you reach the context limit of 2048 and it starts dropping older messages. Careful prompt management can often keep it from getting goofy.
Imho, that's my feeling too so far. For a non-native English user, it might turn out to be the handiest model for declining/iterating more or less proper English. RUSSIAN HEAR ME OUT
 
  • Like
Reactions: wildfire42

ApatiaMax

Newbie
Sep 9, 2022
43
25
What about a LoRA?
I've done a bit of research, but it's somewhat beyond my knowledge and I really don't have time for it; still, making a LoRA to extend an LLM's capabilities could be a workaround for the models' limitations.
I've worked a bit on characters to get a Dom who teases the user and makes them submit, but so far, when the hot moment comes, it always fails \(〇_o)/
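
From the little reading I've done, the basic shape of it with the peft library looks something like the sketch below. The base model, rank, and target modules are just common examples I've seen around, not something I've actually trained or tested, so take it with a grain of salt.

Code:
# Sketch: attach a LoRA adapter to a base model with peft; training itself
# would then be a normal transformers Trainer loop over your roleplay dataset.
from transformers import AutoModelForCausalLM
from peft import LoraConfig, get_peft_model

base = AutoModelForCausalLM.from_pretrained(
    "huggyllama/llama-7b",   # example base model
    load_in_8bit=True,       # keep the frozen base small in VRAM
    device_map="auto",
)

config = LoraConfig(
    r=8,                                   # adapter rank: higher = more capacity, more VRAM
    lora_alpha=16,
    lora_dropout=0.05,
    target_modules=["q_proj", "v_proj"],   # which attention projections get adapters
    task_type="CAUSAL_LM",
)

model = get_peft_model(base, config)
model.print_trainable_parameters()         # only a tiny fraction of weights are trainable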
 

ApatiaMax

Newbie
Sep 9, 2022
43
25
I've had decent results with WizardLM 13B Uncensored as you mentioned, though my go-to is (a combination of Wizard and Vicuna).
Look at this chat with the model you recommended :D


I think I should investigate to see if they exist or if she's just teasing me :ROFLMAO: