Hey, I have an R7 5700X and a 4070 Ti. Can I make this Kobold thing work? And how? I really don't know where to start.

If you have the hardware, the answer is always going to be running something like Kobold locally, for a variety of reasons. I've been using koboldcpp fed into SillyTavern, running off a crunchy 2017 i7 and an AMD GPU, with acceptable results, if you just want to run a language model for coombots.

Dude, I think we'd need something like an NVIDIA A100 80GB VRAM GPU for $20k to run a decent 65B model locally.
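For what it's worth, with a 4070 Ti you don't need an A100: the usual route is a quantized GGUF model with most layers offloaded to the GPU. A minimal sketch of the launch, assuming you've cloned the koboldcpp repo and downloaded a model (the model file here is hypothetical, and the flag names are from memory of koboldcpp's CLI, so treat them as assumptions and check `--help`):

```python
# Minimal sketch: launch koboldcpp with GPU offload, then point
# SillyTavern at the local API it exposes. Run from inside the
# koboldcpp checkout; flag names may differ by version.
import subprocess

MODEL = "models/mythomax-l2-13b.Q4_K_M.gguf"  # hypothetical GGUF quant

subprocess.run([
    "python", "koboldcpp.py",
    "--model", MODEL,
    "--usecublas",            # CUDA acceleration for NVIDIA cards
    "--gpulayers", "99",      # "offload everything that fits"; a Q4 13B fits in 12GB
    "--contextsize", "4096",  # token window; more context costs more memory
])
# koboldcpp then serves a local API (default http://localhost:5001)
# that SillyTavern can use as its backend.
```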
I have tried candy.ai and it is very good, but after a few messages the memory is faulty and the character changes personality. I am willing to pay for a good chatbot, but crushon.ai is very expensive. Do you have any suggestions for me?

Damn, I was going to try candy.ai, but I was sceptical, since all those "AI" sites that emphasize image generation over realistic chat with characters don't inspire confidence. It looks like they hide a weak AI behind shiny bot images. Looks like I was right.
I don't like it when the chat suddenly loses memory and the character's personality changes, you know? I'm also looking for similar AI sex chatbots. A lot of folks are recommending Soulfun.ai, and I even got a membership, but man, their voice feature really needs some improvement. Gotta say, though, their chat experience and picture options are really cool.

Soulfun is my first option too; it has good memory and no NSFW filter.
I'm trying it out at the moment. Not bad for relatively short interactions, which mine tend to be. I do have enough memory for some of the models, but I'll be increasing it soon for the LLMs. What would you suggest: selecting a general LLM and using that to do the deed, or a lighter but purpose-built model for NSFW?

A piece of information before you sail away: there is no workaround for the context window. Every model will forget your chat and turn unreliable once the token (memory) cap is exceeded.
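To make the forgetting concrete: the frontend can only send as much history as fits in the window, so the oldest turns get silently dropped. A toy sketch of that trimming, using a crude word count in place of a real tokenizer (an assumption for brevity):

```python
# Sketch of why chats "forget": the frontend only sends what fits in
# the model's context window, so the oldest turns get dropped.
def fit_to_context(messages, max_tokens=4096):
    def count(msg):
        return len(msg.split())  # rough proxy; real tokenizers differ

    kept, used = [], 0
    for msg in reversed(messages):  # keep the newest turns first
        if used + count(msg) > max_tokens:
            break                   # everything older is simply gone
        kept.append(msg)
        used += count(msg)
    return list(reversed(kept))
```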
Do you know what LLM models they use?

I use promptchan.ai, and it can chat and generate photos, so it does chat with selfies attached. I've tried others and I would rate this one 4.5/5.
I'd say uncensored RP-tuned models (>10B) work *alright*, but after around 10 to 20 chats you start to predict their outcomes. My gut tells me that even if they are uncensored, they were never fed a relatively big amount of NSFW literature. For some reason most of those models are fucking obsessed with nipples; I have even started to ban that word in the character cards, haha.
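Side note on banning words: a character card can only discourage a word, while most backends offer token banning / logit bias, which kills it outright. Conceptually it's just this (a toy sketch of the mechanism, not any particular backend's API):

```python
# Toy sketch of token banning / logit bias, the mechanism behind
# "banned tokens" settings in most backends.
import math

def ban_tokens(logits, banned_ids):
    # A logit of -inf means the sampler can never pick that token.
    for tid in banned_ids:
        logits[tid] = -math.inf
    return logits

# e.g. banned_ids = every token id produced by tokenizing "nipples"
```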
Thanks for the reply. I guess a balance between context memory and natural responses is the way to go. I've used other AI bots, so I'm familiar with the fact that they're not gonna retain details for very long, and that they'll say some wacky-ass shit every once in a while. Many years of fapping with one hand and slogging through buggy, broken v0.000.0.01 pre-alpha grindfests have prepared my patience well, so that part thankfully doesn't bother me too much.

So, with that in mind, I feel somewhat confident recommending that you not limit yourself to a couple of LLMs. Exploring and changing is much more fun, refreshing, and surprising. In my case, I found out I prefer smaller models (from 8B to <30B) and spending the memory I save on a bigger token context window.
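One way to see that trade-off in numbers: the context window isn't free either, because the KV cache grows linearly with context length. A rough back-of-the-envelope, assuming a generic 8B-class architecture (32 layers, 8 KV heads of dim 128, fp16; assumptions, not any model's real specs):

```python
# Back-of-the-envelope KV-cache cost of a bigger context window.
# Architecture numbers are assumed typical for an 8B-class model.
def kv_cache_gb(ctx_tokens, n_layers=32, n_kv_heads=8,
                head_dim=128, bytes_per_val=2):  # fp16 K and V
    per_token = 2 * n_layers * n_kv_heads * head_dim * bytes_per_val
    return ctx_tokens * per_token / 1024**3

print(kv_cache_gb(4096))   # ~0.5 GB
print(kv_cache_gb(32768))  # ~4 GB: the kind of headroom a smaller model frees up
```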
With 64GB of 3200MHz RAM (really running at 2200 or so), I can't even load some 70B models into memory. The steady-state usage is around 30-45GB, and I have spare memory available for that, but it seems that when loading some models initially, memory usage is 50-100% higher for a moment.

I'm using a 7B NSFW model at the moment, by the way. I'd like to triple that and see how my results fare, but it'll be a week or so before I can report on that.
You need RAM to do the loading process (from disk to RAM or GPU). Wild guess here, but maybe you are running out of RAM while loading onto RAM, which is actually funny xD. There might be an option to solve that, though. Every time I try to get my hands on inference I get lost in so many damn option flags and motherfucking modules and versions. As a femdom connoisseur, I believe I have the mastery and authority required to accuse all those devs of being sadistic ball torturers.
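That wild guess is roughly right, by the way: a naive loader reads the whole file into a buffer and then builds its tensors from that copy, so for a moment you hold close to two copies in RAM. Memory-mapping the file avoids the second copy, which is why llama.cpp-family backends mmap by default, as far as I know. A toy illustration of the difference, not anyone's actual loader:

```python
# Toy illustration of why loading can spike RAM: read() materializes a
# full copy up front, while mmap lets the OS page the file in lazily.
import mmap

def load_with_copy(path):
    with open(path, "rb") as f:
        return f.read()  # whole file duplicated in RAM at once

def load_with_mmap(path):
    with open(path, "rb") as f:
        # Pages are faulted in on demand and can be evicted under pressure.
        return mmap.mmap(f.fileno(), 0, access=mmap.ACCESS_READ)
```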
So I've been using an 8x7B model. It's fairly slow, running at about 0.5 seconds per word, and depending on the model you get non sequiturs frequently. Hell, the one I'm using now is the thirstiest thing I've ever seen, lmao. It'll take some fine-tuning for sure, but I'll try to stick to the under-30B models as suggested and see how that goes.