If you get repetition or babbling or hallucination, adjust your repetition penalty settings and penalty curves.
Depending on what model you're running through the frontend, there can be different sweet spots, but usually the frontends have recommended settings.
If the bots aren't remembering things, that's the context setting. Almost always keep this at 2048: a higher setting breaks some models and, if you're crowdsourcing generation, drastically limits how many workers can accept your job.
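To make the context limit concrete, here's a rough sketch of why bots "forget". It assumes the frontend keeps the character description pinned and drops the oldest chat turns until everything fits the budget. The word-count `count_tokens` stand-in, the `reply_budget` value and the function names are all made up for illustration; real frontends use the model's own tokenizer and may trim differently.

```python
def count_tokens(text):
    # Crude stand-in: real frontends use the model's actual tokenizer.
    return len(text.split())


def build_prompt(description, turns, context_limit=2048, reply_budget=200):
    """Keep the description, then as many of the newest turns as fit."""
    budget = context_limit - reply_budget - count_tokens(description)
    kept = []
    # Walk backwards from the newest turn; stop at the first one that doesn't fit.
    for turn in reversed(turns):
        cost = count_tokens(turn)
        if cost > budget:
            break
        kept.append(turn)
        budget -= cost
    kept.reverse()
    return [description] + kept


# Hypothetical chat: 100 turns of roughly 50 words each.
description = "{{char}} is a goblin innkeeper."
turns = [f"turn {i}: " + "word " * 50 for i in range(100)]
prompt = build_prompt(description, turns)
print(len(prompt) - 1, "of", len(turns), "turns still fit")
```

Anything that fell off the front of the list is simply gone as far as the model is concerned, which is why long-term facts belong in the character description and not in old chat turns.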
If you're going for an extremely long interaction, use a frontend that lets you edit the character (character = not just the character but the entire world/setting/MC attributes/conversation style/literally everything about the interaction you're having). E.g. if your MC gets his arm chopped off and it scarred the primary NPC, go into the character description and add a line like: {{user}} had his arm chopped off by an orc while {{char}} watched, horrified. Then refresh the chat so the new character description takes effect.
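Here's a small sketch of what that edit amounts to, assuming the description is just a block of text in which the frontend swaps {{user}} and {{char}} for real names before sending it to the model. The helper names and the names "Bram" and "Kessa" are hypothetical, not any particular frontend's API.

```python
def add_fact(description, fact):
    """Append a new permanent fact to the character description."""
    return description.rstrip() + "\n" + fact.strip()


def fill_placeholders(description, user_name, char_name):
    """Swap the {{user}} / {{char}} macros for real names before prompting."""
    return description.replace("{{user}}", user_name).replace("{{char}}", char_name)


description = "{{char}} is a ranger who travels with {{user}}."
description = add_fact(
    description,
    "{{user}} had his arm chopped off by an orc while {{char}} watched, horrified.",
)
filled = fill_placeholders(description, "Bram", "Kessa")
print(filled)
```

Because the description is re-sent with every generation, the new fact sticks around no matter how far the chat scrolls.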
Remember a few main things.
1. Text generators are not logic machines. Use emotes to nudge the LLM in the right direction.
*I cast sculpted fireball* - wrong.
*I cast fireball, sculpting it so it doesn't hit my allies.* - correct.
2. You have absolute power.
3. You aren't REALLY taking part in a conversation. If you're in a room with a sexy goblin shortstack and she's not responding to your RP the way you want - YOU, the player, can/should tell the machine what to do.
e.g.
Player: *saunters up to the goblin, winks down at her*
Shortstack: The sexy goblin shortstack recoils in fear, screaming for help. Nearby orcs decide to replace your legs with no legs.
***If this isn't the desired result (no judgement), delete the Shortstack response and edit your post.
Player: *saunters up to the goblin, winks down at her. The shortstack is turned on and licks her lips*
4. You can be as eloquent as you want, but with more complex prose you will have to add identifiers for the LLM to grab onto. Sometimes it understands sarcasm, sometimes it doesn't, sometimes it'll take offense at a joke, etc.
Again - tell it what to think if it infers incorrectly.
Player: Oh greaaaaaaat. The orcs are going to replace my legs with no legs.
Shortstack: The goblin feels a glowing sensation in her heart, now that you are finally realizing your lifelong dream of having no legs.
***Delete Shortstack, alter yours.
Player: *sarcastically, I say "Oh greaaaat. The orcs are going to replace my legs with no legs."*
5. Emotes aren't really emotes. Your spoken words aren't really spoken words. You're just telling a machine what to do.
You can literally type:
*I wink at the goblin, she puts a kazoo in her ass and farts the national anthem of Croatia, my legs explode and are replaced by orcs, the barman enters into a kazoo duet with the goblin, the sheer force of my sexual carnage literally lifts the roof off the building, eventually I realize that I was the kazoo all along*
And the LLM will generate that in story format.
6. The only reason you hold back on specific instructions is to allow character settings and randomness to influence the npcs' actions. If you have a strong idea where you want things to go, either tell it what to do (short term) or edit the character description (long term).
7. Interacting with LLMs (that's Large Language Models, basically text-generation AI) is a skill.
Some models are objectively better than others, but if a model is highly regarded by the community, you can consider it a good model to train YOU on how to interact with LLMs.
8. Most Characters (Cards - half the nsfw community calls them that already anyway) are trash. Look at how the good ones are written (description & example conversation especially). Look at a bad one (again, description and example conversation). It will be very apparent why the good one is good and the bad one is bad.
9. LLMs are trained on huge amounts of data, and character descriptions let NPCs draw on that knowledge. That knowledge will shape their character if you don't add other descriptors.
e.g.
{{char}} is an expert nurse with 15 years experience in emergency room care.
***It'll be mostly random what the story establishes first about the NPC. Because the LLM feeds on its own context (the story you read), it self-reinforces whatever it generates. You can edit the story window and try to get the context recycling in a way where the character behaves how you want, but this will distract from your fun and you'll do a lot of correcting.
However,
{{char}} is an expert nurse with 15 years experience in emergency room care, she never uses curse words, she's addicted to heroin and cock, she is also a sexy goblin shortstack, her best friends are orcs and the orcs are in the business of removing people's legs, she despises oatmeal raisin cookies
{{char}} will frequently ask {{user}} if they have either heroin or cock for her.
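
Points 3 and 4 (delete the bad reply, edit your post, let it generate again) boil down to editing the chat log before the next generation. A minimal sketch, assuming the frontend stores the chat as a list of role/text entries; that layout and the `retry_with_edit` name are made up for illustration, not any specific frontend's API.

```python
def retry_with_edit(history, new_player_post):
    """Drop the last bot reply and replace the player post that caused it."""
    trimmed = list(history)
    # Remove the unwanted bot reply, if the log ends with one.
    if trimmed and trimmed[-1]["role"] == "bot":
        trimmed.pop()
    # Remove the player post that led to it.
    if trimmed and trimmed[-1]["role"] == "player":
        trimmed.pop()
    trimmed.append({"role": "player", "text": new_player_post})
    return trimmed


history = [
    {"role": "player", "text": "Oh greaaaaaaat. The orcs are going to replace my legs with no legs."},
    {"role": "bot", "text": "The goblin feels a glowing sensation in her heart."},
]
history = retry_with_edit(
    history,
    '*sarcastically, I say "Oh greaaaat. The orcs are going to replace my legs with no legs."*',
)
print(history[-1]["text"])
```

The model only ever sees the edited log, so the misread reply never happened as far as the story is concerned.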
-----
Anyway that was off the top of my head. I'm still learning the user side of things for LLMs. I'm limited to CPU for running my own AI, and it's not fast enough for me. (Probably my biggest pet peeve is stories generating text slower than I read. Stop doing this, devs, btw.) Look around for frontends you enjoy using. I won't tell you mine (because I'm a selfish prick and if they get too many users they might lock the good stuff behind premiums) but search and try and jot a couple lines in notepad to remind you what the site has or lacks, and you'll likely find some good ones with little to medium effort.