I used the Oobabooga WebUI to load them, and then ported that into SillyTavern at the time. I used mostly Llama-type models, plus a few others. It was a few months ago when I dabbled in it. I gave up after seeing how slow the responses were for anything past a 16B model. All the models I used were uncensored, and some were even trained specifically for RP, so they had pretty strong responses the few times I'd wait for them. Sadly it would effectively brick my video card for doing anything else while it ran.

For a 30B+ parameter model you want 32GB of VRAM for decent response times; 24GB is still enough to get output, just slower. To run a 32B model you could probably build a ~$6K rig with three AMD 7900 XTXs for a combined 72GB of VRAM and have a strong AI machine. Heck, you could maybe pull off a 70B parameter model without too much trouble. Once you get into 172B parameter territory you'd need a huge bank of 4090s/7900s, like a mining rig, to push out responses with any speed or accuracy. But at that point, just rent that thing out to people and get paid for it. I don't know many people who have $40-50K to drop on a server-grade AI machine, though.
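For what it's worth, here's a rough back-of-the-envelope sketch of where those VRAM numbers come from, assuming 4-bit quantized weights and ~20% padding for KV cache and runtime overhead (both figures are my own rough assumptions, not exact):

```python
# Rough VRAM estimate for loading a local model.
# Assumption: weights dominate memory use; pad ~20% for
# KV cache, activations, and framework overhead.

def estimate_vram_gb(params_billions: float, bits_per_weight: int = 4,
                     overhead: float = 1.2) -> float:
    """Approximate VRAM (GB) needed to load a model of the given size."""
    weight_bytes = params_billions * 1e9 * bits_per_weight / 8
    return weight_bytes * overhead / 1e9

for size in (16, 32, 70, 172):
    # Compare a 4-bit quant against full fp16 weights.
    q4 = estimate_vram_gb(size, bits_per_weight=4)
    fp16 = estimate_vram_gb(size, bits_per_weight=16)
    print(f"{size:>4}B model: ~{q4:.0f} GB at 4-bit, ~{fp16:.0f} GB at fp16")
```

By that math a 4-bit 32B quant just about squeezes onto a single 24GB card, a 70B wants two of them, and anything in the 172B range is multi-GPU no matter what, which roughly lines up with what I saw.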