That's because these models are what they call LLMs, or large language models, and they load entirely into the GPU's memory. The GPU is extremely good at processing LLMs, and newer graphics cards are even designed for exactly that. CPUs are extremely slow at this, and normal RAM (DDR) is too slow to keep up, so a GPU is usually how it's done. GDDR is very fast and sits right around the GPU chip, so even the travel time is tiny, which keeps latency low and gives you quick replies, or fast image generation in Stable Diffusion.
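Rough back-of-envelope math on why the whole model has to fit in VRAM (my own sketch, the function name and numbers are just illustrative): you can estimate a model's memory footprint from its parameter count and how many bits each weight is stored at.

```python
def vram_gb(params_billion: float, bits_per_weight: int) -> float:
    """Rough VRAM estimate in GB for the weights alone.

    Ignores the KV cache and framework overhead, so real usage
    is somewhat higher than this number.
    """
    total_bytes = params_billion * 1e9 * bits_per_weight / 8
    return total_bytes / 1e9

# A 7B model at fp16 needs ~14 GB just for weights:
print(vram_gb(7, 16))   # 14.0
# The same model quantized to 4-bit fits in ~3.5 GB:
print(vram_gb(7, 4))    # 3.5
# A 13B model at 4-bit is ~6.5 GB:
print(vram_gb(13, 4))   # 6.5
```

This is why quantized models are so popular: dropping from 16-bit to 4-bit weights cuts the VRAM needed to a quarter, which is the difference between fitting on a consumer card or not.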
A lot of these uncensored models work just like GPT-4, but without the censorship, so you can ask them all kinds of real-world questions and get an answer. We might be using them for sexy chat time, but they really are powerful tools; they can code for you too.
If you want to run models for more than just chatting, I'd encourage you to try oobabooga.