I'm running on a 3090, using GPTQ. I have to use flashattn when running the 22B with 32k context, though. It doesn't play well with whatever instructs he's feeding it anyway; replies are extremely wordy and rigid, usually starting with "Oh I see you're trying to know more about me!" and typical AI garble like that.

22B parameters are insane! What GPU do you use to run that? Also, what loaders do you use? AWQ or GPTQ?