I have some pretty damn good news for someone struggling with running DSLR.
There's a new wave of uncensored Qwen instruct models that are so impressive at following orders that the 8B one beats other models at 32B, which means it runs on just about anything.
Qwen3 VL 32B Instruct abliterated v1 I1
for someone with a recent PC (I don't have the hardware to test it much, but in my limited testing I haven't seen it go into an error loop at all yet), and
Qwen3 VL 8B Instruct abliterated v2.0 I1
for someone who owns basically any CUDA- or Vulkan-capable GPU.
You can fully offload the Q4_K_S variant on a 1060. (It's seriously only 4.8GB.)
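If you've never run a GGUF yourself, here's a minimal sketch of what "fully offloaded" looks like with llama-cpp-python. The file name and the sample prompt are just placeholders for whatever you actually download and feed it; any llama.cpp-based frontend (KoboldCpp, LM Studio, etc.) does the same thing behind a GUI.

```python
# Minimal sketch: fully offload the 8B Q4_K_S quant with llama-cpp-python.
# The model path is a placeholder for wherever you saved the GGUF.
from llama_cpp import Llama

llm = Llama(
    model_path="Qwen3-VL-8B-Instruct-abliterated-v2.0.Q4_K_S.gguf",
    n_gpu_layers=-1,   # -1 = push every layer onto the GPU
    n_ctx=8192,        # context window; shrink it if you run out of VRAM
)

out = llm.create_chat_completion(
    messages=[
        {"role": "system", "content": "Translate the following Japanese text to English."},
        {"role": "user", "content": "大きな犬が走っている。"},
    ],
    max_tokens=256,
    temperature=0.3,
)
print(out["choices"][0]["message"]["content"])
```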
Will the 8B version still occasionally shit the bed and get stuck in an error loop? Yes, but stuff like Gemma3 does that at 27B too, and a failure where you have to pull some problematic cell out of the batch before redoing it hurts a lot less when the model is running at like 200 characters per second on a 9-year-old GPU.
Because unlike my previous recommendation for offline DSLR, this isn't a thinking model. It doesn't waste 5 years per response writing a novel about why something might be translated to "big throbbing cock".
For maximum speed and laziness you could also run the 8B model until it hits an error loop, switch to the 32B model for that one batch, and then go back.
Heck, if there's demand I could even automate that and have an automatic fallback model, something like the sketch below.
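To be clear, nothing like this exists in DSLR right now. It's just a rough sketch of the idea, assuming both models sit behind an OpenAI-compatible endpoint (llama.cpp's server, KoboldCpp, LM Studio, whatever), and the "error loop" check is a dumb placeholder that only looks for a line repeating itself.

```python
# Rough sketch of an automatic 8B -> 32B fallback, NOT a real DSLR feature.
# Assumes two OpenAI-compatible servers (the ports are made up) and a very
# naive "error loop" heuristic.
import requests

ENDPOINTS = {
    "8b":  "http://localhost:8080/v1/chat/completions",   # hypothetical port
    "32b": "http://localhost:8081/v1/chat/completions",   # hypothetical port
}

def looks_like_error_loop(text: str) -> bool:
    # Placeholder heuristic: the same non-empty line spammed several times.
    lines = [l for l in text.splitlines() if l.strip()]
    return any(lines.count(l) >= 4 for l in set(lines))

def translate(batch_text: str) -> str:
    messages = [
        {"role": "system", "content": "Translate the following text to English."},
        {"role": "user", "content": batch_text},
    ]
    reply = ""
    for model in ("8b", "32b"):   # try the fast one first, fall back once
        r = requests.post(ENDPOINTS[model],
                          json={"messages": messages, "max_tokens": 512})
        reply = r.json()["choices"][0]["message"]["content"]
        if not looks_like_error_loop(reply):
            return reply
    return reply  # both looped; hand back the 32B attempt anyway
```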

I mean, since it's offline and free, it's not like you need to worry about token usage.
And as if that wasn't enough yet, it's a VL model, so it has vision. You can use it for SEP picture translations, too. (Although I wouldn't use the 8B variant for that; it probably wouldn't be a meaningful step up from just using paddle.)
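For reference, feeding it a screenshot looks roughly like this if your backend exposes the OpenAI-style multimodal message format (KoboldCpp and LM Studio both do, last I checked). The endpoint URL and the screenshot filename are placeholders, not anything DSLR-specific.

```python
# Sketch of a picture translation request against an OpenAI-compatible
# multimodal endpoint. URL and filename are placeholders.
import base64, requests

with open("screenshot.png", "rb") as f:
    img_b64 = base64.b64encode(f.read()).decode()

payload = {
    "messages": [
        {"role": "user", "content": [
            {"type": "text",
             "text": "Transcribe the Japanese text in this image and translate it to English."},
            {"type": "image_url",
             "image_url": {"url": f"data:image/png;base64,{img_b64}"}},
        ]},
    ],
    "max_tokens": 512,
}
r = requests.post("http://localhost:8080/v1/chat/completions", json=payload)
print(r.json()["choices"][0]["message"]["content"])
```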
I "assume" the thinking model variants would fail slightly less, but I can't imagine the immense speed difference being worth it.
Even premium stuff like GPT or DeepSeek will get stuck in error loops sometimes; you can't really avoid that completely, yet.