Oh, you are also considering question and exclamation marks. I assumed you would treat them differently, but it makes sense to group them with the numbers. The only case I'm not really sold on is "currency", as I can't think of any phrasing that would turn a price into the word "currency" (e.g. "5$" to "5 of the currency"). Still, consider also adding the yen (and maybe euro) symbols to origstuff, and "coins" and "silver" to shistuff.
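A rough Python sketch of how these grouped checks could work, assuming each check only sees one original line and its translation. `ORIG_CURRENCY` and `TRANSLATION_CURRENCY` are hypothetical stand-ins for whatever origstuff and shistuff actually hold, filled with the symbols and words mentioned above. NFKC normalization folds full-width ？, ！, ￥ and digits into their standard forms; kanji numerals like 五 are not handled.

```python
import re
import unicodedata

# Hypothetical stand-ins for origstuff / shistuff: if any symbol from
# ORIG_CURRENCY appears in the original, at least one token from
# TRANSLATION_CURRENCY must appear in the translation.
ORIG_CURRENCY = {"$", "¥", "€"}
TRANSLATION_CURRENCY = {"$", "¥", "€", "yen", "euro", "currency", "coins", "silver"}

NUMBER_RE = re.compile(r"\d+")


def _norm(text: str) -> str:
    # NFKC folds full-width forms (５, ？, ！, ￥) into their standard forms.
    return unicodedata.normalize("NFKC", text)


def numbers_match(original: str, translation: str) -> bool:
    """Every number in the original must also appear in the translation."""
    orig_nums = set(NUMBER_RE.findall(_norm(original)))
    trans_nums = set(NUMBER_RE.findall(_norm(translation)))
    return orig_nums <= trans_nums


def punctuation_match(original: str, translation: str) -> bool:
    """A ? or ! in the original must also show up in the translation."""
    o, t = _norm(original), _norm(translation)
    return all(mark in t for mark in "?!" if mark in o)


def currency_match(original: str, translation: str) -> bool:
    """A currency symbol in the original needs some currency token in the translation."""
    if not any(sym in _norm(original) for sym in ORIG_CURRENCY):
        return True
    t = _norm(translation).lower()
    return any(tok in t for tok in TRANSLATION_CURRENCY)


def check_line(original: str, translation: str) -> list[str]:
    """Return the names of the checks this line fails (empty list means accepted)."""
    checks = {
        "numbers": numbers_match,
        "punctuation": punctuation_match,
        "currency": currency_match,
    }
    return [name for name, fn in checks.items() if not fn(original, translation)]


print(check_line("値段は￥500です！", "It costs 500 yen!"))  # []
print(check_line("値段は￥500です！", "It costs 500!"))      # ['currency']
print(check_line("本当？", "Really."))                        # ['punctuation']
```

Substring matching on the translation side is deliberately loose here ("$" anywhere counts), since the currency rule is the least certain of the three.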
I guess I still didn't fully get across what I mean.
I'm even considering just adding words, so that if the original contains one of the words ペニス, 陰茎, ディック, or チンポ, it rejects any translation that doesn't include penis, cock, dick, phallus, prick, stick, rod, member, tool, thing, wand, sword, weapon, finger, spear, lance, or meat.
And I made that example list in about 2 minutes; I haven't seriously thought it through yet.
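A minimal sketch of such a keyword rule, assuming rules are stored as (trigger words, accepted words) pairs; `KEYWORD_RULES` and `keywords_ok` are hypothetical names, and the lists are just the quick example above. The translation is matched on whole words (with a naive plural strip) rather than substrings, so "thing" does not fire on "anything".

```python
import re

# Hypothetical rule table: if any trigger appears in the original line, the
# translation must contain at least one of the accepted English words.
KEYWORD_RULES = [
    (
        {"ペニス", "陰茎", "ディック", "チンポ"},
        {"penis", "cock", "dick", "phallus", "prick", "stick", "rod", "member",
         "tool", "thing", "wand", "sword", "weapon", "finger", "spear", "lance", "meat"},
    ),
]

WORD_RE = re.compile(r"[a-z]+")


def _english_words(text: str) -> set[str]:
    """Lower-cased words, plus a naive singular form for words ending in 's'."""
    words = set(WORD_RE.findall(text.lower()))
    return words | {w[:-1] for w in words if w.endswith("s")}


def keywords_ok(original: str, translation: str) -> bool:
    """False if a rule fires on the original but none of its words reach the translation."""
    words = _english_words(translation)
    for triggers, accepted in KEYWORD_RULES:
        if any(trigger in original for trigger in triggers) and not (words & accepted):
            return False
    return True


print(keywords_ok("彼のペニス", "his member"))  # True
print(keywords_ok("彼のペニス", "his hand"))    # False
print(keywords_ok("こんにちは", "Hello there")) # True
```

Adding another rule is just another tuple in the list, which keeps the "2 minute" lists easy to grow later.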
The main goal would be to catch when the LLM put a translation on the wrong line, or when it didn't translate faithfully and just decided to leave stuff out.
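To tell those two cases apart, one possible heuristic (an assumption, not part of the idea above): when a line fails a check, test whether a neighbouring line's translation would pass it instead. If one does, the translation probably landed on the wrong line; if not, content was probably dropped. `classify_batch` and `numbers_ok` are hypothetical names, and any per-line check with the same signature could be plugged in.

```python
import re
import unicodedata
from typing import Callable

Check = Callable[[str, str], bool]


def numbers_ok(original: str, translation: str) -> bool:
    """Example check: every number in the original also appears in the translation."""
    def nums(s: str) -> set[str]:
        return set(re.findall(r"\d+", unicodedata.normalize("NFKC", s)))
    return nums(original) <= nums(translation)


def classify_batch(originals: list[str], translations: list[str], check: Check) -> dict[int, str]:
    """Map each failing line index to "shifted" (a neighbour's translation passes) or "missing"."""
    if len(originals) != len(translations):
        raise ValueError("batch has a different number of original and translated lines")
    problems: dict[int, str] = {}
    for i, (orig, trans) in enumerate(zip(originals, translations)):
        if check(orig, trans):
            continue
        neighbours = [translations[j] for j in (i - 1, i + 1) if 0 <= j < len(translations)]
        problems[i] = "shifted" if any(check(orig, n) for n in neighbours) else "missing"
    return problems


originals = ["猫が3匹いる", "こんにちは", "もう10時だ"]
translations = ["Hello", "There are 3 cats", "It's late"]
print(classify_batch(originals, translations, numbers_ok))  # {0: 'shifted', 2: 'missing'}
```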
Edit: But the more I think about it, a failed check like that should just trigger a single-line correction attempt on an otherwise accepted full batch, because if an LLM refuses to include something in a direct single-line request, it's not going to do it on a retry either.
And if it still doesn't change the line when asked about it specifically, then tag it purple.
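That flow could look roughly like this: the batch has already been accepted, each failing line gets exactly one single-line correction request with no further retries, and anything that still fails is tagged purple. `ask_single_line` is a hypothetical callback standing in for the actual LLM request. Treating "changed but still failing" the same as "unchanged" is an assumption; the edit above only covers the unchanged case.

```python
from enum import Enum
from typing import Callable

Check = Callable[[str, str], bool]
# Hypothetical callback: (original, current_translation) -> new translation for that one line.
AskSingleLine = Callable[[str, str], str]


class Status(Enum):
    ACCEPTED = "accepted"    # passed the checks as part of the batch
    CORRECTED = "corrected"  # fixed by the single-line request
    PURPLE = "purple"        # still failing after being asked about it directly


def review_batch(originals: list[str], translations: list[str], check: Check,
                 ask_single_line: AskSingleLine) -> tuple[list[str], list[Status]]:
    """One single-line correction attempt per failing line of an accepted batch; no retries."""
    final = list(translations)
    statuses: list[Status] = []
    for i, (orig, trans) in enumerate(zip(originals, translations)):
        if check(orig, trans):
            statuses.append(Status.ACCEPTED)
            continue
        fixed = ask_single_line(orig, trans)
        if fixed != trans and check(orig, fixed):
            final[i] = fixed
            statuses.append(Status.CORRECTED)
        else:
            # Unchanged (or changed but still failing): keep the batch version, flag it.
            statuses.append(Status.PURPLE)
    return final, statuses


def has_three(original: str, translation: str) -> bool:
    # Toy check: a "3" in the original must survive into the translation.
    return "3" not in original or "3" in translation


def fake_llm(original: str, translation: str) -> str:
    # Toy stand-in for the model: only fixes the "Cats" line.
    return "3 cats" if translation == "Cats" else translation


final, statuses = review_batch(["猫3匹", "犬3匹", "こんにちは"], ["Cats", "Dogs", "Hello"], has_three, fake_llm)
print(final)                         # ['3 cats', 'Dogs', 'Hello']
print([s.value for s in statuses])   # ['corrected', 'purple', 'accepted']
```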
Edit 2: Then again, considering how much of a mess Japanese is, maybe I should just stick to obvious stuff like numbers...