Just to make people aware and to get some opinions: I'm working on a substantial change to SLR translation, though I'm not sure how long it's still going to take.
You may have noticed the annoying SUGOIERROR or UNUSUALLYSHORT errors. Those are generally caused by Sugoi crashing and losing parts of the input.
To prevent that, I've built a large dictionary of pre-defined translations so that text known to crash Sugoi never gets fed to it.
But the downside is that parts of the text sometimes feel very repetitive, generic or even out of place. My dictionary obviously doesn't consider context, and its entries are applied somewhat more broadly than they have to be, because I'd rather have a slightly worse translation than no translation at all.
So currently it works like this:
Text found in the dictionary gets removed from the cell before the rest gets sent to SugoiV4. If the output still doesn't look like a viable translation, even after applying all the patterns that fix known Sugoi issues, an error code is placed instead.
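A rough sketch of that current flow, purely for illustration and not SLR's actual code. Assumptions: the dictionary is a plain mapping from Japanese source text to English, "removing from the cell" means cutting the cell around dictionary hits and sending only the leftover fragments to the model, the model is any function from text to text, `is_viable` is a placeholder check, and SUGOIERROR stands in for whichever error code gets placed. The function names are hypothetical, and the pattern-based cleanup of Sugoi output is left out.

```python
import re
from typing import Callable, Dict

# Hypothetical stand-in for a call to a locally running Sugoi model.
Translator = Callable[[str], str]


def translate_with_dictionary(text: str, dictionary: Dict[str, str], model: Translator) -> str:
    """Swap dictionary hits for their stored translations and send only the
    remaining fragments to the model (hypothetical helper)."""
    if not dictionary:
        return model(text)
    # Longest keys first, so a longer entry wins over a shorter one it contains.
    keys = sorted(dictionary, key=len, reverse=True)
    pattern = re.compile("(" + "|".join(map(re.escape, keys)) + ")")
    parts = []
    # The capturing group makes re.split keep the matched entries in its result.
    for piece in pattern.split(text):
        if not piece:
            continue
        parts.append(dictionary[piece] if piece in dictionary else model(piece))
    return " ".join(p.strip() for p in parts if p.strip())


def current_pipeline(text: str, dictionary: Dict[str, str], model: Translator,
                     is_viable: Callable[[str, str], bool],
                     error_code: str = "SUGOIERROR") -> str:
    """Dictionary first, then the model; error code if the result looks broken."""
    output = translate_with_dictionary(text, dictionary, model)
    return output if is_viable(text, output) else error_code
```

This also shows where the context loss comes from: the model only ever sees the fragments between dictionary hits, never the whole line.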
The new plan is this instead:
Send all text to the SugoiV4 model without considering dictionary entries.
If the output fails the checks for a viable translation, send it to the Sugoi Levi model instead.
If that output fails the checks as well, apply all dictionary translations and send the rest to the SugoiV4 model again.
If that output still fails the checks, apply all dictionary translations and send the rest to the Sugoi Levi model again.
If that output still fails the checks, insert an error code.
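The fallback chain above could look roughly like this. It's a sketch under assumptions: both models are passed in as plain text-to-text functions (how they're actually called is left out), `with_dictionary` is a hypothetical hook that applies the dictionary and sends the rest to whichever model it's given, and `is_viable` is whatever check ends up being used. Each attempt is wrapped in a lambda so a later model only gets called when every earlier attempt failed.

```python
from typing import Callable

# Hypothetical stand-in for a call to one of the Sugoi models.
Translator = Callable[[str], str]
# Hypothetical hook: applies the dictionary, sends the rest to the given model.
DictionaryStep = Callable[[str, Translator], str]


def translate_with_fallbacks(text: str, sugoi_v4: Translator, sugoi_levi: Translator,
                             with_dictionary: DictionaryStep,
                             is_viable: Callable[[str, str], bool],
                             error_code: str = "SUGOIERROR") -> str:
    """Try each step in order and return the first output that passes the checks."""
    attempts = [
        lambda: sugoi_v4(text),                     # plain SugoiV4, full context
        lambda: sugoi_levi(text),                   # plain Sugoi Levi, full context
        lambda: with_dictionary(text, sugoi_v4),    # dictionary + SugoiV4
        lambda: with_dictionary(text, sugoi_levi),  # dictionary + Sugoi Levi
    ]
    for attempt in attempts:
        output = attempt()
        if is_viable(text, output):
            return output
    return error_code  # nothing passed the checks
```

Because most lines pass on the first attempt, the extra cost only shows up for the lines that would otherwise have failed.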
That way context will be lost much less often, translations will feel more varied and accurate, and you will see far fewer error codes.
The obvious downside is that it makes translation slower, and you would need to have both models.
That said, an average translation would only have around 20 cases in which it actually needs to re-translate something.
Edit: The main obstacle so far is accurately determining whether a translation is viable or not.
I will have to do A LOT of testing.
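Since deciding what counts as viable is exactly the open question, here is the kind of heuristic check one could start testing with. Everything in it is an assumption: the name `is_viable`, the 0.5 length ratio, the three-repeat limit and the Japanese character ranges are placeholders, not what SLR actually checks. It rejects empty output, output with leftover Japanese, output much shorter than the source (the UNUSUALLYSHORT case), and the same word repeated many times in a row.

```python
import re

# Hiragana, katakana and common CJK ideographs (assumed ranges for "Japanese left over").
JAPANESE = re.compile(r"[\u3040-\u30ff\u4e00-\u9fff]")


def is_viable(source: str, output: str, min_ratio: float = 0.5, max_repeats: int = 3) -> bool:
    """Placeholder heuristics; the thresholds are guesses that need testing."""
    out = output.strip()
    if not out:
        return False
    # Leftover Japanese suggests the model skipped part of the input.
    if JAPANESE.search(out):
        return False
    # English is typically longer in characters than the Japanese source,
    # so output far shorter than the source probably lost something.
    if len(out) < len(source.strip()) * min_ratio:
        return False
    # Degenerate repetition: the same word more than max_repeats times in a row.
    words = out.lower().split()
    run = 1
    for prev, cur in zip(words, words[1:]):
        run = run + 1 if cur == prev else 1
        if run > max_repeats:
            return False
    return True
```

Keeping the thresholds as parameters makes it easy to tune them against a batch of known good and known broken lines.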