Tool Sugoi - a translation tool with an offline AI-powered model to translate from Japanese; DeepL competitor

revyfan

Newbie
Jan 26, 2018
62
40
I'm new to automatic translation and unfortunately I don't know my way around at all ;/ I want to play this game "Pegasus Knight X II", it's in Japanese, what do I need to translate it?

Thank you very much!
Try Translator++, using Sugoi. Translator++ is made to translate RPG Maker games in a more efficient way.
 

Gecy

Newbie
Apr 30, 2020
72
106
I don't know which RPG Maker this is. It is neither RPG Maker VX Ace, RPG Maker MZ, nor RPG Maker MV.
All the text is stored in \spt\ in simple txt files, only with a different extension (spt). Just copy those files somewhere, change spt to txt and open them with Notepad++ or something else that can detect the encoding automatically. What you need is the text that comes after msgt (not all files have it). There may be some other spots you'll have to translate, but you can figure that out yourself. You can also translate the txt files with Sugoi (note that it can only handle plain txt saved in UTF-8), then make sure the non-msgt Japanese text stayed the same, and change txt back to spt.
Translator++ can probably extract the relevant text automatically, but if it can't, you can just edit everything manually. Though that would be a pain in the ass without some sort of automation.
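If you end up going the manual route, the spt -> txt round trip is easy to script. A minimal sketch, assuming the spt folder layout and msgt marker described in this post (nothing else about the format is verified):

```python
from pathlib import Path
import shutil

def spt_to_txt(src_dir, out_dir):
    """Copy every .spt script to out_dir as .txt so a translator tool can read it.
    (The spt folder layout comes from the post; the format itself is unverified.)"""
    out = Path(out_dir)
    out.mkdir(parents=True, exist_ok=True)
    copied = []
    for spt in sorted(Path(src_dir).glob("*.spt")):
        dest = out / (spt.stem + ".txt")
        shutil.copyfile(spt, dest)  # byte-for-byte copy keeps the encoding intact
        copied.append(dest.name)
    return copied

# After translating the .txt files (UTF-8 only, per the post), rename them
# back to .spt the same way, with the extensions swapped.
```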
 

Ruffy2010

Member
Jul 9, 2017
399
180
I have converted an SPT file to TXT to see if I can open it with Translator++, but nothing is displayed. I wanted to do an automatic translation to German and play it through to understand the context better and then correct it manually later. Anyway, I'll leave it, thanks for your help though.
 

Cjay

New Member
Oct 18, 2017
6
2
Sugoi_Toolkit_v8.0



For preservation.
And also a modified menu for those who don't want to see the site every time (a file with the original window is included).
Can you reupload this on other sites like Gofile or Mega? Pixeldrain is blocked in my country :( tysm
 

Entai2965

Member
Jan 12, 2020
149
422
Here's hoping it's an actual updated model and not just the bloatware that barely works (for me, anyway)
That is unlikely to ever happen. The developer already stated that there is not really a way to improve the current model further. It has run its course in terms of technological development, and at best only incremental updates would be possible now, which makes it not worth the GPU time to retrain the model. Hence, there is no reason for the developer to bother improving it anymore.

If you want better translations than what Sugoi can produce now, that requires either adding dictionaries to hardcode specific translations, like SLR Translator does (which fixes a lot of Sugoi's quirks but, like all NMTs, still messes up the subjects), or just going ahead and using the superior technology of LLMs instead. The successor to the NMT technology used in the Sugoi Toolkit is Large Language Models (LLMs), so if you want any practical improvement over Sugoi, look into using AI translations instead.

I did a comparison between Sugoi, DeepL, and Mixtral 8x7b. The results were that Sugoi is better than LLMs without context, but with context, LLMs are better, at the cost of significantly increased computation time and reduced automation. For the minimal computation time it needs, the Sugoi NMT model (Sugoi Offline Translator v4, included since Sugoi Toolkit v6) is the best quality realistically possible for any JPN->ENG NMT model.
 

eskelet

Newbie
Aug 2, 2018
57
103
That is unlikely to ever happen. The developer already stated that there is not really a way to improve the current model further. It has run its course in terms of technological development, and at best only incremental updates would be possible now, which makes it not worth the GPU time to retrain the model. Hence, there is no reason for the developer to bother improving it anymore.

If you want better translations than what Sugoi can produce now, that requires either adding dictionaries to hardcode specific translations, like SLR Translator does (which fixes a lot of Sugoi's quirks but, like all NMTs, still messes up the subjects), or just going ahead and using the superior technology of LLMs instead. The successor to the NMT technology used in the Sugoi Toolkit is Large Language Models (LLMs), so if you want any practical improvement over Sugoi, look into using AI translations instead.

I did a comparison between Sugoi, DeepL, and Mixtral 8x7b. The results were that Sugoi is better than LLMs without context, but with context, LLMs are better, at the cost of significantly increased computation time and reduced automation. For the minimal computation time it needs, the Sugoi NMT model (Sugoi Offline Translator v4, included since Sugoi Toolkit v6) is the best quality realistically possible for any JPN->ENG NMT model.
Mixtral is pretty 'old' at this point. Try this one
Supposedly better and smaller than Mixtral.
 

revyfan

Newbie
Jan 26, 2018
62
40
That is unlikely to ever happen. The developer already stated that there is not really a way to improve the current model further.
Figured, I just wanted to believe... I guess the only other thing I hope is that he improves his OCR (Also, probably unlikely)
 

Entai2965

Member
Jan 12, 2020
149
422
Mixtral is pretty 'old' at this point.

According to the leaderboard that ranks JPN VN -> ENG TL, as of today (mid-July 2024):

Rank | Model                                  | Accuracy
   1 | openai/gpt-4o-2024-05-13               | 0.747988
   2 | anthropic/claude-3.5-sonnet            | 0.747447
   4 | nvidia/nemotron-4-340b-instruct        | 0.719268
   5 | lmg-anon/vntl-gemma2-27b_q5_k_m        | 0.703626
   6 | qwen/qwen-2-72b-instruct               | 0.696493
   7 | openai/gpt-3.5-turbo-1106              | 0.694348
   8 | lmg-anon/vntl-llama3-8b-q8_0           | 0.68871
   9 | google/gemma-2-27b-it_Q5_K_M           | 0.68277
  11 | mistralai/mixtral-8x22b-instruct       | 0.678332
  12 | cohere/command-r-plus                  | 0.674124
  18 | meta-llama/llama-3-70b-instruct_Q4_K_M | 0.658825
  25 | meta-llama/llama-3-70b-instruct        | 0.63304
  28 | mistralai/mixtral-8x7b-instruct        | 0.616399
  31 | meta-llama/llama-3-8b-instruct_Q8_0    | 0.604868
  32 | cohere/command-r                       | 0.601418
   - | Sugoi Translator                       | 0.6093
   - | Google Translate                       | 0.5395
   - | Naver Papago                           | 0.4560
   - | Alibaba Translate                      | 0.4089

The above results are not entirely believable. There is no way the quantized version of llama3-70b-instruct should perform better than the cloud version, which makes me question the validity of the test.

In addition, the dataset used to train the models, and the test itself, include a lot of kanji names. There is no way to translate those correctly unless the character says in the text how their name should be read. Since the vntl dataset includes a lot of those hardcoded mappings, if the test checks for them and counts them toward the ranking, then the results are basically cheating, boosting the vntl models higher than they truthfully belong.

Still, it is an interesting leaderboard. If the results are taken at face value, vntl-gemma2-27b should be better than the llama3-8b version. And as I said earlier, and as my results showed, the difference between Sugoi and LLMs, especially without context, is not very large. Sugoi holds up well given its limitations.

Figured, I just wanted to believe... I guess the only other thing I hope is that he improves his OCR (Also, probably unlikely)
The developer has a (scroll down of the link) if you want to follow along with the development of the toolkit. There was something mentioned about improving OCR, but I do not care about OCR much right now, so I did not read it very closely.
 

eskelet

Newbie
Aug 2, 2018
57
103
The above results are not entirely believable. There is no way the quantized version of llama3-70b-instruct should perform better than the cloud version which makes me question the validity of the test.
Quantization is a bit weird in the sense that it introduces more noise, so to speak. It may have just tipped the scale in one particular run. Or it's just some quirk of llama.cpp, since the tokenizer may not be exactly one-to-one.
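To illustrate the kind of noise quantization adds, here is a toy int8 round trip (a simplified symmetric scheme, not llama.cpp's actual k-quants): every weight picks up rounding error of up to half a quantization step, which is plenty to nudge benchmark scores either way between runs.

```python
import random

def quantize_int8(weights):
    """Symmetric int8 quantization: map floats onto the integer grid [-127, 127]."""
    scale = max(abs(w) for w in weights) / 127.0
    return [round(w / scale) for w in weights], scale

def dequantize(quants, scale):
    return [q * scale for q in quants]

random.seed(0)
w = [random.uniform(-1.0, 1.0) for _ in range(1000)]
q, s = quantize_int8(w)
# Worst-case round-trip error is half a quantization step (s / 2), never zero.
max_err = max(abs(a - b) for a, b in zip(w, dequantize(q, s)))
```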
In addition, the dataset used to train the models, and the test itself also included a lot of Kanji names. There is no way to correctly translate those without the person saying how their name should be said in the text. Since the vntl dataset includes a lot of those hardcoded mappings, if the test checks for them and considers them as part of the ranking, then the results are basically cheating and boosting the vntl models higher than they truthfully belong.
I didn't really check the methodology behind the testing, but isn't the evaluation set different from the training one?
Still, it is an interesting leaderboard. If the results are taken at face value, vntl-gemma2-27b should be better than the llama3-8b version. And as I said earlier, and as my results showed, the difference between Sugoi and LLMs, especially without context is not very large. Sugoi holds up well given its limitations.
I heard that the author behind the fine-tunes doesn't really recommend Gemma over Llama 8b. I guess it makes sense, because it's totally feasible to fit Llama entirely in your GPU for blazing-fast translation, while Gemma is quite big and isn't that much better. (Plus Google fudged something up with the Gemma 2 release; no one really knows what's up with it.)

I was thinking of whipping up some sort of GUI that would work with llama.cpp's server. I think it would be a much cleaner solution than Sugoi's 12 GB of Python env bloat: Llama 8b at q4_k_m (5 GB) + the server (70 MB), and you could enable or disable GPU acceleration without hassle (or even use partial GPU acceleration).
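The core of that GUI would be tiny. A minimal sketch using only the standard library, assuming llama.cpp's default server port and its /completion JSON endpoint (check your llama-server build for the exact route); the prompt wording and parameters are my own:

```python
import json
from urllib import request

# Default llama.cpp server address and /completion endpoint (assumptions).
SERVER_URL = "http://127.0.0.1:8080/completion"

def build_payload(japanese_text):
    """Wrap clipboard text in a simple translation prompt (the wording is my own)."""
    prompt = ("Translate the following Japanese to English:\n"
              f"{japanese_text}\nEnglish:")
    return {"prompt": prompt, "n_predict": 256, "temperature": 0.1}

def translate(japanese_text):
    """POST the payload to the llama.cpp server and return the generated text."""
    data = json.dumps(build_payload(japanese_text)).encode("utf-8")
    req = request.Request(SERVER_URL, data=data,
                          headers={"Content-Type": "application/json"})
    with request.urlopen(req) as resp:
        return json.loads(resp.read())["content"].strip()
```

A Sugoi-style frontend would then just poll the clipboard in a loop (e.g. with tkinter's clipboard_get) and call translate() whenever the contents change.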
 

Gecy

Newbie
Apr 30, 2020
72
106
I was thinking of whipping up some sort of gui that would work with llama.cpp's server.
Shouldn't already do this? I haven't tried anything yet, only read, so I don't know.
 

eskelet

Newbie
Aug 2, 2018
57
103
Shouldn't already do this? I haven't tried anything yet, only read, so I don't know.
Kobold is a chat frontend/launcher for llama.cpp. I meant something like Sugoi's interface, where you copy input into your clipboard and it sends it off for translation.
 

Entai2965

Member
Jan 12, 2020
149
422
Here are the release notes:

For the offline model:
"Sugoi Offline Model is now using CT2 package by default, replacing previous fairseq library. Accuracy is about the same while CPU processing speed is twice as fast (even more so when enabling GPU)."
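For reference, calling a CTranslate2 model directly looks roughly like this. The model directory and SentencePiece file are placeholder paths, since I don't know how Sugoi actually packages them; the batching helper just keeps memory bounded:

```python
def chunks(lines, size):
    """Split a list of lines into fixed-size batches to keep memory bounded."""
    return [lines[i:i + size] for i in range(0, len(lines), size)]

def translate_lines(lines, model_dir, sp_model, batch_size=32):
    """Sketch of running a CTranslate2 model with SentencePiece tokenization.
    model_dir / sp_model are placeholders, not Sugoi's real file names."""
    import ctranslate2               # third-party; imported lazily
    import sentencepiece as spm
    sp = spm.SentencePieceProcessor(model_file=sp_model)
    translator = ctranslate2.Translator(model_dir, device="cpu")  # "cuda" for GPU
    out = []
    for batch in chunks(lines, batch_size):
        tokens = [sp.encode(line, out_type=str) for line in batch]
        results = translator.translate_batch(tokens)
        out.extend(sp.decode(r.hypotheses[0]) for r in results)
    return out
```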
 

BoohooBitch

Member
Oct 30, 2017
394
293
Here are the release notes:

For the offline model:
"Sugoi Offline Model is now using CT2 package by default, replacing previous fairseq library. Accuracy is about the same while CPU processing speed is twice as fast (even more so when enabling GPU)."
can we use Sugoi to translate an offline text file? Can you please drop your thoughts on my TL question thread :)

https://f95zone.to/threads/best-opt...ere-offline-ai-modals-that-we-can-use.217624/
 

Gecy

Newbie
Apr 30, 2020
72
106
can we use sugio to translate offline text file?
It can translate txt files, but it's better to copy the lines into a spreadsheet (Excel and the like) and then use Sugoi with Translator++. With enough effort and ingenuity you can translate even games that T++ can't parse on its own.