Tool Sugoi - a translation tool with an offline AI-powered model to translate from Japanese; DeepL competitor

revyfan

Newbie
Jan 26, 2018
62
40
I'm new to automatic translation and unfortunately I don't know my way around at all ;/ I want to play this game "Pegasus Knight X II", it's in Japanese, what do I need to translate it?

Thank you very much!
Try Translator++, using Sugoi. Translator++ is made to translate RPG Maker games in a more efficient way.
 

Gecy

Newbie
Apr 30, 2020
72
106
I don't know which RPG Maker this is. It is neither RPG Maker VX Ace, RPG Maker MZ, nor RPG Maker MV.
All the text is stored in \spt\ in simple txt files, only with a different extension (spt). Just copy those files somewhere, change spt to txt and open them with Notepad++ or something else that can detect the encoding automatically. What you need is the text that comes after msgt (not all files have it). There may be some other spots you'll have to translate, but you can figure that out yourself. You can also translate the txt files with Sugoi (note that it can only handle plain txt saved in UTF-8), then make sure the non-msgt Japanese text stayed the same, and change txt back to spt.
Translator++ can probably extract the relevant text automatically, but if it can't, you can just edit everything manually. Though that would be a pain in the ass without some sort of automation.
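If you end up going the manual route, the spt -> txt round trip is easy to script. A minimal sketch, assuming the spt folder layout and msgt marker described in this post (nothing else about the format is verified):

```python
from pathlib import Path
import shutil

def spt_to_txt(src_dir, out_dir):
    """Copy every .spt script to out_dir as .txt so a translator tool can read it.
    (The spt folder layout comes from the post; the format itself is unverified.)"""
    out = Path(out_dir)
    out.mkdir(parents=True, exist_ok=True)
    copied = []
    for spt in sorted(Path(src_dir).glob("*.spt")):
        dest = out / (spt.stem + ".txt")
        shutil.copyfile(spt, dest)  # byte-for-byte copy keeps the encoding intact
        copied.append(dest.name)
    return copied

# After translating the .txt files (UTF-8 only, per the post), rename them
# back to .spt the same way, with the extensions swapped.
```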
 

Ruffy2010

Member
Jul 9, 2017
399
180
I have converted an SPT file to TXT to see if I can open it with Translator++, but nothing is displayed. I wanted to do an automatic translation to German and play it through to understand the context better and then correct it manually later. Anyway, I'll leave it, thanks for your help though.
 

Cjay

New Member
Oct 18, 2017
6
2
Sugoi_Toolkit_v8.0



For preservation.
And also a modified menu for those who don't want to see the site every time (a file with the original window is included).
Can you reupload this on other sites like Gofile or Mega? Pixeldrain is blocked in my country :( tysm
 

Entai2965

Member
Jan 12, 2020
149
422
Here's hoping it's an actual updated model and not just the bloatware that barely works (for me, anyway)
That is unlikely to ever happen. The developer already stated that there is not really a way to improve the current model further. It has run its course in terms of technological development, and at best only incremental updates would be possible now, which makes it not worth the GPU time to retrain the model. Hence, there is no reason for the developer to bother improving it anymore.

If you want better translations than what Sugoi can produce now, that requires either adding dictionaries to hardcode specific translations, like SLR Translator does (which fixes a lot of Sugoi's quirks but, like all NMTs, still messes up the subjects), or just going ahead and using the superior technology of LLMs instead. The successor to the NMT technology used in the Sugoi Toolkit is Large Language Models (LLMs), so if you want any practical improvement over Sugoi, look into using AI translations instead.

I did a comparison between Sugoi, DeepL, and Mixtral 8x7b. The results were that Sugoi is better than LLMs without context, but with context, LLMs are better, at the cost of significantly increased computation time and reduced automation. For the minimal computation time it needs, the Sugoi NMT model (Sugoi Offline Translator v4, included since Sugoi Toolkit v6) is the best quality realistically possible for any JPN->ENG NMT model.
 

eskelet

Newbie
Aug 2, 2018
57
103
That is unlikely to ever happen. The developer already stated that there is not really a way to improve the current model further. It has run its course in terms of technological development, and at best only incremental updates would be possible now, which makes it not worth the GPU time to retrain the model. Hence, there is no reason for the developer to bother improving it anymore.

If you want better translations than what Sugoi can produce now, that requires either adding dictionaries to hardcode specific translations, like SLR Translator does (which fixes a lot of Sugoi's quirks but, like all NMTs, still messes up the subjects), or just going ahead and using the superior technology of LLMs instead. The successor to the NMT technology used in the Sugoi Toolkit is Large Language Models (LLMs), so if you want any practical improvement over Sugoi, look into using AI translations instead.

I did a comparison between Sugoi, DeepL, and Mixtral 8x7b. The results were that Sugoi is better than LLMs without context, but with context, LLMs are better, at the cost of significantly increased computation time and reduced automation. For the minimal computation time it needs, the Sugoi NMT model (Sugoi Offline Translator v4, included since Sugoi Toolkit v6) is the best quality realistically possible for any JPN->ENG NMT model.
Mixtral is pretty 'old' at this point. Try this one
Supposedly better and smaller than Mixtral.
 

revyfan

Newbie
Jan 26, 2018
62
40
That is unlikely to ever happen. The developer already stated that there is not really a way to improve the current model further.
Figured, I just wanted to believe... I guess the only other thing I hope is that he improves his OCR (Also, probably unlikely)
 

Entai2965

Member
Jan 12, 2020
149
422
Mixtral is pretty 'old' at this point.

According to the leaderboard that ranks JPN VN -> ENG TL, as of today (mid-July 2024):

Rank | Model                                  | Accuracy
   1 | openai/gpt-4o-2024-05-13               | 0.747988
   2 | anthropic/claude-3.5-sonnet            | 0.747447
   4 | nvidia/nemotron-4-340b-instruct        | 0.719268
   5 | lmg-anon/vntl-gemma2-27b_q5_k_m        | 0.703626
   6 | qwen/qwen-2-72b-instruct               | 0.696493
   7 | openai/gpt-3.5-turbo-1106              | 0.694348
   8 | lmg-anon/vntl-llama3-8b-q8_0           | 0.68871
   9 | google/gemma-2-27b-it_Q5_K_M           | 0.68277
  11 | mistralai/mixtral-8x22b-instruct       | 0.678332
  12 | cohere/command-r-plus                  | 0.674124
  18 | meta-llama/llama-3-70b-instruct_Q4_K_M | 0.658825
  25 | meta-llama/llama-3-70b-instruct        | 0.63304
  28 | mistralai/mixtral-8x7b-instruct        | 0.616399
  31 | meta-llama/llama-3-8b-instruct_Q8_0    | 0.604868
  32 | cohere/command-r                       | 0.601418
   - | Sugoi Translator                       | 0.6093
   - | Google Translate                       | 0.5395
   - | Naver Papago                           | 0.4560
   - | Alibaba Translate                      | 0.4089

The above results are not entirely believable. There is no way the quantized version of llama3-70b-instruct should perform better than the cloud version, which makes me question the validity of the test.

In addition, the dataset used to train the models, and the test itself, include a lot of kanji names. There is no way to translate those correctly unless the character says in the text how their name should be read. Since the vntl dataset includes a lot of those hardcoded mappings, if the test checks for them and counts them toward the ranking, then the results are basically cheating, boosting the vntl models higher than they truthfully belong.

Still, it is an interesting leaderboard. If the results are taken at face value, vntl-gemma2-27b should be better than the llama3-8b version. And as I said earlier, and as my results showed, the difference between Sugoi and LLMs, especially without context, is not very large. Sugoi holds up well given its limitations.

Figured, I just wanted to believe... I guess the only other thing I hope is that he improves his OCR (Also, probably unlikely)
The developer has a (scroll down of the link) if you want to follow along with the development of the toolkit. There was something mentioned about improving OCR, but I do not care about OCR much right now, so I did not read it very closely.
 

eskelet

Newbie
Aug 2, 2018
57
103
The above results are not entirely believable. There is no way the quantized version of llama3-70b-instruct should perform better than the cloud version which makes me question the validity of the test.
Quantization is a bit weird in the sense that it introduces more noise, so to speak. It may have just tipped the scale in one particular run. Or it's just some quirk of llama.cpp, since the tokenizer may not be exactly one-to-one.
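To illustrate the kind of noise quantization adds, here is a toy int8 round trip (a simplified symmetric scheme, not llama.cpp's actual k-quants): every weight picks up rounding error of up to half a quantization step, which is plenty to nudge benchmark scores either way between runs.

```python
import random

def quantize_int8(weights):
    """Symmetric int8 quantization: map floats onto the integer grid [-127, 127]."""
    scale = max(abs(w) for w in weights) / 127.0
    return [round(w / scale) for w in weights], scale

def dequantize(quants, scale):
    return [q * scale for q in quants]

random.seed(0)
w = [random.uniform(-1.0, 1.0) for _ in range(1000)]
q, s = quantize_int8(w)
# Worst-case round-trip error is half a quantization step (s / 2), never zero.
max_err = max(abs(a - b) for a, b in zip(w, dequantize(q, s)))
```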
In addition, the dataset used to train the models, and the test itself also included a lot of Kanji names. There is no way to correctly translate those without the person saying how their name should be said in the text. Since the vntl dataset includes a lot of those hardcoded mappings, if the test checks for them and considers them as part of the ranking, then the results are basically cheating and boosting the vntl models higher than they truthfully belong.
I didn't really check the methodology behind the testing, but isn't the evaluation set different from the training one?
Still, it is an interesting leaderboard. If the results are taken at face value, vntl-gemma2-27b should be better than the llama3-8b version. And as I said earlier, and as my results showed, the difference between Sugoi and LLMs, especially without context is not very large. Sugoi holds up well given its limitations.
I heard that the author behind the fine-tunes doesn't really recommend Gemma over Llama 8b. I guess it makes sense, because it's totally feasible to fit Llama entirely in your GPU for blazing-fast translation, while Gemma is quite big and isn't that much better. (Plus Google fudged something up with the Gemma 2 release; no one really knows what's up with it.)

I was thinking of whipping up some sort of GUI that would work with llama.cpp's server. I think it would be a much cleaner solution than Sugoi's 12 GB of Python env bloat: Llama 8b at q4_k_m (5 GB) + the server (70 MB), and you could enable or disable GPU acceleration without hassle (or even use partial GPU acceleration).
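The core of that GUI would be tiny. A minimal sketch using only the standard library, assuming llama.cpp's default server port and its /completion JSON endpoint (check your llama-server build for the exact route); the prompt wording and parameters are my own:

```python
import json
from urllib import request

# Default llama.cpp server address and /completion endpoint (assumptions).
SERVER_URL = "http://127.0.0.1:8080/completion"

def build_payload(japanese_text):
    """Wrap clipboard text in a simple translation prompt (the wording is my own)."""
    prompt = ("Translate the following Japanese to English:\n"
              f"{japanese_text}\nEnglish:")
    return {"prompt": prompt, "n_predict": 256, "temperature": 0.1}

def translate(japanese_text):
    """POST the payload to the llama.cpp server and return the generated text."""
    data = json.dumps(build_payload(japanese_text)).encode("utf-8")
    req = request.Request(SERVER_URL, data=data,
                          headers={"Content-Type": "application/json"})
    with request.urlopen(req) as resp:
        return json.loads(resp.read())["content"].strip()
```

A Sugoi-style frontend would then just poll the clipboard in a loop (e.g. with tkinter's clipboard_get) and call translate() whenever the contents change.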
 

Gecy

Newbie
Apr 30, 2020
72
106
I was thinking of whipping up some sort of gui that would work with llama.cpp's server.
Shouldn't already do this? I haven't tried anything yet, only read, so I don't know.
 

eskelet

Newbie
Aug 2, 2018
57
103
Shouldn't already do this? I haven't tried anything yet, only read, so I don't know.
Kobold is a chat frontend/launcher for llama.cpp. I meant something like Sugoi's interface, where you copy input into your clipboard and it sends it off for translation.
 

Entai2965

Member
Jan 12, 2020
149
422
Here are the release notes:

For the offline model:
"Sugoi Offline Model is now using CT2 package by default, replacing previous fairseq library. Accuracy is about the same while CPU processing speed is twice as fast (even more so when enabling GPU)."
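For reference, calling a CTranslate2 model directly looks roughly like this. The model directory and SentencePiece file are placeholder paths, since I don't know how Sugoi actually packages them; the batching helper just keeps memory bounded:

```python
def chunks(lines, size):
    """Split a list of lines into fixed-size batches to keep memory bounded."""
    return [lines[i:i + size] for i in range(0, len(lines), size)]

def translate_lines(lines, model_dir, sp_model, batch_size=32):
    """Sketch of running a CTranslate2 model with SentencePiece tokenization.
    model_dir / sp_model are placeholders, not Sugoi's real file names."""
    import ctranslate2               # third-party; imported lazily
    import sentencepiece as spm
    sp = spm.SentencePieceProcessor(model_file=sp_model)
    translator = ctranslate2.Translator(model_dir, device="cpu")  # "cuda" for GPU
    out = []
    for batch in chunks(lines, batch_size):
        tokens = [sp.encode(line, out_type=str) for line in batch]
        results = translator.translate_batch(tokens)
        out.extend(sp.decode(r.hypotheses[0]) for r in results)
    return out
```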
 

BoohooBitch

Member
Oct 30, 2017
394
293
Here are the release notes:

For the offline model:
"Sugoi Offline Model is now using CT2 package by default, replacing previous fairseq library. Accuracy is about the same while CPU processing speed is twice as fast (even more so when enabling GPU)."
can we use Sugoi to translate an offline text file? Can you please drop your thoughts on my TL question thread :)

https://f95zone.to/threads/best-opt...ere-offline-ai-modals-that-we-can-use.217624/
 

Gecy

Newbie
Apr 30, 2020
72
106
can we use sugio to translate offline text file?
It can translate txt files, but it's better to copy the lines into a spreadsheet (Excel and the like) and then use Sugoi with Translator++. With enough effort and ingenuity you can translate even games that T++ can't parse on its own.