Tool RPGM DazedMTLTool - A tool that provides quality MTL translations using ChatGPT

MuninKai

Well-Known Member
Apr 29, 2020
1,634
2,654
Have you looked into bypassing the API and webscraping the ChatGPT interface to cut down on cost?

Even saying that, $14 sounds pretty reasonable given that manually cleaning up DeepL translation of a large CommonEvent file could be as much as 16-20h of work. Playing your Sol/Rui After Mini translation I'm pretty impressed with ChatGPT's work. There are a few errors and weird phrasing, but mostly very understandable.

Looking through your code, you will encounter problems on more complex games. Firstly, you'll be missing all the displayed text that can be stored in plugins.js, and in nightmare scenarios, the plugins themselves. Most games include at least a few configuration/menu settings in the plugins.js file, but more complex features like quest guides and some custom erotic status menus also rely on it heavily. I see you commented out the 355 code, and probably for the best, that will be another place where things are fragile but important, and would require more processing by you (I don't know if ChatGPT can handle non-coding spaces, but they are sometimes important).

Another piece of advice for the code is to reduce the number of prompts you need to give to ChatGPT by cleaning up the RPGM code yourself a little. Replace \\I[38] with [i38] for example, then telling chatGPT to ignore anything in []. That way you can also handle colours \\C[4] to [c4], variables, etc. This would make it more robust for other users as well and require less intervention by the translator. After you get the translated text just restore the [i38] to \\I[38].
 
Last edited:

dazedanon

Engaged Member
Modder
Uploader
Donor
Jul 24, 2017
2,438
26,891
Have you looked into bypassing the API and webscraping the ChatGPT interface to cut down on cost?

Even saying that, $14 sounds pretty reasonable given that manually cleaning up DeepL translation of a large CommonEvent file could be as much as 16-20h of work. Playing your Sol/Rui After Mini translation I'm pretty impressed with ChatGPT's work. There are a few errors and weird phrasing, but mostly very understandable.

Looking through your code, you will encounter problems on more complex games. Firstly, you'll be missing all the displayed text that can be stored in plugins.js, and in nightmare scenarios, the plugins themselves. Most games include at least a few configuration/menu settings in the plugins.js file, but more complex features like quest guides and some custom erotic status menus also rely on it heavily. I see you commented out the 355 code, and probably for the best, that will be another place where things are fragile but important, and would require more processing by you (I don't know if ChatGPT can handle non-coding spaces, but they are sometimes important).

Another piece of advice for the code is to reduce the number of prompts you need to give to ChatGPT by cleaning up the RPGM code yourself a little. Replace \\I[38] with [i38] for example, then telling chatGPT to ignore anything in []. That way you can also handle colours \\C[4] to [c4], variables, etc. This would make it more robust for other users as well and require less intervention by the translator. After you get the translated text just restore the [i38] to \\I[38].
Thanks for the great advice! You hit a lot of pains I've been thinking how to solve haha.

Text in plugins.js ive deemed impossible for now since they almost always change radically game to game. Better to hand tranalate that stuff.

For 355, I need to be very selective on what I touch there because as you said it can be very easy to break things. I found success with it in SolRui after I made the processing very strict but commented out for this next patch I'm working on.

That color solution is a great idea! I think I'll put that on the list for thing to implement.

I've thought of webscraping but I need control over certain under the hood variables such as temperature to reduce any randomness in the translation. + im not sure how well the system prompt would work in the web version or how robust there anti scraping security is on there. So for now I've just been using the API. Definitely something to consider in the future.
 

MuninKai

Well-Known Member
Apr 29, 2020
1,634
2,654
Thanks for the great advice! You hit a lot of pains I've been thinking how to solve haha.

Text in plugins.js ive deemed impossible for now since they almost always change radically game to game. Better to hand tranalate that stuff.

For 355, I need to be very selective on what I touch there because as you said it can be very easy to break things. I found success with it in SolRui after I made the processing very strict but commented out for this next patch I'm working on.

That color solution is a great idea! I think I'll put that on the list for thing to implement.

I've thought of webscraping but I need control over certain under the hood variables such as temperature to reduce any randomness in the translation. + im not sure how well the system prompt would work in the web version or how robust there anti scraping security is on there. So for now I've just been using the API. Definitely something to consider in the future.
Yeah, I've been struggling to figure it out myself. There is some low hanging fruit that can be accessed by converting the plugins.js file into a dictionary by pruning the starting text then using the inbuilt function for converting strings into dictionaries in python. Then you look for key:value pairs that are just plain Japanese text. So far it has only caused me one issue. I'm working on a parsing formula for the more complex ones as well but that's been my hack so far.

Encountered the same thing with 355. One thing I found helpful was to have a separate library for 355 commands that I've encountered in other games as a separate file, then load in all the rules into my translation process. Keeps the code clean but makes it only translate specific 355s that you know have a function and can be translated safely.

Good to know the webscrape would have lower control over the output. One of the things I love about DeepL is the absolutely useless attempt they've made at anti-bot protection.
 

dazedanon

Engaged Member
Modder
Uploader
Donor
Jul 24, 2017
2,438
26,891
Encountered the same thing with 355. One thing I found helpful was to have a separate library for 355 commands that I've encountered in other games as a separate file, then load in all the rules into my translation process. Keeps the code clean but makes it only translate specific 355s that you know have a function and can be translated safely.

Huh thats a pretty nifty way of handling that.
 
  • Like
Reactions: hebizaihu

Mitsuna

Active Member
Jun 21, 2019
532
698
Not bad, but how close is the translation to the actual script? This is what annoyed me the most about "human" translations. And MTL, well, never could keep the terms and narration of the original.
 
  • Like
Reactions: Amariithynar

stevesssss

New Member
Nov 26, 2021
12
103
Could you do it with meta's LLAMA that got leaked not long ago? Could run it for free off your PC if you have a few hundred gigs of storage free
 

dazedanon

Engaged Member
Modder
Uploader
Donor
Jul 24, 2017
2,438
26,891
Tested it and the results were fairly poor. It pretty much got everything I sent it wrong, probably because it isn't trained in Japanese.
 

revyfan

Newbie
Jan 26, 2018
62
40
Not bad, but how close is the translation to the actual script? This is what annoyed me the most about "human" translations. And MTL, well, never could keep the terms and narration of the original.
Obviously i'm not the OP but as my tests have been with GPT Turbo, it gets things almost spot on with only a few errors here and there.

Here's a great example

1681353336344.png

And here's the translation from a well trusted translator

1681353518296.png

As you can see, Turbo did pretty well with it, even without any other context, it managed to give a translation that's both correct and readable with one of the only issues is it using "Heath Insurance Class" instead of Physical Education/PE and not keeping honorifics (Which you could control by telling the bot to keep them in english, I was just too lazy to do it) and as i've stated, this is only Turbo and not the newest model they have (Which requires a payment to gain access) which is far better.
 

Yuu77

New Member
May 30, 2021
1
0
Any particular reason why you picked History count at 20? I know Japanese and i don't think i need that much context for translation in general, mostly around 5?
Would that decrease the quality of the translation by a large margin and i'm wondering how much cost that could cut?
ChatGPT is not available at my country, it just doesn't accept my creditcard and i don't want to go to the hassle to setup a payment method just to test this lol
Also i downloaded Kakuriyo and looked at CommonEvents.json, there're still quite a lot of unstranslated text (i looked at line 211600+, not looked at them all yet obviously), does that not appear in the game or chatGPT refuse to translate those?
 

Mitsuna

Active Member
Jun 21, 2019
532
698
it gets things almost spot on
The thing is, if I cannot translate it, then automated translator will almost always fuck it up miserably, and there is also a problem with neural translators when they take random shit out of their pocket where it cannot be applied in any way. Not to mention, as a human, you can tell when text is stylized, but machine takes it very, very wrong.
 
  • Like
Reactions: dazedanon

Dawoodi

Newbie
Jan 17, 2018
64
377
This is amazing thank you so much for your work dazedanon

The games you translated are excellent and much more readable than normal MTL, but I did notice some problems there a lot of times where the Characters use wrong pronouns like saying "He" instead of "she" or referring to a girl as "that guy", and sometimes when Characters comments on something happing to another Character they start talking as if it's happing to them like a guy saying "my pussy got wet" when watching another girl get fucked, I wonder if there is a fix for these problems?:unsure:
 
  • Like
Reactions: dazedanon

dazedanon

Engaged Member
Modder
Uploader
Donor
Jul 24, 2017
2,438
26,891
This is amazing thank you so much for your work dazedanon

The games you translated are excellent and much more readable than normal MTL, but I did notice some problems there a lot of times where the Characters use wrong pronouns like saying "He" instead of "she" or referring to a girl as "that guy", and sometimes when Characters comments on something happing to another Character they start talking as if it's happing to them like a guy saying "my pussy got wet" when watching another girl get fucked, I wonder if there is a fix for these problems?:unsure:
Probably either better implementation of prompts and context, or an upgrade entirely like using GPT4 instead of 3. Hard to say since I need more testing.
 
  • Like
Reactions: Dawoodi

dazedanon

Engaged Member
Modder
Uploader
Donor
Jul 24, 2017
2,438
26,891
Any particular reason why you picked History count at 20? I know Japanese and i don't think i need that much context for translation in general, mostly around 5?
Would that decrease the quality of the translation by a large margin and i'm wondering how much cost that could cut?
ChatGPT is not available at my country, it just doesn't accept my creditcard and i don't want to go to the hassle to setup a payment method just to test this lol
Also i downloaded Kakuriyo and looked at CommonEvents.json, there're still quite a lot of unstranslated text (i looked at line 211600+, not looked at them all yet obviously), does that not appear in the game or chatGPT refuse to translate those?
Nothing specific, 20 sentences is the length of a normal conversation usually. From my testing the sweetspot is more 5 or 10 like you said.

I can tell what text needs to be translated based on the code # in rpgmaker games. (For example 401 is usually dialogue) Everything else I know not to touch.
 

xorxorrax

Member
Modder
Apr 12, 2020
283
1,316
Hey, nice job on the tool. I have used ChatGPT to translate a VN a few weeks ago and here is a suggestion regarding context. I have noticed it improves the translation quite well in most cases: Use a 2 pass system, first ask it to summarize the text in like 100-200 words giving it very general general clues like names/genre:

>> Sumarize the following text in around 150 english words. It is from a visual novel game. Ayato is the main character: <json>
<< The morning in the Hirose household always starts with the sound of a knife. Ayato, the main character, prepares breakfast for his father and himself. His father expresses guilt for making Ayato do all the housework and suggests he should consider remarrying for Ayato's happiness. Ayato reassures him that he is content with their current life. After his father leaves for work, Ayato finishes cleaning up and heads to school. On his way, he meets Rinko and her son Kota, who is fond of Ayato. A group of boys calls out to Kota, and Ayato watches them go.
Then give the next prompt as following:
I want you to act as a translator for japanese visual novel games. I'll write my source text, and you'll suggest a translation all in json. Format everything in json and keep the structure of the input. Prefer the use of active voice. Keep in mind that the main charecter is Hirose Ayato (広瀬 彩人). The morning in the Hirose household always starts with the sound of a knife. Ayato, the main character, prepares breakfast for his father and himself. His father expresses guilt for making Ayato do all the housework and suggests he should consider remarrying for Ayato's happiness. Ayato reassures him that he is content with their current life. After his father leaves for work, Ayato finishes cleaning up and heads to school. On his way, he meets Rinko and her son Kota, who is fond of Ayato. A group of boys calls out to Kota, and Ayato watches them go.
<json>
From my experience this fixes a lot of issues with the translation because it has a summary to guide it also if something is wrong the user can always add to or fix the summary to get a better translation in a 3rd pass.

If you want to discuss more about my experiments feel free to P.M. me.
 

dazedanon

Engaged Member
Modder
Uploader
Donor
Jul 24, 2017
2,438
26,891
Hey, nice job on the tool. I have used ChatGPT to translate a VN a few weeks ago and here is a suggestion regarding context. I have noticed it improves the translation quite well in most cases: Use a 2 pass system, first ask it to summarize the text in like 100-200 words giving it very general general clues like names/genre:


Then give the next prompt as following:

From my experience this fixes a lot of issues with the translation because it has a summary to guide it also if something is wrong the user can always add to or fix the summary to get a better translation in a 3rd pass.

If you want to discuss more about my experiments feel free to P.M. me.
Hmm this sounds like it would net a better translation but since I mainly test line by line translations it would probably be too expensive to be practical. The only way to have the API know the history/context is to send it in every request.

So I would end up having to send the 150 word summary along with the previous lines for context, and include the specific prompt I use for translations thousands of times which would drive up cost a lot.

I think this is only practical if you can have ChatGPT remember the summary without having to send it in every request, kind of like how playground currently works.
 

xorxorrax

Member
Modder
Apr 12, 2020
283
1,316
Hmm this sounds like it would net a better translation but since I mainly test line by line translations it would probably be too expensive to be practical. The only way to have the API know the history/context is to send it in every request.

So I would end up having to send the 150 word summary along with the previous lines for context, and include the specific prompt I use for translations thousands of times which would drive up cost a lot.

I think this is only practical if you can have ChatGPT remember the summary without having to send it in every request, kind of like how playground currently works.
In my case it didn't end up that expensive ~$10 for 2M characters, 50k lines.
My bigger problem with OpenAI in general is having to skirt around the content filters and are starting to show up. Really considering buying a 3090 and trying to train some of the lighter LLMs myself to just for fun.
 
  • Like
Reactions: Pango_12

xorxorrax

Member
Modder
Apr 12, 2020
283
1,316
dazedanon Ah, only now I understood that you translate line by line. I didn't do that, I translated them in 25 line batches.
Example of a batch:
JSON:
[
    {
        "n": "0",
        "id": "C00000000",
        "msg": "――広瀬家の朝は、いつも包丁の音から始まる。"
    },
    {
        "n": "1",
        "id": "C00000001",
        "msg": "「……そろそろ、お父さん起きてくるかな」"
    },
    {
        "n": "2",
        "id": "C00000002",
        "msg": "僕は、素早く仕上げを済ませてしまうと、それぞれの料理を皿に盛り付け、テーブルに配膳していく。"
    },
    {
        "n": "4",
        "id": "C00000004",
        "msg": "「おはよ」"
    },
    {
        "n": "5",
        "id": "C00000005",
        "msg": "「あ、お父さん、おはよう。あと、ご飯とお味噌汁よそったら準備できるから、椅子に座って待ってて」"
    },
    {
        "n": "6",
        "id": "C00000006",
        "msg": "僕は素早くキッチンで茶碗にご飯、お椀に味噌汁をよそいに行くと、父さんは席に着くなり――。"
    },
    {
        "n": "7",
        "id": "C00000007",
        "msg": "「……済まないな、いつもいつもお前にばっかり任せてしまって」"
    },
    {
        "n": "8",
        "id": "C00000008",
        "msg": "「どうしたの、急に?僕、料理も掃除洗濯も好きだし、全然苦じゃないよ」"
    },
    {
        "n": "9",
        "id": "C00000009",
        "msg": "「お父さんだって、男手一つで僕を育ててくれてるし、毎日お仕事頑張ってきてくれるんだし、これぐらいは手伝わないと」"
    },
    {
        "n": "10",
        "id": "C0000000A",
        "msg": "「……しかしな。お前だって、友達と思いっきり遊びたい年頃だろ。それを俺が奪ってしまっていると思うと、申し訳なくってな」"
    },
    {
        "n": "11",
        "id": "C0000000B",
        "msg": "「……今は、忙しくて難しいが、やはり本気で再婚を考えた方がいいな。お前の幸せの為にも」"
    },
    {
        "n": "12",
        "id": "C0000000C",
        "msg": "「そんな思い詰めないでよ。僕は、お父さんと二人だけの生活に不満なんて全然ないよ」"
    },
    {
        "n": "14",
        "id": "C0000000E",
        "msg": "「うぅ……」"
    },
    {
        "n": "15",
        "id": "C0000000F",
        "msg": "僕の言葉にお父さんは目頭を抑え、泣きそうになっている。"
    },
    {
        "n": "16",
        "id": "C00000010",
        "msg": "最近、ずっとこうだ。"
    },
    {
        "n": "17",
        "id": "C00000011",
        "msg": "数年前にお母さんが病気で亡くなった時からしばらくは、僕を育てる事に一生懸命で、気丈に振舞っていたけど……。"
    },
    {
        "n": "18",
        "id": "C00000012",
        "msg": "ここ、二、三年で僕が毎日食事などの家事を全部率先してするようになってから、気持ちが緩んだのか涙脆くなった。"
    },
    {
        "n": "19",
        "id": "C00000013",
        "msg": "「ほら、お父さん、早く食べちゃわないと、いつもの電車に乗り遅れちゃうよ」"
    },
    {
        "n": "20",
        "id": "C00000014",
        "msg": "「う、うん、そうだな……いただきます」"
    },
    {
        "n": "21",
        "id": "C00000015",
        "msg": "お父さんは、うっすらと出た涙を手で拭くと、朝食をとっていく。"
    },
    {
        "n": "22",
        "id": "C00000016",
        "msg": "その様子に内心安堵しながら席に着き、僕も朝食をとっていくのだった。"
    },
    {
        "n": "23",
        "id": "C00000017",
        "msg": "………………。"
    },
    {
        "n": "24",
        "id": "C00000018",
        "msg": "「じゃあ、行ってくる。後の戸締りは頼むな」"
    },
]
Output (here my script discards the "n" and joins the original and translated so its not exactly the chat gpt output):
JSON:
[
    {
        "id": "C00000000",
        "o": "――広瀬家の朝は、いつも包丁の音から始まる。",
        "t": "The morning in the Hirose household always starts with the sound of a knife."
    },
    {
        "id": "C00000001",
        "o": "「……そろそろ、お父さん起きてくるかな」",
        "t": "\"...I wonder if dad will wake up soon.\""
    },
    {
        "id": "C00000002",
        "o": "僕は、素早く仕上げを済ませてしまうと、それぞれの料理を皿に盛り付け、テーブルに配膳していく。",
        "t": "I quickly finish up and plate each dish, then bring them to the table. "
    },
    {
        "id": "C00000004",
        "o": "「おはよ」",
        "t": "\"Good morning.\""
    },
    {
        "id": "C00000005",
        "o": "「あ、お父さん、おはよう。あと、ご飯とお味噌汁よそったら準備できるから、椅子に座って待ってて」",
        "t": "\"Oh, good morning dad. Once I serve the rice and miso soup, everything will be ready. Please sit down and wait.\""
    },
    {
        "id": "C00000006",
        "o": "僕は素早くキッチンで茶碗にご飯、お椀に味噌汁をよそいに行くと、父さんは席に着くなり――。",
        "t": "I quickly serve rice into a bowl and miso soup into a small dish, then dad sits down at the table."
    },
    {
        "id": "C00000007",
        "o": "「……済まないな、いつもいつもお前にばっかり任せてしまって」",
        "t": "\"...I'm sorry. I always leave everything to you.\""
    },
    {
        "id": "C00000008",
        "o": "「どうしたの、急に?僕、料理も掃除洗濯も好きだし、全然苦じゃないよ」",
        "t": "\"What's wrong? I like cooking, cleaning, and doing laundry. It's not a problem at all.\""
    },
    {
        "id": "C00000009",
        "o": "「お父さんだって、男手一つで僕を育ててくれてるし、毎日お仕事頑張ってきてくれるんだし、これぐらいは手伝わないと」",
        "t": "\"Dad, you raised me all by yourself and work hard every day. I should help out.\""
    },
    {
        "id": "C0000000A",
        "o": "「……しかしな。お前だって、友達と思いっきり遊びたい年頃だろ。それを俺が奪ってしまっていると思うと、申し訳なくってな」",
        "t": "\"...But still. You're at an age where you should be playing with your friends. I feel bad for taking that away from you.\""
    },
    {
        "id": "C0000000B",
        "o": "「……今は、忙しくて難しいが、やはり本気で再婚を考えた方がいいな。お前の幸せの為にも」",
        "t": "\"...It's difficult right now, but I think I should seriously consider remarrying for your happiness.\""
    },
    {
        "id": "C0000000C",
        "o": "「そんな思い詰めないでよ。僕は、お父さんと二人だけの生活に不満なんて全然ないよ」",
        "t": "\"Don't worry about it. I'm perfectly happy living with just you, dad.\""
    },
    {
        "id": "C0000000E",
        "o": "「うぅ……」",
        "t": "\"Ugh...\""
    },
    {
        "id": "C0000000F",
        "o": "僕の言葉にお父さんは目頭を抑え、泣きそうになっている。",
        "t": "Dad holds back tears as he listens to me."
    },
    {
        "id": "C00000010",
        "o": "最近、ずっとこうだ。",
        "t": "This has been happening a lot lately."
    },
    {
        "id": "C00000011",
        "o": "数年前にお母さんが病気で亡くなった時からしばらくは、僕を育てる事に一生懸命で、気丈に振舞っていたけど……。",
        "t": "After mom passed away from an illness a few years ago, dad did his best to raise me and put on a brave face..."
    },
    {
        "id": "C00000012",
        "o": "ここ、二、三年で僕が毎日食事などの家事を全部率先してするようになってから、気持ちが緩んだのか涙脆くなった。",
        "t": "But over the past two or three years, since I started doing all the housework every day, he's become more emotional."
    },
    {
        "id": "C00000013",
        "o": "「ほら、お父さん、早く食べちゃわないと、いつもの電車に乗り遅れちゃうよ」",
        "t": "\"Come on, dad. You'll miss your usual train if you don't hurry up and eat.\""
    },
    {
        "id": "C00000014",
        "o": "「う、うん、そうだな……いただきます」",
        "t": "\"R-Right...let's eat.\""
    },
    {
        "id": "C00000015",
        "o": "お父さんは、うっすらと出た涙を手で拭くと、朝食をとっていく。",
        "t": "Dad wipes away his tears and starts eating breakfast."
    },
    {
        "id": "C00000016",
        "o": "その様子に内心安堵しながら席に着き、僕も朝食をとっていくのだった。",
        "t": "I feel relieved watching him eat, and sit down to eat my own breakfast."
    },
    {
        "id": "C00000017",
        "o": "………………。",
        "t": "............"
    },
    {
        "id": "C00000018",
        "o": "「じゃあ、行ってくる。後の戸締りは頼むな」",
        "t": "\"Well, I'm off. Don't forget to lock up.\""
    }
]
If you give each line an ID and a sequential number chatGPT 95% of the time translates it fine. for the rest you can try again with a smaller batch size. If it keeps failing, save it for the user to do it manually later. That's how I kept costs so low.
 
  • Like
Reactions: Pango_12