- Jan 2, 2020
EDIT: As an update, this started with just AliceSoft games. Even if you don't like AliceSoft, or the genres of the games translated so far, I can promise a riveting read about something you haven't seen before.
tl;dr Highly literate, human-level translations, infused with personality and style, done by a team of AIs I put together, with a turnaround time of ~48 hours for a 100% complete translation. Try Oyako Rankan (second page) or just read over some of the logs to see what the best possible result from this system looks like. I recommend going through the thread though; it was an interesting experience and I documented quite a bit as I went along. All translations starting with Haha Ranman come in two flavors - a word-for-word version that's still very readable and uses proper grammar and pronouns, but is sometimes a bit clunky, and one that's gone through an AI editor pass that was allowed some creative leeway. The editor is not allowed to make changes to plot or scene elements but has leeway with phrasing and character voicing. It exceeded all my expectations.
---
I'm a big AliceSoft fan. I've been dying to play some of these games for the better part of a decade. Yeah, some of these have translations (Oyako Rankan and Tsumamigui 3), but those are GPT translations (and older GPT, at that), which don't produce good results. GPT is great for coding but it doesn't know how to write, and pre-GPT-4o it was not that great at Japanese, either. It still isn't, imo.
What I've built over the past few weeks is a complex LangChain framework for translating VNs specifically. I won't get into implementation details, as I don't intend to share my code, for reasons I'd rather not go into (it's not because I think it's worth much - I built this in its current form in a week), but the core of it is:
1.) Big fancy LLM (formerly Qwen-72B, which I recommend to anyone looking to self-host a translation solution like this one - it'll get you 80% of the way there) on translation duty
2.) Little fancy LLM for summarization / plot tracking / categorization
3.) Big fancy LLM on editing
4.) Self-hosted Command-R+ for assorted RAG duties.
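For anyone trying to picture how four roles like these could fit together, here is a minimal sketch. It is an illustration only, not the actual code: every name in it (`Pipeline`, `run_chunk`, `chunk_script`) is hypothetical, each model is replaced by a plain Python callable, and the RAG step is reduced to a simple glossary lookup. It assumes the script is fed in fixed-size chunks with a rolling summary carried along for context.

```python
from dataclasses import dataclass
from typing import Callable, Dict, Iterator, List

# Hypothetical stand-in: a "model" is any function from prompt text to text.
Model = Callable[[str], str]


def chunk_script(lines: List[str], size: int) -> Iterator[List[str]]:
    """Yield consecutive groups of `size` script lines (the last may be shorter)."""
    for start in range(0, len(lines), size):
        yield lines[start:start + size]


@dataclass
class Pipeline:
    translator: Model          # role 1: big LLM on translation duty
    summarizer: Model          # role 2: small LLM tracking the plot
    editor: Model              # role 3: big LLM on editing
    glossary: Dict[str, str]   # role 4: stand-in for retrieval (assumption)
    summary: str = ""          # rolling "story so far" carried between chunks

    def lookup(self, chunk: List[str]) -> str:
        """Return glossary entries whose source term appears in the chunk."""
        hits = [
            f"{term}={rendering}"
            for term, rendering in self.glossary.items()
            if any(term in line for line in chunk)
        ]
        return "; ".join(hits)

    def run_chunk(self, chunk: List[str]) -> Dict[str, str]:
        """Translate one chunk, edit it, then update the rolling summary."""
        context = (
            f"Story so far: {self.summary}\n"
            f"Glossary: {self.lookup(chunk)}\n"
        )
        literal = self.translator(context + "\n".join(chunk))
        edited = self.editor(context + literal)
        self.summary = self.summarizer(self.summary + "\n" + literal)
        # Both flavors are kept: the literal pass and the edited pass.
        return {"literal": literal, "edited": edited}
```

In use, you would loop `for chunk in chunk_script(lines, 20): pipeline.run_chunk(chunk)` with real model calls plugged in for the three callables. The chunk size of 20 is an arbitrary placeholder, not a tuned value.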
I call it the
A - Automated
U - Universal
T - Translation
I - Interface for
S - Semantic
M - Manipulation
Personally, I think it produces very readable, good-quality prose that accurately reflects the meaning and retains cultural details semi-decently. I was smart with how I chunk and stream the script data into the chain, so it retains context awareness quite well. There are no grammar errors, pronouns are good, verbiage is evocative, lines are calculated to avoid overflow, and dialogue has characterization. Some examples for Heartful Maman:
It's particularly good at characterization when running with all features turned on. For example, check out this stellar job it does of characterizing Tsuguo at introduction:
Personally, I'm blown away. I've played through about 20% of HM now and it's an extremely enjoyable read this way. For future runs, I've added some improvements that should address passive/active voice issues and stuff like "called out to, Tsuguo seemed .... " - technically grammatically correct, but stilted and weird.
Anyway, it took about 24 hours to cook Heartful Maman from start to finish (and about $70). I'll be firing up Tsumamigui 3 later today; I expect that one to take ~48 hours. I'll post the Heartful Maman .ain later today in this thread (and the others as they come out).
Planned translation order:
1. Heartful Maman - done
2. Tsumamigui 3 - done
3. Haha Ranman - done
4. Oyako Rankan - done
5. Pastel Chime 3 - cancelled
6. Daiteikoku - cancelled
7. Editor pass on 1-6 - cancelled
8. Sakura no Uta - in progress
PS. To the "but Heartful Maman has a translation" folks - both links in the thread from the DeepL + 5% rusty Japanese guy contain an untranslated .ain. He either forgot to uncomment the messages before rebuilding the .ain or uploaded the wrong file. At any rate, what I've got is better than DeepL.
The translation isn't done on a line-by-line basis with this approach, but line count is a useful measure of progress. With the current setup, the rate averages 1.5 "messages" (a line of text from the VN) per second. Tsumamigui is 100k lines, and I'm at 5k lines done now. Haha Ranman has a good interface translation patch out, so I'll definitely run that by Wednesday, too. Oyako Rankan's translation is readable, but I can improve on it. I'm actually not sure whether PC3 and Daiteikoku have interface patches, but I'll OCR those if I have to.
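As a back-of-the-envelope check on those numbers, here is a tiny sketch. It assumes the rate stays flat at 1.5 messages per second for the whole run, which it may well not, and the function name `eta_hours` is made up for illustration.

```python
def eta_hours(total_lines: int, done_lines: int, rate_per_sec: float) -> float:
    """Hours left to process the remaining lines at a constant rate."""
    if rate_per_sec <= 0:
        raise ValueError("rate_per_sec must be positive")
    remaining = max(total_lines - done_lines, 0)
    return remaining / rate_per_sec / 3600


# Figures quoted above: 100k lines total, 5k done, 1.5 messages per second.
print(round(eta_hours(100_000, 5_000, 1.5), 1))  # 17.6
print(round(eta_hours(100_000, 0, 1.5), 1))      # 18.5
```

This only counts raw message throughput at the quoted average, so treat it as a rough floor rather than a prediction of wall-clock time.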