The NSFW content is the easy part. I'm going through the API, so I just pass the following to generate_content() in Python. This is in the documentation as well.
Code:
from google.generativeai.types import HarmCategory, HarmBlockThreshold

safety_settings={
    HarmCategory.HARM_CATEGORY_SEXUALLY_EXPLICIT: HarmBlockThreshold.BLOCK_NONE,
    HarmCategory.HARM_CATEGORY_DANGEROUS_CONTENT: HarmBlockThreshold.BLOCK_NONE,
}
For the token limits, Google offers a $300 credit when you first sign up that expires after 3 months. That being said, it's so cheap I won't be mad when it expires. I completely redid Goblin Burrow: Fable, probably the largest game I've ever tackled in terms of text, for $1.80. The quality improvement over DeepL, and the time savings when doing edits, are worth it. EDIT: Compared to estimates from ChatGPT using Dazed's tool, this is manageable for me without crowdsourcing money to run it. The output is probably worse, but hey, you get what you pay for.
If that's still too much, note that on the free plan Google gates Gemini far more strictly by requests per day than by tokens per day or per request through the API. So I ran some experiments with batching lines to translate. If I was really aggressive with the aggregation, I could do 50,000-75,000 lines per day (Google says you get 1,500 requests per day, but it cut me off around 1,100). I found going above 50 lines per request produced a lot of garbage, so that's what I set it at. Monitoring my usage, I capped out on daily requests while using about 4% of the daily free tokens.
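The batching itself is nothing fancy; here's roughly what I mean (the function name and the test script are illustrative, not my actual pipeline, but the 50-line cap is the real number I landed on):

```python
def batch_lines(lines, batch_size=50):
    """Split a script into batches of at most batch_size lines per request.
    Anything much above 50 lines per request produced garbage for me."""
    return [lines[i:i + batch_size] for i in range(0, len(lines), batch_size)]

# Illustrative stand-in for a game script
script = [f"line {n}" for n in range(120)]
batches = batch_lines(script)
# 120 lines -> 3 requests instead of 120
```

Each batch then goes out as a single generate_content() call, which is how you stay under the request-per-day cap while barely touching the token cap.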
You do lose anonymity. Technically all my translation work is now visible to Google and tied to my real name and credit card, which isn't ideal but probably won't be an issue.
The biggest issue I've identified is that Gemini is... a fucking idiot, basically. I was extremely skeptical of LLMs before using one, and after using one I find the hype entirely unearned. It follows instructions at best some of the time. Even with explicit instructions about, for example, how to translate specific names, it will ignore them about 50-75% of the time. There was a lot of prompt engineering as I tried to wrestle with this very powerful, stupid, baby-brained computer, and I will probably continue iterating on my prompts for a while. It does produce better English than DeepL, and is a little more flexible than Google Translate, but it's still stupid, and especially stupid at scale, when a human can't babysit its every output.
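For what it's worth, one blunt workaround for the name problem is to stop trusting the prompt and enforce the glossary in post-processing instead. A rough sketch (the glossary entries here are made-up examples, not from any actual game):

```python
# Made-up example entries: wrong rendering the model keeps
# producing -> the rendering the glossary actually asks for.
GLOSSARY = {
    "Goburin": "Goblin",
    "Aira": "Aila",
}

def enforce_glossary(text):
    """Force specific name translations after the fact, since the model
    only follows the prompt's glossary part of the time."""
    for wrong, right in GLOSSARY.items():
        text = text.replace(wrong, right)
    return text

fixed = enforce_glossary("Aira fled from the Goburin camp.")
# -> "Aila fled from the Goblin camp."
```

Plain string replacement is dumb (it can clobber substrings inside other words), but it's deterministic, which is more than you can say for the model.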