okay, custom module training recommendations:
- biggest rule of thumb: format your data the way you want the ai to generate it
- plain text in txt format, no formatting tags/markdown/html
- prose
- one line = one paragraph, no paragraphs split onto multiple lines
- no empty lines between paragraphs
- empty line could be used for chapter break, but recommend using *** instead
- no leading/trailing space, tab or other whitespace (easy to clean with notepad++)
- only single spaces between words (no double/triple/... spaces)
- ideally, use straight double and single quote characters " and ' not curly ones (easier on the ai)
- make sure all included material is focused on what your module should achieve, no kitchen sink (e.g. i tried to do a steampunk module, but most steampunk novels don't talk about steampunk stuff all the time, so the module ended up really weak)
- if you want to avoid leaking character names/other stuff, keep the data balanced and include many different stories with different characters
- 1mb to 5mb seems like a good amount of text for an author style or theme style, and it can have low character leakage if varied enough (i used 3k steps, but our pipeline may use a different context size and batch size, which will change how many steps are needed)
- upper limit probably 10mb
- feel free to experiment with short data. nothing stops you from turning a short prompt into a module, and it will require far fewer steps too (maybe 50-100?)
- don't expect a module to memorize relational/factual data (e.g. if you feed it a story with pokemon descriptions, it'll probably bring up those pokemon, but it may still get their types wrong)
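
if you'd rather script the cleanup than do it by hand in notepad++, here is a rough python sketch of the whitespace and quote rules above. the function name `clean_training_text` and the choice to simply drop empty lines are my own assumptions, not part of any official pipeline. it does not try to rejoin paragraphs that were split across lines, and if your source uses empty lines as chapter breaks you'd want to turn those into *** before running it.

```python
import re

# Curly quotes mapped to their plain ASCII equivalents.
QUOTE_MAP = str.maketrans({
    "\u201c": '"', "\u201d": '"',  # curly double quotes
    "\u2018": "'", "\u2019": "'",  # curly single quotes / apostrophes
})


def clean_training_text(raw: str) -> str:
    """Apply the whitespace and quote rules from the list above.

    Hypothetical helper: assumes each input line is already one paragraph
    and that empty lines can be dropped.
    """
    cleaned = []
    for line in raw.splitlines():
        line = line.translate(QUOTE_MAP)
        # Collapse runs of spaces/tabs to one space, then trim both ends.
        line = re.sub(r"[ \t]+", " ", line).strip()
        if line:  # skip empty lines between paragraphs
            cleaned.append(line)
    return "\n".join(cleaned)


if __name__ == "__main__":
    sample = "  \u201cHello,\u201d  she said.\t\n\n***\n It was   late. \n"
    print(clean_training_text(sample))
```

*** lines pass through untouched, so chapter breaks survive the cleanup.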