I looked further into long context and performance degradation, and damn, it's worse than I thought.
This is why I also mentioned the use of RAG (Retrieval-Augmented Generation) some posts up.
If done properly you can use less of the context window.
Let's say you have 5 chapters.
You definitely don't want to feed it all five chapters in one go and say "write me the next chapter."
You split each into its own text file.
So you have something like NovelName_Chap1.txt, NovelName_Chap2.txt, ...
Or, split further into scenes: NovelName_Chap1.1.txt, NovelName_Chap1.2.txt, NovelName_Chap1.3.txt, ..., NovelName_Chap2.1.txt, NovelName_Chap2.2.txt, ...
And a summary file for each chapter, e.g. NovelName_Summary_Chap1.txt, NovelName_Summary_Chap2.txt, ...
Then when you want to continue with NovelName_Chap6.txt,
you can tell it to use NovelName_Summary_Chap5.txt to continue the story,
or refer to any earlier chapters if the story needs them.
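Even before any retrieval machinery, that summary scheme is easy to script. Here's a minimal sketch of assembling a continuation prompt from the summary files above — the filenames follow the naming scheme in this post, the prompt wording is just an example, and the fake summaries are stand-ins so it runs anywhere:

```python
# Build a "continue the novel" prompt from prior chapter summary files.
from pathlib import Path
import tempfile


def build_continuation_prompt(novel_dir: Path, novel: str, next_chap: int) -> str:
    """Concatenate all earlier chapter summaries into one continuation prompt."""
    parts = []
    for n in range(1, next_chap):
        summary_file = novel_dir / f"{novel}_Summary_Chap{n}.txt"
        if summary_file.exists():
            parts.append(f"Summary of chapter {n}:\n{summary_file.read_text()}")
    parts.append(f"Using these summaries, write chapter {next_chap} of {novel}.")
    return "\n\n".join(parts)


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as d:
        root = Path(d)
        # Fake summary files standing in for real ones.
        (root / "NovelName_Summary_Chap4.txt").write_text("The heist goes wrong.")
        (root / "NovelName_Summary_Chap5.txt").write_text("Mara flees the city.")
        print(build_continuation_prompt(root, "NovelName", 6))
```

The resulting string is what you'd paste (or send via the Ollama API) as the prompt; only the summaries travel, not the full chapters.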
RAG is much more efficient, as it uses less context for better results.
It's not magic, though. But if used properly:
* The retriever selects only the most relevant chunks.
* The model gets just what it needs.
* You don’t overwhelm the model with irrelevant or old info.
* You use less of the context window and leave more room for better, focused answers.
Technically, RAG and manual prompting both use the same mechanism: text placed into the context window.
But RAG is more efficient in how it selects what goes into that window, so it uses less context overall, especially compared to naive manual pasting.
Setting up RAG requires some work, though. You can do it in Open WebUI if that's your interface, or something similar.
Or if you want to dive into the world of Python, you'll need to install Chroma, LangChain, etc. for this.
It usually involves splitting the docs into chunks, turning them into vectors (embeddings), storing them in a database, and so on:
Your Docs
↓
Split into chunks
↓
Embed (turn into vectors)
↓
Store in vector DB (e.g., Chroma)
↓
You ask a question regarding NovelName:
→ Relevant chunks are retrieved
→ Passed to Ollama as context
→ You get a better, smarter answer based on content retrieved by the RAG.
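The pipeline above can be sketched end to end in plain Python. This is a toy version: simple word overlap stands in for real embeddings and the vector DB, so you can see the chunk → retrieve → prompt flow without installing Chroma. In a real setup you'd swap score() for embedding similarity and send the final prompt to Ollama:

```python
# Toy RAG flow: chunk documents, retrieve relevant chunks, build a prompt.
import re


def chunk(text: str, size: int = 40) -> list[str]:
    """Split a document into word-based chunks of roughly `size` words."""
    words = text.split()
    return [" ".join(words[i:i + size]) for i in range(0, len(words), size)]


def score(query: str, passage: str) -> int:
    """Crude relevance: count shared words (stand-in for vector similarity)."""
    q = set(re.findall(r"\w+", query.lower()))
    p = set(re.findall(r"\w+", passage.lower()))
    return len(q & p)


def retrieve(query: str, chunks: list[str], k: int = 2) -> list[str]:
    """Return the k chunks most relevant to the query."""
    return sorted(chunks, key=lambda c: score(query, c), reverse=True)[:k]


if __name__ == "__main__":
    # Hypothetical chapter files, kept in-memory for the sketch.
    docs = {
        "NovelName_Chap1.txt": "Mara steals the ledger from the harbor office at night.",
        "NovelName_Chap2.txt": "The detective questions the dock workers about the ledger.",
        "NovelName_Chap3.txt": "A storm delays the fishing fleet for three days.",
    }
    chunks = [c for text in docs.values() for c in chunk(text)]
    question = "Who stole the ledger?"
    context = "\n".join(retrieve(question, chunks))
    print(f"Context:\n{context}\n\nQuestion: {question}")
```

Only the ledger-related chunks end up in the prompt; the unrelated storm chapter never touches the context window, which is the whole point.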
Some Python scripting, a good prompt template, and RAG can probably get you a long way.
I have good experience using RAG when it comes to API documentation.
For example, I use RAG when I want to ask my qwen2.5-coder for C4D (Cinema 4D) related Python stuff.
Because there are so many API changes between C4D versions, without the RAG it just spits out a mixture of old and new code.
With the RAG I get a much, much better result.
But making novels using RAG... I've never tried it for that, though.