RenderedFatality
- Dec 1, 2018
There are some fancier things you can do with that particular model to speed it up: ik_llama.cpp with specific arguments,
--override-tensor "([0-2]).ffn_.*_exps.=CUDA0" --override-tensor "([3-9]|[1-9][0-9]+).ffn_.*_exps.=CPU" or something like that, plus tuning the --ngl argument, plus --fmoe on ik_llama.cpp.
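For context on what those flags are doing (a sketch, not ik_llama.cpp's actual matching code): each --override-tensor argument is a regex=device pair matched against tensor names, so the first pattern pins the expert FFN tensors of the low-numbered layers to the GPU while the second routes the remaining layers' experts to CPU. A minimal shell illustration of how the two regexes partition names, assuming llama.cpp-style GGUF tensor names of the form blk.N.ffn_*_exps.weight:

```shell
# place() prints the device a tensor name would be routed to, using the
# same two regexes from the post (illustrative only; the real matching
# happens inside ik_llama.cpp, not via grep)
place() {
  if echo "$1" | grep -Eq '([0-2])\.ffn_.*_exps\.'; then
    echo CUDA0
  else
    echo CPU
  fi
}

place blk.0.ffn_up_exps.weight     # prints CUDA0 (expert layers 0-2 stay on GPU)
place blk.3.ffn_down_exps.weight   # prints CPU
place blk.47.ffn_up_exps.weight    # prints CPU (all later layers fall through)
```

The second pattern's alternation ([3-9]|[1-9][0-9]+) is what covers every layer index from 3 upward, so between the two flags all expert tensors get an explicit placement.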
I spent the better part of an hour earlier capturing responses and creating a tiny dataset to fine-tune the 4b and 14b. The 4b is fairly promising; I accidentally baked it for 8 epochs on 405 examples, so is it overfitted? Absolutely. But it definitely wasn't bad.