Could we have an option to limit the number of tokens generated for thinking, via parameters equivalent to
--reasoning-budget and --reasoning-budget-message in the llama.cpp CLI, either in the chat handler or elsewhere?
This would be very useful for controlling the thinking effort of reasoning models; I feel Gemma 4 and Qwen 3.6 in particular tend to overthink in thinking mode.
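For illustration, here is a minimal sketch of what such a cap could look like at the token-stream level: once a budget of thinking tokens is spent inside a reasoning span, a budget message is injected and the rest of the thinking tokens are dropped. The function name `cap_thinking`, the `<think>`/`</think>` tag strings, and the default budget message are all hypothetical, not part of any existing llama.cpp or chat-handler API.

```python
def cap_thinking(token_stream, budget,
                 budget_message="</think>",
                 think_open="<think>", think_close="</think>"):
    """Yield tokens from token_stream, but after `budget` tokens have been
    emitted inside a <think>...</think> span, inject `budget_message`
    (which should close the span) and drop the remaining thinking tokens.

    Hypothetical sketch -- tag strings and message are assumptions, not a
    real llama.cpp interface.
    """
    in_think = False    # are we currently inside a thinking span?
    truncated = False   # has the budget already been exhausted this span?
    used = 0            # thinking tokens emitted so far in this span
    for tok in token_stream:
        if tok == think_open:
            in_think = True
            used = 0
            yield tok
        elif tok == think_close:
            # Only emit the real close tag if we did not already force-close.
            if in_think and not truncated:
                yield tok
            in_think = False
            truncated = False
        elif in_think:
            if truncated:
                continue  # drop overflow thinking tokens
            used += 1
            yield tok
            if used >= budget:
                truncated = True
                yield budget_message  # force-close the reasoning span
        else:
            yield tok  # normal answer tokens pass through unchanged
```

In a real chat handler this filter would sit between the model's token generator and the response, so the answer portion of the output is untouched while thinking is bounded.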