perf: Disable thinking mode for local Qwen3/3.5 to improve inference speed #512
Description
Problem
The local Qwen3/3.5 models (for example `qwen3.5-4b-q4_k_m`) currently generate `<think>` reasoning tokens during inference. While the streaming code in `ReasoningService.ts` already strips these `<think>` blocks from the output, the model still spends time generating them, which significantly impacts inference speed on a small 4B model.
This is especially relevant for dictation/transcription cleanup — the primary use case — where thinking overhead adds latency without any benefit to the user.
Current behavior
- User runs inference with Qwen3.5 4B locally via llama.cpp
- Model generates `<think>...</think>` tokens (reasoning phase)
- `processTextStreaming()` strips `<think>` blocks from the streamed output
- User sees clean output but waits for thinking tokens to be generated first
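The post-hoc stripping boils down to something like the following simplified, non-streaming sketch (`stripThink` is a hypothetical illustration of what `processTextStreaming()` does per response, not the actual code, which works on chunks):

```typescript
// Remove complete <think>...</think> blocks from a finished model response.
// Note: the tokens were still generated; this only hides them from the user.
function stripThink(text: string): string {
  return text.replace(/<think>[\s\S]*?<\/think>/g, "").trimStart();
}
```

This is exactly the problem: the regex runs after generation, so the latency of producing the reasoning tokens is already paid.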
Expected behavior
Thinking mode should be disabled at the inference level for Qwen3.5 4B (and potentially all local Qwen3.5 models) so the model skips reasoning entirely and responds faster.
Suggested approach
Qwen3/3.5 models support disabling thinking via:
- Chat template parameter: pass `enable_thinking: false` via llama.cpp's `--chat-template-kwargs '{"enable_thinking": false}'`
- Prompt-level control: append `/no_think` to the user message or system prompt
Either approach would prevent the model from generating <think> tokens entirely, saving inference time rather than just stripping them after generation.
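A minimal sketch of the prompt-level option, assuming OpenAI-style chat messages and that local Qwen model ids look like `qwen3.5-4b-q4_k_m` (`disableThinkingForQwen` and its id regex are illustrative helpers, not existing code in this repo):

```typescript
interface ChatMessage {
  role: "system" | "user" | "assistant";
  content: string;
}

// Append the /no_think control token to the last user message for local
// Qwen3/3.5 models, leaving other models untouched.
function disableThinkingForQwen(messages: ChatMessage[], modelId: string): ChatMessage[] {
  const isLocalQwen3 = /^qwen3(\.5)?-/.test(modelId); // assumption: local id naming scheme
  if (!isLocalQwen3) return messages;
  return messages.map((m, i) =>
    i === messages.length - 1 && m.role === "user"
      ? { ...m, content: `${m.content} /no_think` }
      : m
  );
}
```

The chat-template route (`enable_thinking: false`) is likely the cleaner fix since it needs no prompt mutation, but the prompt-level route works without changing how the llama.cpp server is launched.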
Context
- Groq's Qwen3 32B already has `disableThinking: true` in `modelRegistryData.json`; a similar mechanism could be extended to local models
- Issue feat: Add model-aware reasoning effort overrides #492 addresses reasoning effort overrides for cloud models, but doesn't cover local model thinking mode
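If the registry route is taken, local entries could mirror the existing Groq flag, along these lines (the field layout here is illustrative; the actual `modelRegistryData.json` schema may differ):

```json
{
  "qwen3.5-4b-q4_k_m": {
    "provider": "local",
    "disableThinking": true
  },
  "qwen3.5-2b-q4_k_m": {
    "provider": "local",
    "disableThinking": true
  }
}
```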
- The Qwen3.5 4B model is 2.7GB — every token of unnecessary reasoning is proportionally expensive on consumer hardware
Affected models
- `qwen3.5-4b-q4_k_m` (primary: smallest, most speed-sensitive)
- `qwen3.5-2b-q4_k_m` (same family, same issue)
- `qwen3.5-9b-q4_k_m` (less critical but still applies)
- `qwen3-4b-q4_k_m`, `qwen3-8b-*`, `qwen3-1.7b-*` (Qwen3 family, same thinking behavior)