
fix: parse reasoning output from LM Studio reasoning models #5584

Open

angelplusultra wants to merge 3 commits into master from 5583-bug-lm-studio-provider-does-not-present-reasoning-output


Conversation

@angelplusultra (Contributor) commented May 6, 2026

Pull Request Type

  • ✨ feat (New feature)
  • 🐛 fix (Bug fix)
  • ♻️ refactor (Code refactoring without changing behavior)
  • 💄 style (UI style changes)
  • 🔨 chore (Build, CI, maintenance)
  • 📝 docs (Documentation updates)

Relevant Issues

resolves #5583

Description

The LM Studio provider was using the generic handleDefaultStreamResponseV2 stream handler, which only reads choices[0].delta.content. LM Studio's OpenAI-compatible endpoint emits reasoning tokens for reasoning models in a separate reasoning_content field (same shape as DeepSeek), so those tokens were silently dropped on both the streaming and non-streaming paths and the UI never showed the model's thinking output.
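
For reference, a DeepSeek-style stream interleaves the two fields like this (the values below are illustrative, not captured output):

```js
// Illustrative delta shapes from an OpenAI-compatible reasoning endpoint.
// A handler that only reads `delta.content` never sees the first chunk.
const reasoningChunk = {
  choices: [
    { delta: { reasoning_content: "First, restate the problem..." }, finish_reason: null },
  ],
};
const contentChunk = {
  choices: [{ delta: { content: "The answer is 42." }, finish_reason: null }],
};
```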

Streaming (handleStream) — replaced the default handler with a bespoke implementation modeled after the existing DeepSeek/Foundry handlers (a condensed sketch follows this list):

  • Reads both delta.content and delta.reasoning_content per chunk.
  • On the first reasoning chunk, prepends a <think> tag and streams subsequent reasoning tokens through verbatim.
  • On the first content token after a reasoning run, emits the closing </think> tag, flushes the buffered reasoning into fullText, and resumes normal content streaming.
  • Preserves the existing usage-metric handling, abort handling, and finish_reason termination logic from the default handler.
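
A condensed sketch of that flow. Helper names and the import path are assumptions modeled on the repo's other provider handlers, and the real implementation also wires up the abort handling and usage metrics noted above:

```js
const { v4: uuidv4 } = require("uuid");
// `writeResponseChunk` mirrors the SSE helper the other handlers use;
// this import path is an assumption for the sketch.
const { writeResponseChunk } = require("../../helpers/chat/responses");

async function handleStream(response, stream, responseProps) {
  const { uuid = uuidv4(), sources = [] } = responseProps;
  let fullText = "";
  let reasoningText = "";

  const send = (textResponse, close = false) =>
    writeResponseChunk(response, {
      uuid,
      sources,
      type: "textResponseChunk",
      textResponse,
      close,
      error: false,
    });

  for await (const chunk of stream) {
    const message = chunk?.choices?.[0];
    const token = message?.delta?.content;
    const reasoningToken = message?.delta?.reasoning_content;

    if (reasoningToken) {
      // First reasoning token opens the <think> block; later ones stream through verbatim.
      const text =
        reasoningText.length === 0 ? `<think>${reasoningToken}` : reasoningToken;
      reasoningText += text;
      send(text);
      continue;
    }

    if (token) {
      // First content token after a reasoning run: close the block and
      // fold the buffered reasoning into fullText.
      if (reasoningText.length > 0) {
        send("</think>");
        fullText += `${reasoningText}</think>`;
        reasoningText = "";
      }
      fullText += token;
      send(token);
    }

    // Terminate on finish_reason, as the default handler does.
    if (message?.finish_reason) break;
  }

  send("", true); // final close event
  return fullText;
}
```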

Non-streaming (getChatCompletion) — added a #parseReasoningFromResponse helper that, when message.reasoning_content is present, prepends <think>{reasoning}</think> to the response content before returning it as textResponse. Same shape the streaming path produces, so downstream rendering is consistent across both modes.
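
A minimal sketch of that helper. The guard details are assumptions; `message` is the chat.completions message object returned by LM Studio's endpoint:

```js
class LMStudioLLM {
  // Sketch of the helper described above, mirroring the DeepSeek-style shape.
  #parseReasoningFromResponse({ message }) {
    let textResponse = message?.content;
    // Prepend reasoning in the same <think>...</think> wrapper the streaming
    // path emits so downstream rendering is identical in both modes.
    if (!!message?.reasoning_content && message.reasoning_content.trim().length > 0)
      textResponse = `<think>${message.reasoning_content}</think>${textResponse}`;
    return textResponse;
  }
}
```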

Visuals (if applicable)

Additional Information

Developer Validations

  • I ran yarn lint from the root of the repo & committed changes
  • Relevant documentation has been updated (if applicable)
  • I have tested my code functionality
  • Docker build succeeds locally

angelplusultra linked an issue May 6, 2026 that may be closed by this pull request
angelplusultra marked this pull request as ready for review May 6, 2026 23:19
angelplusultra requested a review from shatfield4 May 6, 2026 23:19
@shatfield4 (Collaborator) left a comment


Tested using deepseek/deepseek-r1-0528-qwen3-8b and everything worked as it should. LGTM.

@RailKill commented May 11, 2026

Should the reasoning_content implementation be moved into handleDefaultStreamResponseV2 so that it covers other providers? I think it should be fine since there are if-checks and guards. The GenericOpenAI provider is still using the old code and has TODO comments about using a shared function; that's the point of handleDefaultStreamResponseV2, right?

I'm experiencing this problem with the LocalAI provider, and I'm sure Ollama and other providers have the same issue, since they all use OpenAI-compatible endpoints and need the same fix. I think this would be a good opportunity to do that.

@angelplusultra (Contributor, Author) commented May 11, 2026

> Should the reasoning_content implementation be moved into handleDefaultStreamResponseV2 so that it covers other providers? I think it should be fine since there are if-checks and guards. The GenericOpenAI provider is still using the old code and has TODO comments about using a shared function; that's the point of handleDefaultStreamResponseV2, right?
>
> I'm experiencing this problem with the LocalAI provider, and I'm sure Ollama and other providers have the same issue, since they all use OpenAI-compatible endpoints and need the same fix. I think this would be a good opportunity to do that.

Yes, there will be some sort of unified refactor for all these identical handleStream methods across providers. It most likely will result in a refactor of handleDefaultStreamResponseV2.
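
For illustration, the guard such a shared handler could use might look like this (hypothetical sketch, not code from this PR or the repo):

```js
// Hypothetical: how a shared handler like handleDefaultStreamResponseV2 could
// read both fields. Providers whose deltas never carry `reasoning_content`
// simply get an empty string back and skip the reasoning branch, so their
// behavior would be unchanged.
function extractDeltaTokens(chunk) {
  const delta = chunk?.choices?.[0]?.delta ?? {};
  return {
    content: delta.content ?? "",
    reasoning: delta.reasoning_content ?? "",
  };
}
```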



Development

Successfully merging this pull request may close these issues.

[BUG]: LM Studio Provider Does Not Present Reasoning Output

4 participants