# Extended Thinking
{: .d-inline-block }

New in 1.10
{: .label .label-green }

Give reasoning models more time and budget to deliberate, with optional access to thinking output
{: .fs-6 .fw-300 }

---

After reading this guide, you will know:

* How to control extended thinking with `with_thinking`
* How effort and budget are sent to providers
* How to access thinking output in responses and streams
* How to persist thinking data with ActiveRecord

## What is Extended Thinking?

Extended Thinking gives supported models more time and a larger computation budget to deliberate before answering. It can improve results on multi-step tasks like coding, math, and logic, at the expense of latency and cost. Some providers can also return a thinking trace or signature alongside the final answer.

## Controlling Extended Thinking

Use `with_thinking` to control models that support thinking. Some models think by default, so `with_thinking` is for tuning (or disabling) rather than turning it on.

```ruby
chat = RubyLLM.chat(model: 'claude-opus-4.5')
  .with_thinking(effort: :high, budget: 8000)

response = chat.ask("What is 15 * 23?")

response.thinking&.text
response.thinking&.signature
response.content
```

`with_thinking` requires at least one of `effort` or `budget`:

```ruby
chat.with_thinking(effort: :low)
chat.with_thinking(budget: 10_000)
chat.with_thinking(effort: :none)
```

### Effort and Budget

Use `effort` to pick a qualitative depth (`:low`, `:medium`, `:high`) and `budget` for models that accept a token cap.

RubyLLM sends `effort` and `budget` exactly as provided. Check your provider's docs for supported values.

## Streaming with Thinking

Thinking content is delivered alongside normal content in streaming chunks:

```ruby
chat = RubyLLM.chat(model: 'claude-opus-4.5')
  .with_thinking(effort: :medium)

chat.ask("Solve this step by step: What is 127 * 43?") do |chunk|
  print chunk.thinking&.text
  print chunk.content
end
```

Some providers only expose thinking in the final response. In those cases, `response.thinking` is populated after the stream completes, and `chunk.thinking` stays empty.

## ActiveRecord Integration

When using `acts_as_chat` and `acts_as_message`, thinking output is persisted to the message table:

```ruby
# Migration (generated automatically with new installs)
# t.text :thinking_text
# t.text :thinking_signature
# t.integer :thinking_tokens

response = chat_record.ask("Explain quantum entanglement")
response.thinking&.text
response.thinking_tokens
```

`thinking_tokens` is usually a breakdown of generated output work. From v1.15 onward, RubyLLM normalizes `output_tokens` as the billable output bucket, so you should not add `thinking_tokens` to `output_tokens` for cost calculations. When a model has distinct reasoning-token pricing, the cost is exposed separately as `response.cost.thinking`.

### Upgrading Existing Installations

For 1.10 upgrades, consider using the [upgrade guide](/ruby-llm-docs/upgrading/#upgrade-to-1-10) to run the generator.
If you prefer manual migrations, add the columns to your message and tool calls tables:

```ruby
class AddThinkingToMessages < ActiveRecord::Migration[7.1]
  def change
    add_column :messages, :thinking_text, :text
    add_column :messages, :thinking_signature, :text
    add_column :messages, :thinking_tokens, :integer
    add_column :tool_calls, :thought_signature, :string
  end
end
```

## Provider Notes

- Claude uses a thinking budget and can return both text and signature.
- Anthropic requires a thinking budget.
- Bedrock thinking params are model-dependent; models may accept budget, effort, or provider-specific fields.
- Gemini 2.5 uses a token budget; Gemini 3 uses effort levels.
- OpenAI reasoning models accept `effort` but may not return thinking text or signatures.
- Perplexity sonar reasoning models stream `<think>` blocks inside content; RubyLLM extracts them after the response completes.
- Mistral Magistral models always think and ignore `with_thinking` params. Non-magistral models warn if you pass them.
- Ollama's Qwen3 models think by default and only accept `effort: :none` to disable thinking.
- Anthropic and Ollama integrations currently do not report thinking token counts.

## Next Steps

* [Streaming Responses](/ruby-llm-docs/streaming/)
* [Rails Integration](/ruby-llm-docs/rails/)
* [Error Handling](/ruby-llm-docs/error-handling/)
