LLMs Know When to Use Tools But Fail to Act on It

LLMs don't fail at tool use because they can't recognize when they need help — they fail because they don't act on that recognition. A new study puts the mismatch rate at up to 54%, and traces the breakdown to a single transition: cognition to action.

Reality 75 /100
Hype 25 /100
Impact 65 /100

Explanation

When an AI agent decides whether to answer a question itself or call an external tool (like a calculator or search engine), you'd assume the main challenge is knowing which situation you're in. Turns out, that's not the bottleneck.

Researchers tested four large language models on arithmetic and factual question-answering tasks, comparing how often each model should use a tool (judged by whether it gets the answer right without one) with how often it actually does. The mismatch is striking: 26.5–54% on math tasks and 30.8–41.8% on factual QA. Depending on the model and task, somewhere between a quarter and a half of the time the model's behavior doesn't match what its own capability profile demands.
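To make the mismatch metric concrete, here is a minimal sketch of how such a rate could be computed. The `Trial` record, its field names, and the toy data are assumptions for illustration, not the paper's code or results.

```python
from dataclasses import dataclass


@dataclass
class Trial:
    solved_without_tool: bool  # did the model answer correctly with no tool?
    called_tool: bool          # did the model actually issue a tool call?


def mismatch_rate(trials):
    """Fraction of trials where tool-call behavior disagrees with tool necessity."""
    if not trials:
        raise ValueError("need at least one trial")
    mismatches = 0
    for t in trials:
        # Model-adaptive necessity: a tool is needed only if the model
        # cannot solve the item on its own.
        needs_tool = not t.solved_without_tool
        if needs_tool != t.called_tool:
            mismatches += 1
    return mismatches / len(trials)


# Toy data (made up): two matches, two mismatches.
trials = [
    Trial(solved_without_tool=True, called_tool=False),   # match: no tool needed, none called
    Trial(solved_without_tool=True, called_tool=True),    # mismatch: unnecessary call
    Trial(solved_without_tool=False, called_tool=False),  # mismatch: tool needed, none called
    Trial(solved_without_tool=False, called_tool=True),   # match: tool needed and called
]
print(mismatch_rate(trials))  # 0.5
```

Note that the rate counts errors in both directions: skipping a tool that was needed and calling one that was not.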

The key insight comes from probing the models' internal states. The researchers split tool use into two stages: cognition (does the model internally "believe" a tool is needed?) and execution (does it actually call one?). Both signals are detectable in the model's hidden layers. But in the late layers that directly drive next-token output, the directions encoding the two signals are nearly orthogonal: they are largely unrelated rather than aligned, so the belief that a tool is needed has little pull on the decision to call one. The model knows, but doesn't do.
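The study reports that the cognition and execution probe directions are nearly orthogonal in late layers. A quick way to see what that means is cosine similarity between two direction vectors. The sketch below uses tiny hand-made vectors, not real probe weights, and the variable names are illustrative assumptions.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    if len(u) != len(v):
        raise ValueError("vectors must have the same length")
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0 or norm_v == 0:
        raise ValueError("cosine is undefined for a zero vector")
    return dot / (norm_u * norm_v)


# Hand-made 3-d stand-ins for probe directions (real ones live in hidden-state space).
w_cognition = [1.0, 0.0, 0.0]
w_aligned = [2.0, 0.0, 0.0]      # same direction: belief would predict action
w_orthogonal = [0.0, 3.0, 0.0]   # unrelated direction: belief says little about action
w_opposite = [-1.0, 0.0, 0.0]    # opposite direction: belief would predict the reverse action

print(cosine(w_cognition, w_aligned))     # 1.0
print(cosine(w_cognition, w_orthogonal))  # 0.0
print(cosine(w_cognition, w_opposite))    # -1.0
```

A cosine near 0 is the "nearly orthogonal" case. It is distinct from a cosine near -1, which would mean the model systematically does the opposite of what it believes.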

Most of the mismatch lives in that cognition-to-action gap, not in faulty self-assessment. The model's internal read of the situation is often correct; something breaks in the translation to behavior.
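One way to picture this attribution is to label each mismatched case by the stage where it first goes wrong. The following is a simplified sketch under stated assumptions: it treats the internal belief as a binary readout, and the function name, labels, and toy records are hypothetical rather than the paper's trajectory analysis.

```python
from collections import Counter


def attribute_mismatch(needs_tool, believes_tool, called_tool):
    """Label a trial by the stage at which behavior departs from need."""
    if needs_tool == called_tool:
        return "match"
    if believes_tool != needs_tool:
        # The internal belief was already wrong, so the action followed a bad read.
        return "cognition"
    # The belief was right but the action did not follow it.
    return "cognition_to_action"


# Toy records (made up): (needs_tool, believes_tool, called_tool)
records = [
    (True, True, False),   # knew a tool was needed, did not call one
    (True, True, False),   # same failure
    (False, False, True),  # knew no tool was needed, called one anyway
    (True, False, False),  # misjudged the need
    (True, True, True),    # correct belief, correct action
]

counts = Counter(attribute_mismatch(*r) for r in records)
mismatched = counts["cognition"] + counts["cognition_to_action"]
print(counts["cognition_to_action"] / mismatched)  # 0.75
```

In this toy set, three of the four mismatches come from the cognition-to-action step, which is the pattern the study describes as dominant.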

Why does this matter today? Because the entire agentic AI stack — from coding assistants to autonomous research tools — assumes that if you give a model access to tools and good judgment, it will use them appropriately. This research suggests the failure mode isn't judgment; it's a structural disconnect in how internal states become outputs. Fixing it likely requires targeted interventions at the late-layer, action-generation stage, not just better training data or prompting.

Reality meter

Artificial Intelligence Time horizon · mid term
Reality Score 75 / 100
Hype Risk 25 / 100
Impact 65 / 100
Source Quality 75 / 100
Community Confidence 50 / 100

Why this score?

Trust Layer
Main claim

LLMs internally recognize when external tools are needed but systematically fail to translate that recognition into tool-call actions, with mismatch rates of up to 54% — a structural 'knowing-doing gap' concentrated at the cognition-to-action transition.

Evidence
  • Behavioral mismatch between model-adaptive tool necessity and observed tool-call behavior ranges from 26.5–54.0% on arithmetic tasks and 30.8–41.8% on factual QA across four tested models.
  • Both cognition (internal belief about necessity) and execution (actual tool-call behavior) signals are linearly decodable from LLM hidden states, confirming they are encoded in the model's representations.
  • In the late-layer, last-token regime that drives next-token generation, the probe directions for cognition and execution become nearly orthogonal, which offers a mechanistic account of the decoupling.
  • Trajectory analysis shows the majority of mismatch is concentrated in the cognition-to-action transition, not in the cognition stage itself.
  • Tool necessity is defined model-adaptively based on each model's empirical solve rate without tools, distinguishing this work from prior model-agnostic annotation approaches.
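The model-adaptive definition in the last bullet can be sketched as follows. The 0.5 threshold, the function name, and the sample outcomes are assumptions chosen for illustration; the source does not state the exact rule used.

```python
def needs_tool(no_tool_outcomes, threshold=0.5):
    """Decide tool necessity for one question from one model's no-tool attempts.

    no_tool_outcomes: list of booleans, True where the model answered
    correctly without a tool. threshold is a hypothetical cutoff.
    """
    if not no_tool_outcomes:
        raise ValueError("need at least one no-tool attempt")
    solve_rate = sum(no_tool_outcomes) / len(no_tool_outcomes)
    return solve_rate < threshold


# Same question, two hypothetical models with different no-tool solve rates.
model_a_attempts = [True, True, True, False]    # solve rate 0.75
model_b_attempts = [False, False, True, False]  # solve rate 0.25

print(needs_tool(model_a_attempts))  # False
print(needs_tool(model_b_attempts))  # True
```

The same question gets different labels for different models, which is the difference from a model-agnostic annotation that assigns one label per question.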
Skepticism
  • The study covers only arithmetic and factual QA datasets; generalization to more open-ended or multi-step agentic tasks is undemonstrated.
  • Only four models are tested, and mismatch rates vary substantially across them (26.5–54% on arithmetic); the paper does not fully explain what drives the variance.
  • Linear probe decodability confirms the signals exist but does not establish that they are causally relevant to behavior — correlation between probe direction and action gap needs stronger causal validation.
Score rationale
Reality 75

The core quantitative claims (mismatch rates, probe orthogonality) are grounded in empirical measurements across multiple models and datasets, with a clear mechanistic decomposition — not just a behavioral observation.

Hype 25

The paper makes no overclaims; it explicitly scopes findings to the tested tasks and frames the knowing-doing gap as a diagnosis requiring further intervention work, not a solved problem.

Impact 65

The finding directly challenges the assumption underlying agentic AI system design — that better judgment is the fix — and points to a specific, actionable failure locus (late-layer action generation), making it practically relevant to anyone building tool-augmented LLM pipelines.

Source receipts
  • 1 source on file
  • Avg trust 90/100
  • Trust 90/100

Glossary

linear probes
Machine learning classifiers trained on hidden neural network states to detect and measure whether specific information (like a model's internal beliefs) is encoded in those states. They work by finding linear directions in the network's internal representations that correlate with the target signal.
residual stream
The main information pathway running through a transformer neural network, where data flows and accumulates across layers. It's the central channel through which information is processed and transformed as it moves through the model.
RLHF (Reinforcement Learning from Human Feedback)
A training technique that fine-tunes language models using human preferences as reward signals, steering the model toward outputs humans find more helpful, harmless, and honest. It's commonly used to align model behavior with desired outcomes.
representation engineering
A technique for modifying how information is encoded within a neural network's internal states to change the model's behavior, without retraining the entire model. It involves directly manipulating the learned representations to steer outputs in desired directions.
steering vectors
Computed directions in a neural network's representation space that, when applied to the model's internal states, reliably shift its behavior toward specific outcomes. They act as a control mechanism for guiding model outputs without full retraining.
orthogonal
In the context of neural networks, two directions are orthogonal when their dot product is zero, so moving along one does not move along the other. This is different from pointing in opposite directions. When probe directions become nearly orthogonal, it suggests the model's internal beliefs and its action-generation pathways are largely decoupled.

Prediction

Will a targeted late-layer intervention (e.g., representation steering or stage-specific fine-tuning) reduce the cognition-to-action mismatch in LLM tool use below 15% within 18 months of this paper's publication?

Partly: 100%
Yes: 0%
Unclear: 0%
No: 0%
1 vote · Avg confidence 70