@ziliangpeng (Contributor)

Summary

This PR adds a new metric vllm:request_prefill_kv_computed_tokens that tracks the number of KV tokens computed during prefill phase, excluding cached tokens.

Motivation

Currently, vLLM tracks total prompt tokens (vllm:request_prompt_tokens) but doesn't have per-request visibility into how many KV tokens were actually computed vs served from cache (local prefix cache or remote KV cache like LMCache). This metric helps:

  • Understand cache effectiveness on a per-request basis
  • Better estimate actual compute costs vs total prompt size
  • Debug and optimize caching strategies
  • Monitor workload characteristics more accurately

Changes

  • Added num_cached_tokens field to FinishedRequestStats dataclass
  • Updated update_from_finished_request() to accept num_cached_tokens parameter
  • Added new histogram metric vllm:request_prefill_kv_computed_tokens in metrics loggers
  • Metric calculation: `num_prompt_tokens - max(num_cached_tokens, 0)`
  • Added comprehensive unit tests
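Using the names from the list above, the new field and the metric calculation can be sketched roughly as follows (a simplified illustration: the real FinishedRequestStats carries many more fields, and the standalone helper function here is hypothetical — in the PR the calculation happens inside the metrics logger):

```python
from dataclasses import dataclass


@dataclass
class FinishedRequestStats:
    """Per-request stats emitted when a request finishes (simplified sketch)."""

    num_prompt_tokens: int = 0
    # New field from this PR: tokens served from the local prefix cache
    # or a remote KV store instead of being recomputed.
    num_cached_tokens: int = 0


def prefill_kv_computed_tokens(stats: FinishedRequestStats) -> int:
    """KV tokens actually computed during prefill, excluding cache hits.

    max(..., 0) guards against negative num_cached_tokens values
    (e.g. an unset sentinel) inflating the metric.
    """
    return stats.num_prompt_tokens - max(stats.num_cached_tokens, 0)
```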

Testing

  • Added unit tests in tests/v1/metrics/test_stats.py:
    • Test with prefix cache hits
    • Test without cache
    • Test edge cases (negative values, all tokens cached)
  • Verified in production workloads showing expected cache effectiveness
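The edge cases listed above can be expressed as a self-contained sketch (the real tests in tests/v1/metrics/test_stats.py exercise vLLM's actual classes; the computed_tokens helper here is hypothetical and only mirrors the metric calculation):

```python
def computed_tokens(num_prompt_tokens: int, num_cached_tokens: int) -> int:
    # Mirrors the metric: prompt tokens minus cache hits,
    # clamping negative cache counts to zero.
    return num_prompt_tokens - max(num_cached_tokens, 0)


def test_with_prefix_cache_hit():
    assert computed_tokens(10_000, 1_200) == 8_800


def test_without_cache():
    assert computed_tokens(512, 0) == 512


def test_negative_cached_tokens_clamped():
    # A negative value (e.g. an "unset" sentinel) must not inflate the metric.
    assert computed_tokens(512, -1) == 512


def test_all_tokens_cached():
    assert computed_tokens(512, 512) == 0
```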

The metric correctly includes cache hits from both local prefix cache and remote KV stores (KV connector, LMCache).
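For illustration, recording such a histogram with prometheus_client could look roughly like this (a hedged sketch, not vLLM's actual logger code: the bucket boundaries and the observe_finished_request helper are assumptions; only the metric name comes from the PR):

```python
from prometheus_client import Histogram

# Metric name from the PR; buckets here are illustrative placeholders --
# vLLM builds its own bucket lists for token-count histograms.
PREFILL_KV_COMPUTED_TOKENS = Histogram(
    name="vllm:request_prefill_kv_computed_tokens",
    documentation=(
        "Number of KV tokens computed during prefill, excluding tokens "
        "served from the prefix cache or remote KV stores."
    ),
    buckets=[1, 64, 256, 1024, 4096, 16384, 65536],
)


def observe_finished_request(num_prompt_tokens: int, num_cached_tokens: int) -> None:
    # Observe the computed-token count for one finished request.
    PREFILL_KV_COMPUTED_TOKENS.observe(num_prompt_tokens - max(num_cached_tokens, 0))
```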

🤖 Generated with Claude Code

Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>

@ziliangpeng ziliangpeng requested a review from markmc as a code owner December 6, 2025 19:36
@mergify mergify bot added the v1 label Dec 6, 2025

@gemini-code-assist bot (Contributor) left a comment

Code Review

This pull request introduces a new metric, vllm:request_prefill_kv_computed_tokens, to track the number of KV tokens computed during the prefill phase, excluding any tokens served from the cache. The changes are well-implemented, adding the num_cached_tokens field to FinishedRequestStats and plumbing it through from the output processor. A new histogram is added to the Prometheus logger to record this metric, correctly calculating it as the difference between prompt tokens and cached tokens. The inclusion of comprehensive unit tests covering various scenarios, including edge cases, ensures the reliability of this new feature. The code is clear, follows existing patterns, and improves the observability of cache effectiveness. Overall, this is a solid contribution.

@ziliangpeng ziliangpeng force-pushed the feat-prefill-kv-metric branch from 17b00c9 to 9a5fc4d Compare December 6, 2025 19:40
@mergify bot commented Dec 6, 2025

Hi @ziliangpeng, the pre-commit checks have failed. Please run:

```shell
uv pip install pre-commit
pre-commit install
pre-commit run --all-files
```

Then, commit the changes and push to your branch.

For future commits, pre-commit will run automatically on changed files before each commit.

Add new Prometheus metric `vllm:request_prefill_kv_computed_tokens` to
track the number of new KV cache tokens computed during the prefill
phase, excluding tokens served from prefix cache.

This metric helps measure actual compute workload during prefill,
accounting for prefix cache hits. It correctly handles:
- Prefix caching (excludes cached tokens)
- Chunked prefill (counts total prompt tokens, not per-chunk)
- Edge cases (negative values, no cache)

Changes:
- Add `num_cached_tokens` field to `FinishedRequestStats`
- Pass `num_cached_tokens` from `RequestState` through stats pipeline
- Calculate prefill KV compute as `num_prompt_tokens - num_cached_tokens`
- Add Prometheus histogram metric with standard buckets
- Add comprehensive unit tests covering cache hits, no cache, and edge cases

Example:
  Request with 10,000 token prompt
  Prefix cache hit: 1,200 tokens
  Metric reports: 8,800 tokens (10,000 - 1,200)

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>
Signed-off-by: Ziliang Peng <ziliang@character.ai>
@ziliangpeng ziliangpeng force-pushed the feat-prefill-kv-metric branch from 9a5fc4d to 34d07c5 Compare December 6, 2025 19:50