Tree light xgboost slo aware #1961

Gregory-Pereira · 2025-12-06T15:20:40Z

What type of PR is this?
/kind feature

What this PR does / why we need it:
This PR adds an option to run CPU inference for the latency predictor with the Treelite library, for efficiency. Treelite is purpose-built for the efficient deployment of tree-based models, whereas the XGBoost Python library is designed for training and research flexibility. Once a model is trained, Treelite compiles it into optimized, model-specific C/C++ code, eliminating the interpreter overhead, dynamic branching, and general-purpose runtime structures used by XGBoost's predictor. This ahead-of-time (AOT) compilation allows Treelite to:

Vectorize and fuse operations for better CPU/GPU utilization.
Minimize branching overhead via specialized code paths tailored to the exact tree structure.
Reduce memory access latency through contiguous, cache-friendly data layouts.
Avoid Python and framework overhead, resulting in lower latency and higher throughput during inference.

In short, XGBoost is optimized for training and experimentation, while Treelite is optimized for lightweight, production-grade inference, which makes it more performant when serving models at scale.
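
To make the ahead-of-time compilation idea concrete, here is a toy sketch in plain Python. It is not Treelite's API and not the code in this PR; the tree, the function names, and the use of `exec` are all assumptions made up for illustration. It contrasts a generic predictor that walks a tree data structure on every call with a predictor whose branches are generated once from the same tree and hard-coded.

```python
# Toy illustration only: this is NOT Treelite's API or its generated code.
# It contrasts walking a generic tree structure at runtime with
# generating source code specialized to one fixed tree.

# Hypothetical decision tree: internal nodes test x[feature] < threshold.
TREE = {
    "feature": 0, "threshold": 0.5,
    "left": {"leaf": 1.0},
    "right": {
        "feature": 1, "threshold": 2.0,
        "left": {"leaf": 2.0},
        "right": {"leaf": 3.0},
    },
}


def predict_interpreted(node, x):
    """Generic predictor: follows dictionary links for every prediction."""
    while "leaf" not in node:
        branch = "left" if x[node["feature"]] < node["threshold"] else "right"
        node = node[branch]
    return node["leaf"]


def generate_source(node, indent="    "):
    """Emit nested if/else statements that hard-code this tree's structure."""
    if "leaf" in node:
        return f"{indent}return {node['leaf']!r}\n"
    return (
        f"{indent}if x[{node['feature']}] < {node['threshold']!r}:\n"
        + generate_source(node["left"], indent + "    ")
        + f"{indent}else:\n"
        + generate_source(node["right"], indent + "    ")
    )


def compile_tree(tree):
    """Build a function specialized to `tree` from generated source."""
    source = "def predict_compiled(x):\n" + generate_source(tree)
    namespace = {}
    exec(source, namespace)  # acceptable here: the source is generated locally
    return namespace["predict_compiled"], source


predict_compiled, generated = compile_tree(TREE)

if __name__ == "__main__":
    print(generated)
    for sample in ([0.1, 0.0], [0.9, 1.0], [0.9, 5.0]):
        # Both predictors must agree on every input.
        assert predict_interpreted(TREE, sample) == predict_compiled(sample)
        print(sample, "->", predict_compiled(sample))
```

The generated function contains no tree data structure at all, only comparisons and returns. A real compiler targeting C would apply the same principle and hand the result to an optimizing C compiler; the actual speedup for this predictor would have to be measured rather than assumed.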

Which issue(s) this PR fixes:
Fixes #1937

Does this PR introduce a user-facing change?:

This adds deployment options for latency prediction using Treelite.

Signed-off-by: greg pereira <grpereir@redhat.com>

k8s-ci-robot · 2025-12-06T15:20:46Z

[APPROVALNOTIFIER] This PR is NOT APPROVED

This pull-request has been approved by: Gregory-Pereira
Once this PR has been reviewed and has the lgtm label, please assign danehans for approval. For more information see the Code Review Process.

The full list of commands accepted by this bot can be found here.

Needs approval from an approver in each of these files:

OWNERS

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

k8s-ci-robot · 2025-12-06T15:20:50Z

Hi @Gregory-Pereira. Thanks for your PR.

I'm waiting for a github.com member to verify that this patch is reasonable to test. If it is, they should reply with /ok-to-test on its own line. Until that is done, I will not automatically test new commits in this PR, but the usual testing commands by org members will still work. Regular contributors should join the org to skip this step.

Once the patch is verified, the new status will be reflected by the ok-to-test label.

I understand the commands that are listed here.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository.

netlify · 2025-12-06T15:21:40Z

✅ Deploy Preview for gateway-api-inference-extension ready!

Name	Link
🔨 Latest commit	`cfb9e23`
🔍 Latest deploy log	https://app.netlify.com/projects/gateway-api-inference-extension/deploys/693449cb051ba100083bebc6
😎 Deploy Preview	https://deploy-preview-1961--gateway-api-inference-extension.netlify.app

To edit notification comments on pull requests, go to your Netlify project configuration.

Gregory-Pereira added 3 commits December 4, 2025 13:02

treelight implementation

d0fe6ca

Signed-off-by: greg pereira <grpereir@redhat.com>

treelight deployment manifests

33c95d7

Signed-off-by: greg pereira <grpereir@redhat.com>

test and manifest updates

cfb9e23

Signed-off-by: greg pereira <grpereir@redhat.com>

k8s-ci-robot added labels Dec 6, 2025: do-not-merge/work-in-progress (indicates that a PR should not merge because it is a work in progress), kind/feature (categorizes issue or PR as related to a new feature)

k8s-ci-robot requested review from danehans and nirrozenbaum December 6, 2025 15:20

k8s-ci-robot added labels Dec 6, 2025: cncf-cla: yes (indicates the PR's author has signed the CNCF CLA), needs-ok-to-test (indicates a PR that requires an org member to verify it is safe to test)

k8s-ci-robot added label Dec 6, 2025: size/L (denotes a PR that changes 100-499 lines, ignoring generated files)

Gregory-Pereira mentioned this pull request Dec 6, 2025

Predicted Latency based routing - xgboost sidecar scaling #1937

Open

2 participants
