Skip to content

Conversation

@michalowski-arm
Copy link
Contributor

This change is an initial implementation of the external brgemm API for aarch64, mostly directly based on the existing x64 code. For now, only supports f32 is supported, with support for int8 and bf16 (and further improvements) to come in later patches building on this baseline implementation.

To test this change, enable DNNL_EXPERIMENTAL_UKERNEL before compiling, then run benchdnn with --brgemm as usual. As an alternative to benchdnn, you can use examples/ukernels/cpu_brgemm.cpp, though it has to be changed slightly to use f32 instead of int8 and not set AB scales.

@michalowski-arm michalowski-arm requested review from a team as code owners December 2, 2025 11:43
@github-actions github-actions bot added platform:cpu-aarch64 Codeowner: @oneapi-src/onednn-cpu-aarch64 component:tests Codeowner: @oneapi-src/onednn-arch labels Dec 2, 2025
@michalowski-arm michalowski-arm added the component:api Codeowner: @oneapi-src/onednn-arch label Dec 2, 2025
@github-actions github-actions bot removed the component:api Codeowner: @oneapi-src/onednn-arch label Dec 2, 2025
Copy link
Contributor

@jondea jondea left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks generally good, two minor questions.

return status::success;
}

status_t init_conf(brgemm_matmul_conf_t &conf, dim_t batch, dim_t M, dim_t K,
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why does adding the external brgemm interface mean we add a new function for the internal brgemm_matmul?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That init_conf is used by the transform, which is used for packing of B. We could move this logic elsewhere, I was just focusing on following the x64 approach as closely as possible for this initial version.

@jondea
Copy link
Contributor

jondea commented Dec 3, 2025

Also, can we make sure this is tested in CI somehow?

@michalowski-arm
Copy link
Contributor Author

Also, can we make sure this is tested in CI somehow?

As it is right now, building with DNNL_EXPERIMENTAL_UKERNEL is going to use the external brgemm api for ./benchdnn --brgemm, so it would probably need to be a separate, new pipeline (if we want to keep testing the internal brgemm api). Alternatively, we could maybe not replace internal brgemm benchdnn with the ukernel version on aarch64, and just write a suite of tests similar to examples/ukernels/cpu_brgemm.cpp for the CI. Either way, it's not gonna be as easy as just enabling the ukernel in the CI.

@jondea
Copy link
Contributor

jondea commented Dec 3, 2025

I'd lean towards just enabling DNNL_EXPERIMENTAL_UKERNEL where we can. My expectation is that when you call benchdnn --something you are directly testing external things. Testing the internal brgemm interface feels like a fallback rather than an intentional thing. The internal brgemm will continue to be tested via the external API, and more directly where it is actually used e.g. brgemm_matmal and brgconv etc.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

component:tests Codeowner: @oneapi-src/onednn-arch platform:cpu-aarch64 Codeowner: @oneapi-src/onednn-cpu-aarch64

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants