
Conversation

@charlifu
Contributor

@charlifu charlifu commented Dec 2, 2025

Use ROCM_AITER_FA backend for flash attn on rocm.

Signed-off-by: charlifu <charlifu@amd.com>
@mergify mergify bot added the rocm Related to AMD ROCm label Dec 2, 2025
Contributor

@gemini-code-assist gemini-code-assist bot left a comment


Code Review

This pull request updates the attention backend for ROCm in the test utilities, changing it from FLASH_ATTN to ROCM_AITER_FA. This aligns with using the Aiter Flash Attention backend on ROCm platforms. My review includes a suggestion to update a related log message for consistency and to prevent confusion during debugging.
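To make the described change concrete, here is a hedged sketch (the function and parameter names are illustrative, not the actual vLLM test-utility code) of the backend selection this PR changes:

```python
# Hypothetical sketch of the test-utility change described above:
# choosing the attention backend name per platform.
def default_test_attn_backend(is_rocm: bool) -> str:
    # Before this PR the ROCm path also used "FLASH_ATTN";
    # the PR switches ROCm to the AITER flash-attention backend.
    return "ROCM_AITER_FA" if is_rocm else "FLASH_ATTN"
```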

charlifu and others added 2 commits December 2, 2025 13:51
Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>
Signed-off-by: Charlie Fu <Charlie.Fu@amd.com>
Signed-off-by: charlifu <charlifu@amd.com>
@divakar-amd
Contributor

Hi @charlifu, can you also update it here:

with monkeypatch.context() as m:
    if "Llama-4-Scout" in model_setup[1] and attn_backend == "FLASH_ATTN":
        # Scout requires default backend selection
        # because vision encoder has head_dim 88 being incompatible
        # with FLASH_ATTN and needs to fall back to Flex Attn
        pass
    else:
        m.setenv("VLLM_MLA_DISABLE", "1")
        m.setenv("VLLM_ATTENTION_BACKEND", attn_backend)
    if attn_backend == "TRITON_ATTN" and not current_platform.is_rocm():
        pytest.skip(
            "TRITON_ATTN does not support "
            "multi-token eagle spec decode on current platform"
        )
    if attn_backend == "FLASH_ATTN" and current_platform.is_rocm():
        if "deepseek" in model_setup[1].lower():
            pytest.skip("FLASH_ATTN for deepseek not supported on ROCm platform")
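As a hedged illustration of what the snippet above does (the helper names below are hypothetical, and a plain dict stands in for pytest's monkeypatch), the env-var setup can be factored like this:

```python
# Hedged sketch, not the actual vLLM test code: mirror how the
# monkeypatch block maps a requested backend to environment variables.
def resolve_attn_backend(requested: str, is_rocm: bool) -> str:
    """Map the requested backend to what the platform actually runs."""
    if is_rocm and requested == "FLASH_ATTN":
        return "ROCM_AITER_FA"  # the backend this PR switches ROCm to
    return requested

def apply_backend_env(requested: str, is_rocm: bool, env: dict) -> None:
    """Populate env vars the way the monkeypatch block does."""
    env["VLLM_MLA_DISABLE"] = "1"
    env["VLLM_ATTENTION_BACKEND"] = resolve_attn_backend(requested, is_rocm)
```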

@DarkLight1337 DarkLight1337 requested a review from tjtanaa December 3, 2025 02:46
@mergify

mergify bot commented Dec 3, 2025

This pull request has merge conflicts that must be resolved before it can be
merged. Please rebase the PR, @charlifu.

https://docs.github.com/en/pull-requests/collaborating-with-pull-requests/working-with-forks/syncing-a-fork

@mergify mergify bot added the needs-rebase label Dec 3, 2025
@charlifu charlifu force-pushed the amd_ci/fix_test_max_len branch from 87c1f4d to d38904b Compare December 3, 2025 17:24
Signed-off-by: charlifu <charlifu@amd.com>
@mergify mergify bot removed the needs-rebase label Dec 3, 2025
Signed-off-by: charlifu <charlifu@amd.com>
@tjtanaa
Collaborator

tjtanaa commented Dec 4, 2025

@charlifu after you're done fixing it, can you share the test status of the files you have fixed?

@tjtanaa
Collaborator

tjtanaa commented Dec 4, 2025

On MI300X, I am getting OOM issues on two of the tests. May I know which machine you are validating on? MI325?

Collaborator

@robertgshaw2-redhat robertgshaw2-redhat left a comment


removing approval since @tjtanaa is on point

Signed-off-by: charlifu <charlifu@amd.com>
@charlifu
Contributor Author

charlifu commented Dec 4, 2025

@charlifu after you're done fixing it, can you share the test status of the files you have fixed?

pytest -sv tests/v1/spec_decode/test_max_len.py

9 passed, 4 warnings in 202.83s (0:03:22)

pytest -sv tests/v1/spec_decode/test_eagle.py

51 passed, 4 warnings in 37.66s

@tjtanaa
Collaborator

tjtanaa commented Dec 5, 2025

@charlifu how about the status of tests/v1/e2e/test_spec_decode.py ?

@charlifu
Contributor Author

charlifu commented Dec 5, 2025

@charlifu how about the status of tests/v1/e2e/test_spec_decode.py ?

Still failing, but that's not what this PR is aiming for. Can we get this merged first?

@micah-wil
Contributor

@charlifu how about the status of tests/v1/e2e/test_spec_decode.py ?

It fails for an unrelated reason that is addressed in #29926.

Signed-off-by: charlifu <charlifu@amd.com>
@charlifu
Contributor Author

charlifu commented Dec 6, 2025

Another fix here:

pytest -sv tests/basic_correctness/test_basic_correctness.py

=============== 34 passed, 6 warnings in 480.39s (0:08:00) ==========

Copy link
Collaborator

@tjtanaa tjtanaa left a comment


LGTM. Let's not add any more tests here. Open another PR if there are new tests.

@tjtanaa tjtanaa added the ready ONLY add when PR is ready to merge/full CI is needed label Dec 6, 2025
@tjtanaa tjtanaa enabled auto-merge (squash) December 6, 2025 14:13
@tjtanaa
Collaborator

tjtanaa commented Dec 7, 2025

@micah-wil I saw you merged the bugfix PR into this PR by syncing from main. Are the test cases in tests/v1/e2e/test_spec_decode.py all green? I think this test is yet to be enabled in AMD CI. Can you share the status of tests/v1/e2e/test_spec_decode.py?

@tjtanaa tjtanaa removed the ready ONLY add when PR is ready to merge/full CI is needed label Dec 7, 2025