[RFC]: npugraph_ex backend #4715

@ChenCangtao

Motivation.

To get better performance when using aclgraph, we are introducing the npugraph_ex backend: a simple fullgraph aclgraph acceleration solution with clearly defined boundaries and no accuracy concerns.

  • Based on torch.compile
  • Optimizes and enhances the FX graph using torchair

Proposed Change.

  1. Add a configuration item to additional_config, defaulting to False, with validation so that it only takes effect when fullgraph or full_decode_only is enabled.
  2. Write a new adaptor class that decides, based on this switch, whether FX graph optimization is necessary.
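
As a rough illustration of the two steps above, here is a minimal sketch in plain Python. Everything named here is an assumption for illustration only: the key `enable_npugraph_ex`, the mode strings, and the classes `NpugraphExConfig` and `GraphAdaptor` are hypothetical and not taken from the actual code base. It also assumes that "only effective" means an unsupported mode silently falls back to disabled with a warning; raising an error would be an equally valid reading.

```python
import logging
from dataclasses import dataclass
from typing import Any, Callable

logger = logging.getLogger(__name__)

# Assumption: mode names mirror the wording of this RFC, not real enum values.
SUPPORTED_GRAPH_MODES = frozenset({"fullgraph", "full_decode_only"})


@dataclass(frozen=True)
class NpugraphExConfig:
    """Hypothetical holder for the resolved switch."""
    enabled: bool = False


def resolve_npugraph_ex(additional_config: dict, graph_mode: str) -> NpugraphExConfig:
    """Read the (hypothetical) switch and validate it against the graph mode."""
    # The switch defaults to False when the key is absent.
    requested = bool(additional_config.get("enable_npugraph_ex", False))
    if requested and graph_mode not in SUPPORTED_GRAPH_MODES:
        # Assumed behaviour: ignore the switch instead of failing hard.
        logger.warning(
            "enable_npugraph_ex is ignored because graph mode %r is not one of %s",
            graph_mode,
            sorted(SUPPORTED_GRAPH_MODES),
        )
        return NpugraphExConfig(enabled=False)
    return NpugraphExConfig(enabled=requested)


class GraphAdaptor:
    """Hypothetical adaptor: runs the FX graph optimization only if the switch is on."""

    def __init__(self, config: NpugraphExConfig, optimize_fn: Callable[[Any], Any]):
        self.config = config
        # Stand-in for the torchair-based optimization pass.
        self.optimize_fn = optimize_fn

    def process(self, graph: Any) -> Any:
        if self.config.enabled:
            return self.optimize_fn(graph)
        # Switch off: hand the graph back untouched.
        return graph
```

Keeping the validation in one function means the adaptor only ever sees an already-resolved flag, so the graph-mode check does not have to be repeated at each call site.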

Feedback Period.

No response

CC List.

No response

Any Other Things.

  1. The purpose of adding this backend is to achieve optimal performance, so it is recommended to use it in fullgraph mode.

Future Plan

We hope that this backend can eventually become the default optimization for aclgraph.
The following optimizations are also planned:

  1. Memory reuse optimization
  2. Kernel performance optimization
  3. Redundant kernel optimization

These features are also planned, but they require users to modify their scripts:

  4. Support for multi-stream
  5. Support for the Dynamo cache in torch.compile
  6. Support for weight prefetching
  7. Weight proprietary format optimization: pre-convert the cube operator's weights to NZ format
