Open
Labels: RFC (Request For Comments)
Description
Motivation.
To achieve better performance when using aclgraph, we introduce the npugraph_ex backend, a simple full-graph aclgraph acceleration solution with well-defined boundaries and no accuracy concerns. The backend:
- is based on torch.compile
- optimizes and enhances the FX graph using torchair
Proposed Change.
- Add a configuration item in additional_config, defaulting to False, and add validation so that this configuration takes effect only when fullgraph or full_decode_only is enabled.
- Write a new adaptor class that decides, based on this switch, whether FX graph optimization is applied.
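The two changes above can be sketched roughly as follows. This is only an illustration of the intended control flow, not the actual implementation: the names `GraphExConfig`, `enable_graph_ex`, `graph_mode`, and `GraphAdaptor` are hypothetical, and the graph optimization step is represented by a plain callable rather than a real torchair pass.

```python
from dataclasses import dataclass
from typing import Any, Callable

# Mode names taken from the wording of the proposal; the real values may differ.
FULL_GRAPH_MODES = {"fullgraph", "full_decode_only"}


@dataclass
class GraphExConfig:
    """Hypothetical stand-in for the new entry in additional_config."""

    enable_graph_ex: bool = False  # the proposal defaults the switch to False
    graph_mode: str = "piecewise"  # placeholder for the currently active graph mode

    def effective(self) -> bool:
        # The switch only takes effect in fullgraph or full_decode_only mode.
        return self.enable_graph_ex and self.graph_mode in FULL_GRAPH_MODES


class GraphAdaptor:
    """Hypothetical adaptor that applies the extra graph pass only when enabled."""

    def __init__(self, config: GraphExConfig, optimize: Callable[[Any], Any]):
        self.config = config
        self.optimize = optimize  # stand-in for the FX graph optimization

    def compile(self, graph: Any) -> Any:
        if self.config.effective():
            return self.optimize(graph)
        # Switch off, or not in a full-graph mode: leave the graph untouched.
        return graph
```

Keeping the validation in one place (`effective()` in this sketch) means the adaptor never has to know which graph modes are eligible.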
Feedback Period.
No response
CC List.
No response
Any Other Things.
- The purpose of adding this backend is to achieve optimal performance, so it is recommended to use it in fullgraph mode.
Future Plan
We plan and hope that in the future this backend can become the default optimization for aclgraph.
The following optimizations are also planned:
- Memory reuse optimization
- Kernel performance optimization
- Optimized handling of redundant kernels
The following features are also planned, but they require users to modify their scripts:
- Multi-stream support
- Support for the Dynamo cache in torch.compile
- Weight prefetching support
- Proprietary weight format optimization: pre-convert the cube operator's weights to NZ format