@rsareddy0329
Contributor

Issue #, if available:

Description of changes:

  • Fine-tuning SDK: SFT, RLVR, and RLAIF techniques with standardized parameter design
  • AIRegistry Integration: Added CRUD operations for datasets and evaluators
  • Enhanced Training Experience: Implemented MLFlow metrics tracking and deployment workflows

By submitting this pull request, I confirm that you can use, modify, copy, and redistribute this contribution, under the terms of your choice.

mohamedzeidan2021 and others added 4 commits December 2, 2025 17:45
* prevent cmd injection in RuntimeEnvironmentManager

* fixed unit tests

---------

Co-authored-by: Mohamed Zeidan <zeidmo@amazon.com>
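The commit above does not show how the injection was closed off in RuntimeEnvironmentManager. As a general illustration only, one common mitigation is to pass commands to `subprocess` as an argument list with no shell involved, so untrusted values are never parsed as shell syntax. The function names below (`build_pip_install_command`, `run_command`, `quote_for_shell`) are hypothetical and are not taken from the SDK.

```python
import shlex
import subprocess
import sys


def build_pip_install_command(requirements_path: str) -> list[str]:
    """Hypothetical helper: build an argv list instead of a shell string.

    The untrusted path stays a single list element, so it cannot add
    extra commands even if it contains ';' or '&&'.
    """
    return [sys.executable, "-m", "pip", "install", "-r", requirements_path]


def run_command(args: list[str]) -> subprocess.CompletedProcess:
    """Run an argv list without a shell (shell=False is the default)."""
    return subprocess.run(args, check=True, capture_output=True, text=True)


def quote_for_shell(value: str) -> str:
    """If a shell string is unavoidable, quote every untrusted value."""
    return shlex.quote(value)


if __name__ == "__main__":
    # The metacharacters are echoed back literally rather than executed.
    result = run_command(
        [sys.executable, "-c", "import sys; print(sys.argv[1])", "a; echo injected"]
    )
    print(result.stdout.strip())
```

Whether the actual fix uses an argument list, quoting, input validation, or a combination is not stated in this conversation.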
* sagemaker-core update, port sft-trainer and ai registry (#1895)

* sagemaker core update with MC service.json, SFT notebook flow succeeded before modelpackage

* remove workaround, modelpackage issue persists

* remove ai registry port

* feat: AIRegistry SDK Implementation (#1893)

* remove sft trainer port, fix ModelPackage issue (#1901)

* Add/port finetuning interfaces to v3 (#1906)

* Add/port finetuning interfaces to v3

* Add/port finetuning interfaces to v3

---------

Co-authored-by: Roja Reddy Sareddy <rsareddy@amazon.com>

* ported evaluation from v2->v3. pending example notebooks results (#1902)

* v2->v3 porting for eval

* bring back deleted needed files

* unit tests

---------

Co-authored-by: Mohamed Zeidan <zeidmo@amazon.com>

* Mc deployment (#1909)

* Updates for Model Customization

* Model Builder Base change

* Fix

* Support for Model Customization

* New Model Builder changes with Model Customization

* Fix

* Add base_trainer, addition to PR#1906 (#1910)

* Add/port finetuning interfaces to v3

* Add/port finetuning interfaces to v3

* Add base_trainer, addition to PR#1906

---------

Co-authored-by: Roja Reddy Sareddy <rsareddy@amazon.com>

* AIRegistry Integration Tests (#1908)

* feat: Trainer wait with MLFlow metrics (#1907)

* feat: update eval integs (#1912)

Co-authored-by: Mufaddal Rohawala <mufi@amazon.com>

* Update get_all method for Evaluator and Dataset (#1913)

* Update get_all method for Evaluator and Dataset

- Update Dataset.get_all() and added refresh()
- Update Evaluator.get_all() and added refresh()
- Added more tags in condition when importing hub content

* Update dataset, evaluators and examples

* remove hub_content_name

* port bedrock model builder and add integ test (#1911)

* port bedrock model builder and add integ test

* Add unit test and notebook, port new version of bedrock model builder

* add docstring

* Mc deployment integ tests and example notebook (#1914)

* Updates for Model Customization

* Model Builder Base change

* Fix

* Support for Model Customization

* New Model Builder changes with Model Customization

* Fix

* Deployment Integ tests

* Tests

* Deployment Notebook

* fix

* Update model hub name and encoding method (#1918)

* Update get_all method for Evaluator and Dataset

- Update Dataset.get_all() and added refresh()
- Update Evaluator.get_all() and added refresh()
- Added more tags in condition when importing hub content

* Update dataset, evaluators and examples

* remove hub_content_name

* Update hub name and hash

- Update model hub name to be AIRegistry-{accountId}-{region}
- Update hub name hash
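For illustration, the naming pattern `AIRegistry-{accountId}-{region}` from the commit message above could be produced as follows. This is a minimal sketch: the function name `ai_registry_hub_name` and the input checks are assumptions, and the hub name hash mentioned in the same commit is not reproduced because its definition is not given here.

```python
def ai_registry_hub_name(account_id: str, region: str) -> str:
    """Hypothetical helper: format a hub name as AIRegistry-{accountId}-{region}."""
    # AWS account IDs are 12-digit strings; the check is an added safeguard,
    # not something the commit message describes.
    if not (account_id.isdigit() and len(account_id) == 12):
        raise ValueError(f"unexpected account id: {account_id!r}")
    if not region:
        raise ValueError("region must be non-empty")
    return f"AIRegistry-{account_id}-{region}"


if __name__ == "__main__":
    print(ai_registry_hub_name("123456789012", "us-west-2"))
```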

* Add telemetry for model customization (#1920)

* add telemetry to model customization

* add telemetry to model customization 2

* small fix

* Integ tests for nova bedrock (#1926)

* Bedrock Nova

* Unit tests for bedrock nova

* accept eula always (#1928)

Co-authored-by: Mohamed Zeidan <zeidmo@amazon.com>

* Add studio tags to trainer (#1927)

* Update get_all method for Evaluator and Dataset

- Update Dataset.get_all() and added refresh()
- Update Evaluator.get_all() and added refresh()
- Added more tags in condition when importing hub content

* Update dataset, evaluators and examples

* remove hub_content_name

* Update hub name and hash

- Update model hub name to be AIRegistry-{accountId}-{region}
- Update hub name hash

* Add studio tags to trainer

Tested by checking training job creation inputs and unit tests

---------

Co-authored-by: rsareddy0329 <rsareddy0329@gmail.com>

* Finetuning classes: add accept_eula support & Add integ tests for sft, rlvr, rlaif, dpo trainers (#1915)

* Add/port finetuning interfaces to v3

* Add/port finetuning interfaces to v3

* Add base_trainer, addition to PR#1906

* Add integ tests for sft, rlvr, rlaif, dpo trainers

* Add integ tests for sft, rlvr, rlaif, dpo trainers

* Finetuning classes: Add accept_eula support

* Finetuning classes: Add accept_eula support

---------

Co-authored-by: Roja Reddy Sareddy <rsareddy@amazon.com>

* feat: pipeline name fix + mlflow arn fix + jinja dep (#1931)

* changes for pipeline naming

* feat: pipeline name fix + mlflow arn fix + jinja dep

---------

Co-authored-by: Mufaddal Rohawala <mufi@amazon.com>

* AIRegistry: Add Support to user provided session and role (#1932)

* Add/port finetuning interfaces to v3

* Add/port finetuning interfaces to v3

* Add base_trainer, addition to PR#1906

* Add integ tests for sft, rlvr, rlaif, dpo trainers

* Add integ tests for sft, rlvr, rlaif, dpo trainers

* Finetuning classes: Add accept_eula support

* Finetuning classes: Add accept_eula support

* AIRegistry: Support user provided session and role

* AIRegistry: Support user provided session and role

* AIRegistry: Support user provided session and role

---------

Co-authored-by: Roja Reddy Sareddy <rsareddy@amazon.com>

* format name for ai reg, stripped name of =, added domain-id tags for … (#1930)

* format name for ai reg, stripped name of =, added domain-id tags for datasets and evaluators, prefix changed to use full path including filename

* pulling metadata path since get_current_domain_id didn't work

* rmvd gamma endpoint

---------

Co-authored-by: Mohamed Zeidan <zeidmo@amazon.com>

* fixed train unit tests (#1933)

Co-authored-by: Mohamed Zeidan <zeidmo@amazon.com>

* Updating DataSet versioning pattern for parity with UI (#1934)

* fix dataset arn configuration in eval (#1935)

Co-authored-by: Mufaddal Rohawala <mufi@amazon.com>

* Master model customization v3 (#1936)

* Updating DataSet versioning pattern for parity with UI

* Updating default S3 bucket and prefix for datasets

* fix: custom metrics and pipeline name in lineage (#1937)

* remove mlflow config for base model eval only

* fix unit

* fix custom metrics and pipeline name in lineage

---------

Testing:
1. Added/Ran units.
2. Tested LLMAJ eval.

Co-authored-by: Mufaddal Rohawala <mufi@amazon.com>

* Integ tests for ModelBuilder (#1939)

* Bedrock Nova

* Unit tests for bedrock nova

* Integ test updates

* Test case updates

* fixing AIRegistry integration and unit tests (#1941)

* Update error messages based on feedback (#1938)

* feat: Inline MLFlow metrics with Serverless MLFlow App (#1942)

* Master model customization v3 (#1945)

* Bedrock Nova

* Unit tests for bedrock nova

* Integ test updates

* Test case updates

* Bedrock tests

* fix: update eval integ tests (#1946)

* update eval integ tests

* fix unit tests

---------

Co-authored-by: Mufaddal Rohawala <mufi@amazon.com>

* Finetuning classes: Add integ tests (#1947)

* Add/port finetuning interfaces to v3

* Add/port finetuning interfaces to v3

* Add base_trainer, addition to PR#1906

* Add integ tests for sft, rlvr, rlaif, dpo trainers

* Add integ tests for sft, rlvr, rlaif, dpo trainers

* Finetuning classes: Add accept_eula support

* Finetuning classes: Add accept_eula support

* AIRegistry: Support user provided session and role

* AIRegistry: Support user provided session and role

* AIRegistry: Support user provided session and role

* Finetuning classes: Add integration tests

* Merge conflicts

* Merge conflicts

---------

Co-authored-by: Roja Reddy Sareddy <rsareddy@amazon.com>

* Workflow change to enable github actions (#1948)

* workflow change and test

* remove test

* Remove beta endpoint info, add region validation and tests (#1950)

* Add/port finetuning interfaces to v3

* Add/port finetuning interfaces to v3

* Add base_trainer, addition to PR#1906

* Add integ tests for sft, rlvr, rlaif, dpo trainers

* Add integ tests for sft, rlvr, rlaif, dpo trainers

* Finetuning classes: Add accept_eula support

* Finetuning classes: Add accept_eula support

* AIRegistry: Support user provided session and role

* AIRegistry: Support user provided session and role

* AIRegistry: Support user provided session and role

* Finetuning classes: Add integration tests

* Merge conflicts

* Merge conflicts

* Remove beta endpoint details in sagemaker-core

* add region validation for models

---------

Co-authored-by: Roja Reddy Sareddy <rsareddy@amazon.com>

---------

Co-authored-by: Molly He <mollyhe@amazon.com>
Co-authored-by: jam-jee <jamjee@amazon.com>
Co-authored-by: rsareddy0329 <rsareddy0329@gmail.com>
Co-authored-by: Roja Reddy Sareddy <rsareddy@amazon.com>
Co-authored-by: Mohamed Zeidan <zeidmo@amazon.com>
Co-authored-by: Gokul Anantha Narayanan <166456257+nargokul@users.noreply.github.com>
Co-authored-by: Mufaddal Rohawala <89424143+mufaddal-rohawala@users.noreply.github.com>
Co-authored-by: Mufaddal Rohawala <mufi@amazon.com>
Co-authored-by: Zhaoqi <52220743+zhaoqizqwang@users.noreply.github.com>
@codecov

codecov bot commented Dec 3, 2025

Codecov Report

❌ Patch coverage is 66.40316% with 170 lines in your changes missing coverage. Please review.
✅ Project coverage is 86.53%. Comparing base (70e4036) to head (e692060).
⚠️ Report is 113 commits behind head on master.

| Files with missing lines | Patch % | Lines |
|---|---|---|
| ..._environment/test_bootstrap_runtime_environment.py | 0.00% | 54 Missing ⚠️ |
| ...me_environment/test_runtime_environment_manager.py | 0.00% | 35 Missing ⚠️ |
| ...emaker-core/tests/unit/remote_function/test_job.py | 0.00% | 21 Missing ⚠️ |
| ...iner_drivers/distributed_drivers/test_mpi_utils.py | 0.00% | 19 Missing ⚠️ |
| ...sts/unit/remote_function/test_job_comprehensive.py | 0.00% | 14 Missing ⚠️ |
| ...ction/runtime_environment/test_mpi_utils_remote.py | 0.00% | 13 Missing ⚠️ |
| sagemaker-core/tests/unit/local/test_image.py | 68.57% | 11 Missing ⚠️ |
| ...nit/model_monitor/test_clarify_model_monitoring.py | 86.66% | 2 Missing ⚠️ |
| ...aker-core/tests/unit/helper/test_session_helper.py | 98.66% | 1 Missing ⚠️ |
Additional details and impacted files
@@            Coverage Diff             @@
##           master    #5358      +/-   ##
==========================================
+ Coverage   86.08%   86.53%   +0.45%     
==========================================
  Files         454      271     -183     
  Lines       44262    42586    -1676     
==========================================
- Hits        38104    36853    -1251     
+ Misses       6158     5733     -425     
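
The figures in the coverage diff can be cross-checked with a little arithmetic. The sketch below recomputes the base and head percentages from the Hits and Lines rows and sums the per-file missing lines from the report. The helper names are illustrative, and the use of truncation (rather than rounding) to two decimals is an assumption inferred from the fact that it reproduces the reported values.

```python
import math


def coverage_percent(hits: int, lines: int) -> float:
    """Coverage as a percentage of covered lines."""
    return 100.0 * hits / lines


def truncate(value: float, places: int = 2) -> float:
    """Drop digits beyond `places` decimals without rounding up."""
    factor = 10 ** places
    return math.floor(value * factor) / factor


# Numbers copied from the coverage diff (master vs. #5358).
base = truncate(coverage_percent(38104, 44262))  # 86.08
head = truncate(coverage_percent(36853, 42586))  # 86.53
delta = round(head - base, 2)                    # 0.45

# Hits plus misses should equal the total line count in each column.
assert 38104 + 6158 == 44262
assert 36853 + 5733 == 42586

# Per-file "Missing" counts from the report add up to the 170 quoted lines.
missing_per_file = [54, 35, 21, 19, 14, 13, 11, 2, 1]
total_missing = sum(missing_per_file)            # 170

if __name__ == "__main__":
    print(base, head, delta, total_missing)
```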

rsareddy0329 and others added 5 commits December 3, 2025 12:30
Co-authored-by: Mufaddal Rohawala <mufi@amazon.com>
Co-authored-by: rsareddy0329 <rsareddy0329@gmail.com>
Co-authored-by: Mufaddal Rohawala <mufi@amazon.com>
Co-authored-by: rsareddy0329 <rsareddy0329@gmail.com>
* skip resource unit test

* update reason message
* Fixing serve unit tests

* Bug fix for inference

---------

Co-authored-by: rsareddy0329 <rsareddy0329@gmail.com>
@rsareddy0329 rsareddy0329 merged commit 0443307 into master Dec 3, 2025
5 of 19 checks passed
@rsareddy0329 rsareddy0329 deleted the keynote3-v3 branch December 3, 2025 22:11

8 participants