model customization v3 release PR #5358

rsareddy0329 · 2025-12-03T18:39:25Z

Issue #, if available:

Description of changes:

* Fine-tuning SDK: SFT, RLVR, and RLAIF techniques with a standardized parameter design
* AIRegistry integration: added CRUD operations for datasets and evaluators
* Enhanced training experience: implemented MLflow metrics tracking and deployment workflows

By submitting this pull request, I confirm that you can use, modify, copy, and redistribute this contribution, under the terms of your choice.


* sagemaker-core update, port sft-trainer and ai registry (#1895)
  * sagemaker core update with MC service.json, SFT notebook flow succeeded before modelpackage
  * remove workaround, modelpackage issue persists
  * remove ai registry port
* feat: AIRegistry SDK Implementation (#1893)
* remove sft trainer port, fix ModelPackage issue (#1901)
* Add/port finetuning interfaces to v3 (#1906)
  * Add/port finetuning interfaces to v3
  * Add/port finetuning interfaces to v3
  Co-authored-by: Roja Reddy Sareddy <rsareddy@amazon.com>
* ported evaluation from v2->v3. pending example notebooks results (#1902)
  * v2->v3 porting for eval
  * bring back deleted needed files
  * unit tests
  Co-authored-by: Mohamed Zeidan <zeidmo@amazon.com>
* Mc deployment (#1909)
  * Updates for Model Customization
  * Model Builder Base change
  * Fix
  * Support for Model Customization
  * New Model Builder changes with Model Customization
  * Fix
* Add base_trainer, addition to PR#1906 (#1910)
  * Add/port finetuning interfaces to v3
  * Add/port finetuning interfaces to v3
  * Add base_trainer, addition to PR#1906
  Co-authored-by: Roja Reddy Sareddy <rsareddy@amazon.com>
* AIRegistry Integration Tests (#1908)
* feat: Trainer wait with MLFlow metrics (#1907)
* feat: update eval integs (#1912)
  Co-authored-by: Mufaddal Rohawala <mufi@amazon.com>
* Update get_all method for Evaluator and Dataset (#1913)
  * Update get_all method for Evaluator and Dataset
    - Update Dataset.get_all() and added refresh()
    - Update Evaluator.get_all() and added refresh()
    - Added more tags in condition when importing hub content
  * Update dataset, evaluators and examples
  * remove hub_content_name
* port bedrock model builder and add integ test (#1911)
  * port bedrock model builder and add integ test
  * Add unit test and notebook, port new version of bedrock model builder
  * add docstring
* Mc deployment integ tests and example notebook (#1914)
  * Updates for Model Customization
  * Model Builder Base change
  * Fix
  * Support for Model Customization
  * New Model Builder changes with Model Customization
  * Fix
  * Deployment Integ tests
  * Tests
  * Deployment Notebook
  * fix
* Update model hub name and encoding method (#1918)
  * Update get_all method for Evaluator and Dataset
    - Update Dataset.get_all() and added refresh()
    - Update Evaluator.get_all() and added refresh()
    - Added more tags in condition when importing hub content
  * Update dataset, evaluators and examples
  * remove hub_content_name
  * Update hub name and hash
    - Update model hub name to be AIRegistry-{accountId}-{region}
    - Update hub name hash
* Add telemetry for model customization (#1920)
  * add telemetry to model customization
  * add telemetry to model customization 2
  * small fix
* Integ tests for nova bedrock (#1926)
  * Bedrock Nova
  * Unit tests for bedrock nova
* accept eula always (#1928)
  Co-authored-by: Mohamed Zeidan <zeidmo@amazon.com>
* Add studio tags to trainer (#1927)
  * Update get_all method for Evaluator and Dataset
    - Update Dataset.get_all() and added refresh()
    - Update Evaluator.get_all() and added refresh()
    - Added more tags in condition when importing hub content
  * Update dataset, evaluators and examples
  * remove hub_content_name
  * Update hub name and hash
    - Update model hub name to be AIRegistry-{accountId}-{region}
    - Update hub name hash
  * Add studio tags to trainer
    Tested by checking training job creation inputs and unit tests
  Co-authored-by: rsareddy0329 <rsareddy0329@gmail.com>
* Finetuning classes: add accept_eula support & Add integ tests for sft, rlvr, rlaif, dpo trainers (#1915)
  * Add/port finetuning interfaces to v3
  * Add/port finetuning interfaces to v3
  * Add base_trainer, addition to PR#1906
  * Add integ tests for sft, rlvr, rlaif, dpo trainers
  * Add integ tests for sft, rlvr, rlaif, dpo trainers
  * Finetuning classes: Add accept_eula support
  * Finetuning classes: Add accept_eula support
  Co-authored-by: Roja Reddy Sareddy <rsareddy@amazon.com>
* feat: pipeline name fix + mlflow arn fix + jinja dep (#1931)
  * changes for pipeline naming
  * feat: pipeline name fix + mlflow arn fix + jinja dep
  Co-authored-by: Mufaddal Rohawala <mufi@amazon.com>
* AIRegistry: Add Support to user provided session and role (#1932)
  * Add/port finetuning interfaces to v3
  * Add/port finetuning interfaces to v3
  * Add base_trainer, addition to PR#1906
  * Add integ tests for sft, rlvr, rlaif, dpo trainers
  * Add integ tests for sft, rlvr, rlaif, dpo trainers
  * Finetuning classes: Add accept_eula support
  * Finetuning classes: Add accept_eula support
  * AIRegistry: Support user provided session and role
  * AIRegistry: Support user provided session and role
  * AIRegistry: Support user provided session and role
  Co-authored-by: Roja Reddy Sareddy <rsareddy@amazon.com>
* format name for ai reg, stripped name of =, added domain-id tags for … (#1930)
  * format name for ai reg, stripped name of =, added domain-id tags for datasets and evaluators, prefix changed to use full path including filename
  * pulling metadata path since get_current_domain_id didn't work
  * removed gamma endpoint
  Co-authored-by: Mohamed Zeidan <zeidmo@amazon.com>
* fixed train unit tests (#1933)
  Co-authored-by: Mohamed Zeidan <zeidmo@amazon.com>
* Updating DataSet versioning pattern for parity with UI (#1934)
* fix dataset arn configuration in eval (#1935)
  Co-authored-by: Mufaddal Rohawala <mufi@amazon.com>
* Master model customization v3 (#1936)
  * Updating DataSet versioning pattern for parity with UI
  * Updating default S3 bucket and prefix for datasets
* fix: custom metrics and pipeline name in lineage (#1937)
  * remove mlflow config for base model eval only
  * fix unit
  * fix custom metrics and pipeline name in lineage
  Testing: 1. Added/Ran units. 2. Tested LLMAJ eval.
  Co-authored-by: Mufaddal Rohawala <mufi@amazon.com>
* Integ tests for ModelBuilder (#1939)
  * Bedrock Nova
  * Unit tests for bedrock nova
  * Integ test updates
  * Test case updates
* fixing AIRegistry integration and unit tests (#1941)
* Update error messages based on feedback (#1938)
* feat: Inline MLFlow metrics with Serverless MLFlow App (#1942)
* Master model customization v3 (#1945)
  * Bedrock Nova
  * Unit tests for bedrock nova
  * Integ test updates
  * Test case updates
  * Bedrock tests
* fix: update eval integ tests (#1946)
  * update eval integ tests
  * fix unit tests
  Co-authored-by: Mufaddal Rohawala <mufi@amazon.com>
* Finetuning classes: Add integ tests (#1947)
  * Add/port finetuning interfaces to v3
  * Add/port finetuning interfaces to v3
  * Add base_trainer, addition to PR#1906
  * Add integ tests for sft, rlvr, rlaif, dpo trainers
  * Add integ tests for sft, rlvr, rlaif, dpo trainers
  * Finetuning classes: Add accept_eula support
  * Finetuning classes: Add accept_eula support
  * AIRegistry: Support user provided session and role
  * AIRegistry: Support user provided session and role
  * AIRegistry: Support user provided session and role
  * Finetuning classes: Add integration tests
  * Merge conflicts
  * Merge conflicts
  Co-authored-by: Roja Reddy Sareddy <rsareddy@amazon.com>
* Workflow change to enable github actions (#1948)
  * workflow change and test
  * remove test
* Remove beta endpoint info, add region validation and tests (#1950)
  * Add/port finetuning interfaces to v3
  * Add/port finetuning interfaces to v3
  * Add base_trainer, addition to PR#1906
  * Add integ tests for sft, rlvr, rlaif, dpo trainers
  * Add integ tests for sft, rlvr, rlaif, dpo trainers
  * Finetuning classes: Add accept_eula support
  * Finetuning classes: Add accept_eula support
  * AIRegistry: Support user provided session and role
  * AIRegistry: Support user provided session and role
  * AIRegistry: Support user provided session and role
  * Finetuning classes: Add integration tests
  * Merge conflicts
  * Merge conflicts
  * Remove beta endpoint details in sagemaker-core
  * add region validation for models
  Co-authored-by: Roja Reddy Sareddy <rsareddy@amazon.com>

---------

Co-authored-by: Molly He <mollyhe@amazon.com>
Co-authored-by: jam-jee <jamjee@amazon.com>
Co-authored-by: rsareddy0329 <rsareddy0329@gmail.com>
Co-authored-by: Roja Reddy Sareddy <rsareddy@amazon.com>
Co-authored-by: Mohamed Zeidan <zeidmo@amazon.com>
Co-authored-by: Gokul Anantha Narayanan <166456257+nargokul@users.noreply.github.com>
Co-authored-by: Mufaddal Rohawala <89424143+mufaddal-rohawala@users.noreply.github.com>
Co-authored-by: Mufaddal Rohawala <mufi@amazon.com>
Co-authored-by: Zhaoqi <52220743+zhaoqizqwang@users.noreply.github.com>
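The squashed log records (#1918) that the model hub name was changed to the form AIRegistry-{accountId}-{region}; it does not describe the accompanying hub-name hash change. The sketch below shows how a name in that form could be built. The helper `ai_registry_hub_name` and its input checks (a 12-digit account ID and a loose region pattern) are illustrative assumptions, not the SDK's actual code or validation, and the hash step is deliberately not modeled.

```python
import re

# Illustrative checks only (assumptions): AWS account IDs are 12 digits, and the
# region pattern loosely matches names such as "us-east-1" or "us-gov-west-1".
_ACCOUNT_ID = re.compile(r"\d{12}")
_REGION = re.compile(r"[a-z]{2}(-[a-z]+)+-\d+")


def ai_registry_hub_name(account_id: str, region: str) -> str:
    """Hypothetical helper: format a hub name as AIRegistry-{accountId}-{region}.

    The hub-name hash mentioned alongside this change is not reproduced here.
    """
    if not _ACCOUNT_ID.fullmatch(account_id):
        raise ValueError(f"invalid AWS account ID: {account_id!r}")
    if not _REGION.fullmatch(region):
        raise ValueError(f"invalid AWS region: {region!r}")
    return f"AIRegistry-{account_id}-{region}"


# 123456789012 is a placeholder account ID.
print(ai_registry_hub_name("123456789012", "us-west-2"))
```

Validating the inputs before formatting keeps a malformed account ID or region from silently producing a hub name that no caller would ever look up.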

codecov · 2025-12-03T19:57:51Z

Codecov Report

❌ Patch coverage is 66.40316% with 170 lines in your changes missing coverage. Please review.
✅ Project coverage is 86.53%. Comparing base (70e4036) to head (e692060).
⚠️ Report is 113 commits behind head on master.

Files with missing lines	Patch %	Lines
..._environment/test_bootstrap_runtime_environment.py	0.00%	54 Missing ⚠️
...me_environment/test_runtime_environment_manager.py	0.00%	35 Missing ⚠️
...emaker-core/tests/unit/remote_function/test_job.py	0.00%	21 Missing ⚠️
...iner_drivers/distributed_drivers/test_mpi_utils.py	0.00%	19 Missing ⚠️
...sts/unit/remote_function/test_job_comprehensive.py	0.00%	14 Missing ⚠️
...ction/runtime_environment/test_mpi_utils_remote.py	0.00%	13 Missing ⚠️
sagemaker-core/tests/unit/local/test_image.py	68.57%	11 Missing ⚠️
...nit/model_monitor/test_clarify_model_monitoring.py	86.66%	2 Missing ⚠️
...aker-core/tests/unit/helper/test_session_helper.py	98.66%	1 Missing ⚠️

Additional details and impacted files

@@            Coverage Diff             @@
##           master    #5358      +/-   ##
==========================================
+ Coverage   86.08%   86.53%   +0.45%     
==========================================
  Files         454      271     -183     
  Lines       44262    42586    -1676     
==========================================
- Hits        38104    36853    -1251     
+ Misses       6158     5733     -425
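The project-level percentages in this diff follow directly from its Hits and Lines rows (coverage = hits / lines × 100). The sketch below recomputes them and checks that the per-file Missing column in the table sums to the 170 uncovered patch lines. It assumes Codecov's default display of two decimal places rounded down (the `coverage.precision` and `coverage.round` settings in codecov.yml). The helper `coverage_pct` is illustrative.

```python
import math

# Figures copied from the Codecov report: base (master) and head (#5358).
BASE = {"lines": 44262, "hits": 38104}
HEAD = {"lines": 42586, "hits": 36853}

# "Missing" column of the files table; should add up to the 170 uncovered patch lines.
MISSING_BY_FILE = [54, 35, 21, 19, 14, 13, 11, 2, 1]


def coverage_pct(hits: int, lines: int, precision: int = 2) -> float:
    """Return hits / lines as a percentage, rounded down to `precision` decimals."""
    factor = 10 ** precision
    return math.floor(hits / lines * 100 * factor) / factor


base_cov = coverage_pct(BASE["hits"], BASE["lines"])  # 86.08
head_cov = coverage_pct(HEAD["hits"], HEAD["lines"])  # 86.53
delta = round(head_cov - base_cov, 2)                 # 0.45
misses = (BASE["lines"] - BASE["hits"], HEAD["lines"] - HEAD["hits"])  # (6158, 5733)

print(base_cov, head_cov, delta, misses, sum(MISSING_BY_FILE))
```

Rounding to the nearest value instead would display 86.09% and 86.54%, so the round-down assumption is what reproduces the figures in the report.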



mohamedzeidan2021 and others added 4 commits December 2, 2025 17:45

Runtime env bug fix (#1943)

cc4d91b

* prevent cmd injection in RuntimeEnvironmentManager
* fixed unit tests

Co-authored-by: Mohamed Zeidan <zeidmo@amazon.com>
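This page records only that RuntimeEnvironmentManager was changed to prevent command injection. The fix itself is not shown. A common way to prevent command injection in Python is to pass subprocess arguments as a list, so that no shell parses them. Where a single shell string cannot be avoided, each untrusted value can be quoted with `shlex.quote` or `shlex.join`. The sketch below illustrates that general technique with a hypothetical pip-install helper. It is an assumption for illustration, not the PR's implementation.

```python
import shlex
from typing import List


def build_install_argv(requirements_path: str) -> List[str]:
    """Hypothetical helper: argv for installing a requirements file.

    Meant for subprocess.run(argv) with the default shell=False, so the path
    reaches pip as one literal argument and is never parsed by a shell.
    """
    return ["pip", "install", "-r", requirements_path]


def build_install_shell_command(requirements_path: str) -> str:
    """Hypothetical helper for when one shell string is unavoidable.

    shlex.join quotes every element, so shell metacharacters such as ';' or
    '$(...)' in the untrusted path stay inside a single quoted argument.
    """
    return shlex.join(build_install_argv(requirements_path))


if __name__ == "__main__":
    untrusted = "reqs.txt; rm -rf /"
    print(build_install_argv(untrusted))
    print(build_install_shell_command(untrusted))
    # A POSIX shell would split the quoted string back into the same argv.
    assert shlex.split(build_install_shell_command(untrusted)) == build_install_argv(untrusted)
```

The list form is generally preferable because it removes the shell from the picture entirely. The quoted-string form is a fallback for APIs that only accept a command line.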

Merge branch 'master' into keynote3-v3

f245f22

Merge branch 'master' into keynote3-v3

1af5881

mollyheamazon temporarily deployed to auto-approve on December 3, 2025 18:56 with GitHub Actions

Merge branch 'master' into keynote3-v3

e692060

rsareddy0329 temporarily deployed to auto-approve on December 3, 2025 19:15 with GitHub Actions

rsareddy0329 and others added 5 commits December 3, 2025 12:30

Update CHANGELOG.md (#1952)

119cb60

remove mlflow app from eval integs (#1951)

3014918

Co-authored-by: Mufaddal Rohawala <mufi@amazon.com>
Co-authored-by: rsareddy0329 <rsareddy0329@gmail.com>

address merge conflicts

ae0c138

Cleaning up notebooks and adding mlflow dependencies (#1954)

7b777bb

remove mlflow app from eval integs (#1956)

e95e888

Co-authored-by: Mufaddal Rohawala <mufi@amazon.com>
Co-authored-by: rsareddy0329 <rsareddy0329@gmail.com>

rsareddy0329 temporarily deployed to auto-approve on December 3, 2025 20:32 with GitHub Actions

Merge branch 'master' into keynote3-v3

3045c64

mollyheamazon temporarily deployed to auto-approve on December 3, 2025 20:41 with GitHub Actions

mollyheamazon temporarily deployed to auto-approve on December 3, 2025 20:49 with GitHub Actions

skip resource unit test (#5360)

82c83e1

* skip resource unit test
* update reason message

zhaoqizqwang temporarily deployed to auto-approve on December 3, 2025 20:50 with GitHub Actions

Updating Trainer wait loop refresh time (#5361)

4bbca9c

jam-jee temporarily deployed to auto-approve on December 3, 2025 20:57 with GitHub Actions

fix: skip SFT and RLVR nova tests other than IAD

2445ce2

rsareddy0329 temporarily deployed to auto-approve on December 3, 2025 21:39 with GitHub Actions

Fixing serve unit tests (#5362)

8a803f7

nargokul temporarily deployed to auto-approve on December 3, 2025 21:40 with GitHub Actions

Small bug fix for serve integration test (#5363)

e75da02

* Fixing serve unit tests
* Bug fix for inference

Co-authored-by: rsareddy0329 <rsareddy0329@gmail.com>

rsareddy0329 temporarily deployed to auto-approve on December 3, 2025 22:06 with GitHub Actions

mufaddal-rohawala approved these changes Dec 3, 2025


jam-jee approved these changes Dec 3, 2025


rsareddy0329 merged commit 0443307 into master Dec 3, 2025
5 of 19 checks passed

rsareddy0329 deleted the keynote3-v3 branch December 3, 2025 22:11

8 participants
