Pivotal Token Search
A Survey of Direct Preference Optimization (DPO)
Official implementation (PyTorch) of "VidChain: Chain-of-Tasks with Metric-based Direct Preference Optimization for Dense Video Captioning", AAAI 2025
[ICML 2025] TGDPO: Harnessing Token-Level Reward Guidance for Enhancing Direct Preference Optimization
A Comprehensive Survey of Direct Preference Optimization: Datasets, Theories, Variants, and Applications
Notebooks to create an instruction-following version of Microsoft's Phi-2 LLM with Supervised Fine-Tuning and Direct Preference Optimization (DPO)
RankPO: Rank Preference Optimization
The Rap Music Generator is an LLM-based tool for generating rap lyrics. It offers multiple fine-tuning approaches for different rap generation techniques, giving users a versatile platform for producing unique and stylistically varied content.
Evaluate how LLaMA 3.1 8B handles paraphrased adversarial prompts targeting refusal behavior.
[CC 2025] [Official code] - Engaging preference optimization alignment in large language model for continual radiology report generation: A hybrid approach
Experiments and a how-to guide for the lecture "Large language models for Scientometrics"
Notebooks to create an instruction-following version of Microsoft's Phi-1.5 LLM with Supervised Fine-Tuning and Direct Preference Optimization (DPO)
EPFLLaMA: A lightweight language model fine-tuned on EPFL curriculum content. Specialized for STEM education and multiple-choice question answering. Implements advanced techniques like SFT, DPO, and quantization.
[ICML 2025 Workshop FM4BS] AnnoDPO: Protein Functional Annotation Learning with Direct Preference Optimization
Enhancing paraphrase-type generation using Direct Preference Optimization (DPO) and Reinforcement Learning from Human Feedback (RLHF), with large-scale HPC support. This project aligns model outputs to human-ranked data for robust, safety-focused NLP.
This repository explores two key approaches to fine-tuning large language models, Supervised Fine-Tuning (SFT) and Direct Preference Optimization (DPO), to align model behavior with human intent and task objectives (a minimal sketch of the DPO loss follows this list).
Fine-tune a model that generates exams based on lecture materials.
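Several entries above fine-tune models with the DPO objective. The following is a minimal sketch of that loss in plain PyTorch, independent of any listed repository: it assumes you have already computed per-sequence log-probabilities of the chosen and rejected responses under both the policy and a frozen reference model, and the function name dpo_loss and the default beta=0.1 are illustrative choices, not any repo's API.

import torch
import torch.nn.functional as F

def dpo_loss(policy_chosen_logps, policy_rejected_logps,
             ref_chosen_logps, ref_rejected_logps, beta=0.1):
    # Each argument: 1-D tensor of summed log-probabilities of the chosen /
    # rejected completion under the policy or the frozen reference model.
    # Implicit rewards are the scaled policy-to-reference log-ratios.
    chosen_rewards = beta * (policy_chosen_logps - ref_chosen_logps)
    rejected_rewards = beta * (policy_rejected_logps - ref_rejected_logps)
    # DPO objective: -log sigmoid of the chosen-minus-rejected reward margin.
    return -F.logsigmoid(chosen_rewards - rejected_rewards).mean()

# Toy usage with random log-probabilities for a batch of four preference pairs.
if __name__ == "__main__":
    batch = 4
    print(dpo_loss(torch.randn(batch), torch.randn(batch),
                   torch.randn(batch), torch.randn(batch)).item())

This is the same objective that libraries such as Hugging Face TRL optimize in their DPO trainers; the sketch shows only the loss, not data preparation or the training loop.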