Can you list some popular HuggingFace datasets that we can use to set guidellm --data? #505

@llc-kc

What is the URL, file, or UI containing proposed doc change
https://github.com/vllm-project/guidellm/blob/main/README.md

What is the current content or situation in question
GuideLLM supports HuggingFace datasets, local files, and synthetic data. This example loads the CNN DailyMail dataset from HuggingFace and maps the article column to prompts while using the summary token count column to determine output lengths.

guidellm benchmark \
  --target http://localhost:8000 \
  --data "hf:cnn_dailymail" \
  --data-args '{"prompt_column":"article","output_tokens_count_column":"summary_tokens"}'
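The quoting of the JSON passed to `--data-args` is easy to get wrong when swapping in another dataset, so here is a minimal sketch that assembles the same command programmatically. It uses only the flags and values shown in the example above; the helper name `build_guidellm_command` is hypothetical and not part of GuideLLM, and the sketch does not check that the dataset actually has the named columns.

```python
import json
import shlex


def build_guidellm_command(target, dataset, column_map):
    """Hypothetical helper: assemble the benchmark command as one shell-safe string."""
    # Serialize the column mapping compactly, matching the README example.
    data_args = json.dumps(column_map, separators=(",", ":"))
    parts = [
        "guidellm", "benchmark",
        "--target", target,
        "--data", dataset,
        "--data-args", data_args,
    ]
    # shlex.quote adds quoting only where the shell needs it (here, the JSON).
    return " ".join(shlex.quote(part) for part in parts)


command = build_guidellm_command(
    target="http://localhost:8000",
    dataset="hf:cnn_dailymail",
    column_map={
        "prompt_column": "article",
        "output_tokens_count_column": "summary_tokens",
    },
)
print(command)
```

To try a different dataset, only `dataset` and the column names in `column_map` would need to change, assuming the dataset exposes a text column for prompts.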

What is the proposed change
Can you list some popular datasets that we can use, for example ones suited to calibrating MoE expert load balancing? Alternatively, could you provide a guide for judging which types of datasets can be used and which cannot?

Labels: documentation (Improvements or additions to documentation)
