Depth & width pruning with instruct models. #33
peremartra started this conversation in Show and tell
I've added two new notebooks to the pruning section that show how pruning affects Instruct models, i.e., models prepared to work as chat assistants: follow instructions and maintain a conversation with the user.
## Depth pruning
Starting from a Llama-3.2-1B model with 16 transformer blocks, 2 are removed, which reduces the model's size by 14.76%.
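A minimal sketch of this kind of block removal with transformers is shown below. The checkpoint name, the layer indices, and the selection criterion are assumptions for illustration, not necessarily what the notebook does:

```python
import torch
from transformers import AutoModelForCausalLM

model_id = "meta-llama/Llama-3.2-1B-Instruct"  # assumed checkpoint
model = AutoModelForCausalLM.from_pretrained(model_id, torch_dtype=torch.bfloat16)

# Remove 2 of the 16 transformer blocks. Which blocks to drop is a choice;
# dropping late (but not final) blocks is one common heuristic.
layers_to_remove = {13, 14}  # hypothetical indices
model.model.layers = torch.nn.ModuleList(
    layer for i, layer in enumerate(model.model.layers) if i not in layers_to_remove
)
model.config.num_hidden_layers = len(model.model.layers)

# Recent transformers versions index the KV cache by layer_idx, so reindex
# the remaining attention modules to keep generation consistent.
for i, layer in enumerate(model.model.layers):
    layer.self_attn.layer_idx = i
```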
The model keeps its ability to function as an instruct model, but suffers a significant drop in its ability to generate coherent text, as shown by the loss on the Lambada benchmark and as can be observed in the empirical tests of generated text.
However, it maintains good scores on other benchmarks such as BoolQ and arc_easy, which indicates that its reasoning capabilities remain within an acceptable range.
## Width pruning
In this case, 20% of the neurons in the intermediate MLP layers are removed. The number of transformer blocks is kept, but the model becomes narrower, losing 13.03% of its size. As in the previous example, the model retains its ability to function as an Instruct model.
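The sketch below shows one way to slice the intermediate dimension of a Llama-style gated MLP (gate_proj, up_proj, down_proj). The importance criterion here, the L1 norm of each gate_proj row, is just one simple option and is an assumption, not necessarily the notebook's method:

```python
import torch

def prune_mlp_neurons(mlp, keep_ratio=0.8):
    # One score per intermediate neuron: L1 norm of its gate_proj row.
    importance = mlp.gate_proj.weight.abs().sum(dim=1)
    n_keep = int(importance.numel() * keep_ratio)
    keep = torch.topk(importance, n_keep).indices.sort().values

    def slice_linear(linear, dim):
        w = linear.weight.data
        new_w = w[keep] if dim == 0 else w[:, keep]
        out_f, in_f = new_w.shape
        new_linear = torch.nn.Linear(in_f, out_f, bias=linear.bias is not None,
                                     dtype=w.dtype, device=w.device)
        new_linear.weight.data.copy_(new_w)
        if linear.bias is not None:
            new_linear.bias.data.copy_(linear.bias.data[keep] if dim == 0
                                       else linear.bias.data)
        return new_linear

    mlp.gate_proj = slice_linear(mlp.gate_proj, dim=0)  # prune output rows
    mlp.up_proj = slice_linear(mlp.up_proj, dim=0)      # prune output rows
    mlp.down_proj = slice_linear(mlp.down_proj, dim=1)  # prune input columns

for layer in model.model.layers:
    prune_mlp_neurons(layer.mlp, keep_ratio=0.8)  # keep 80% of the neurons
model.config.intermediate_size = model.model.layers[0].mlp.gate_proj.out_features
```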
The biggest difference is that the model retains its ability to predict the next token: the drop on the Lambada and OpenAI Lambada benchmarks is much smaller, around 15%.
## Summary
Both methods have been shown to work correctly with Instruct models. In both cases, especially with depth pruning, I would recommend a knowledge-recovery process, either via LoRA with a generalist dataset or via knowledge distillation (KD) from the base model.
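A minimal sketch of the LoRA option with peft; the rank, alpha, and target modules are placeholder values, not a recommendation:

```python
from peft import LoraConfig, get_peft_model

# Placeholder hyperparameters; tune them for the actual recovery run.
lora_config = LoraConfig(
    r=16,
    lora_alpha=32,
    target_modules=["q_proj", "v_proj"],  # a common minimal choice for Llama
    task_type="CAUSAL_LM",
)
model = get_peft_model(model, lora_config)  # `model` is the pruned model
model.print_trainable_parameters()
# Then fine-tune on a generalist instruction dataset (e.g., with trl's SFTTrainer).
```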
P.S.: The benchmark data is in the notebooks.