Depth & width pruning with instruct models. #33
peremartra started this conversation in Show and tell
I've added two new notebooks to the pruning section that show how pruning affects Instruct models, i.e., models prepared to work as chat assistants: follow instructions and maintain a conversation with the user.
## Depth pruning
Starting from a Llama-3.2-1B model with 16 transformer blocks, 2 are removed, which reduces the model's size by 14.76%.
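A minimal sketch of this kind of block removal with transformers is shown below. The checkpoint name, the layer indices, and the selection criterion are assumptions for illustration, not necessarily what the notebook does:

```python
import torch
from transformers import AutoModelForCausalLM

model_id = "meta-llama/Llama-3.2-1B-Instruct"  # assumed checkpoint
model = AutoModelForCausalLM.from_pretrained(model_id, torch_dtype=torch.bfloat16)

# Remove 2 of the 16 transformer blocks. Which blocks to drop is a choice;
# dropping late (but not final) blocks is one common heuristic.
layers_to_remove = {13, 14}  # hypothetical indices
model.model.layers = torch.nn.ModuleList(
    layer for i, layer in enumerate(model.model.layers) if i not in layers_to_remove
)
model.config.num_hidden_layers = len(model.model.layers)

# Recent transformers versions index the KV cache by layer_idx, so reindex
# the remaining attention modules to keep generation consistent.
for i, layer in enumerate(model.model.layers):
    layer.self_attn.layer_idx = i
```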
The model keeps its ability to function as an instruct model, but suffers a significant drop in its ability to generate coherent text, as shown by the loss on the Lambada benchmark and as can be observed in the empirical tests of generated text.
However, it maintains good scores on other benchmarks such as BoolQ and arc_easy, which indicates that its reasoning capabilities remain within an acceptable range.
## Width pruning
In this case, 20% of the neurons in the intermediate MLP layers are removed. The number of transformer blocks is kept, but the model becomes narrower, losing 13.03% of its size. As in the previous example, the model retains its ability to function as an Instruct model.
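The sketch below shows one way to slice the intermediate dimension of a Llama-style gated MLP (gate_proj, up_proj, down_proj). The importance criterion here, the L1 norm of each gate_proj row, is just one simple option and is an assumption, not necessarily the notebook's method:

```python
import torch

def prune_mlp_neurons(mlp, keep_ratio=0.8):
    # One score per intermediate neuron: L1 norm of its gate_proj row.
    importance = mlp.gate_proj.weight.abs().sum(dim=1)
    n_keep = int(importance.numel() * keep_ratio)
    keep = torch.topk(importance, n_keep).indices.sort().values

    def slice_linear(linear, dim):
        w = linear.weight.data
        new_w = w[keep] if dim == 0 else w[:, keep]
        out_f, in_f = new_w.shape
        new_linear = torch.nn.Linear(in_f, out_f, bias=linear.bias is not None,
                                     dtype=w.dtype, device=w.device)
        new_linear.weight.data.copy_(new_w)
        if linear.bias is not None:
            new_linear.bias.data.copy_(linear.bias.data[keep] if dim == 0
                                       else linear.bias.data)
        return new_linear

    mlp.gate_proj = slice_linear(mlp.gate_proj, dim=0)  # prune output rows
    mlp.up_proj = slice_linear(mlp.up_proj, dim=0)      # prune output rows
    mlp.down_proj = slice_linear(mlp.down_proj, dim=1)  # prune input columns

for layer in model.model.layers:
    prune_mlp_neurons(layer.mlp, keep_ratio=0.8)  # keep 80% of the neurons
model.config.intermediate_size = model.model.layers[0].mlp.gate_proj.out_features
```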
The biggest difference is that the model retains its ability to predict the next token: the drop on the Lambada and OpenAI Lambada benchmarks is much smaller, around 15%.
## Summary
Both methods have been shown to work correctly with Instruct models. In both cases, especially with depth pruning, I would recommend a knowledge-recovery process, either via LoRA with a generalist dataset or via knowledge distillation (KD) from the base model.
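A minimal sketch of the LoRA option with peft; the rank, alpha, and target modules are placeholder values, not a recommendation:

```python
from peft import LoraConfig, get_peft_model

# Placeholder hyperparameters; tune them for the actual recovery run.
lora_config = LoraConfig(
    r=16,
    lora_alpha=32,
    target_modules=["q_proj", "v_proj"],  # a common minimal choice for Llama
    task_type="CAUSAL_LM",
)
model = get_peft_model(model, lora_config)  # `model` is the pruned model
model.print_trainable_parameters()
# Then fine-tune on a generalist instruction dataset (e.g., with trl's SFTTrainer).
```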
P.S.: The benchmark data is in the notebooks.