Sudhir Nakka

LLM Finetuning: LoRA Learns Less and Forgets Less

December 30, 2024

Problem Statement

While general-purpose LLMs are becoming widely popular for day-to-day tasks, deploying them in real-world, task-oriented scenarios is difficult because they must be adapted to meet specific demands. Target-trained LLMs lower computational cost and memory requirements and enable edge deployments where large compute nodes are unavailable, while achieving comparable performance. However, training and fine-tuning LLMs on large datasets is expensive, time-consuming, and hard to iterate on because of the computational and time constraints involved.

Efficient fine-tuning is therefore a key area of interest for AI researchers and practitioners, and LoRA is one such technique for adapting large language models at low cost.

What is LoRA?

LoRA (Low-Rank Adaptation) is a fine-tuning technique that updates only low-rank components of a pre-trained model instead of all of its parameters. This dramatically reduces the number of trainable parameters and lowers computational cost, allowing efficient adaptation of very large language models. Additionally, LoRA stores these low-rank updates as separate adapter modules instead of overwriting the base model weights.
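
To make the mechanism concrete, here is a minimal sketch of a LoRA-style linear layer in PyTorch. It is an illustrative implementation, not the paper's reference code; the class name, rank r, and scaling alpha are assumptions. The pretrained weight stays frozen, and only the two small factors A and B (a rank-r update BA) are trained.

```python
import torch
import torch.nn as nn

class LoRALinear(nn.Module):
    """Frozen linear layer plus a trainable low-rank update: y = W x + (alpha / r) * B A x."""

    def __init__(self, base: nn.Linear, r: int = 8, alpha: int = 16):
        super().__init__()
        self.base = base
        # Freeze the pretrained weights; they are never overwritten.
        for p in self.base.parameters():
            p.requires_grad_(False)
        # Low-rank factors: A maps the input down to rank r, B maps it back up.
        self.lora_A = nn.Parameter(torch.randn(r, base.in_features) * 0.01)
        self.lora_B = nn.Parameter(torch.zeros(base.out_features, r))  # zero init: no change at the start
        self.scaling = alpha / r

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # Frozen path plus the scaled low-rank correction.
        return self.base(x) + (x @ self.lora_A.T @ self.lora_B.T) * self.scaling

# Only lora_A and lora_B receive gradients; the base layer stays intact.
layer = LoRALinear(nn.Linear(768, 768))
print(sum(p.numel() for p in layer.parameters() if p.requires_grad))  # 2 * 8 * 768 = 12,288
```

Because the update lives in lora_A and lora_B rather than being written back into the base weight, the adapter can be saved, shipped, and swapped independently of the base checkpoint.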

Advantages:

- Far fewer trainable parameters, which lowers compute and memory requirements.
- The base model's weights stay frozen, so its original capabilities are not overwritten.
- Updates are stored as small, modular adapters that can be shared and swapped per task.
- Fine-tuning becomes feasible on modest hardware, enabling faster, more iterative experimentation.

Key Findings from the Paper

1. Learning Efficiency and Knowledge Acquisition

The study shows that LoRA tends to learn less new knowledge compared to full finetuning, especially for tasks requiring significant deviation from the original training data. For example, in programming tasks where models must grasp new coding patterns or APIs, full finetuning outperforms LoRA in absorbing this novel information.

2. Retention and Forgetting

On the flip side, LoRA demonstrates better retention of previously learned knowledge. While full finetuning can cause the model to “forget” some of its original capabilities (a phenomenon known as catastrophic forgetting), LoRA’s updates are isolated in adapter modules, which helps preserve the base model’s skill set. This effect is particularly evident in coding tasks, where retaining foundational programming knowledge is crucial.

3. Task-Dependent Trade-Offs

The gap between LoRA and full finetuning narrows for instruction tuning and mathematical reasoning tasks—domains closer to the model’s existing pretraining distribution. This suggests that LoRA’s learning limitations are more pronounced when the finetuning task diverges from known data.

4. Practical Efficiency

Because of its parameter efficiency and modularity, LoRA is popular in real-world scenarios constrained by memory or compute. The paper confirms that although LoRA learns less new information, its lower rate of forgetting makes it a valuable method, especially when combined with other finetuning strategies.
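
As a rough illustration of that workflow, the sketch below attaches LoRA adapters to a causal language model with the Hugging Face PEFT library. The model name, target modules, and hyperparameters are placeholder assumptions and depend on the architecture being adapted.

```python
from transformers import AutoModelForCausalLM
from peft import LoraConfig, get_peft_model

# Any causal LM checkpoint can stand in here; this name is a placeholder.
base_model = AutoModelForCausalLM.from_pretrained("meta-llama/Llama-2-7b-hf")

lora_config = LoraConfig(
    r=16,                                 # rank of the low-rank update matrices
    lora_alpha=32,                        # scaling factor applied to the update
    target_modules=["q_proj", "v_proj"],  # which projections receive adapters (architecture-dependent)
    lora_dropout=0.05,
    task_type="CAUSAL_LM",
)

model = get_peft_model(base_model, lora_config)
model.print_trainable_parameters()  # typically well under 1% of all parameters are trainable
```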

Implications for LLM Finetuning

This research underscores a fundamental trade-off: full finetuning acquires more new knowledge, especially on tasks far from the pretraining distribution, but erodes more of the model’s original capabilities, whereas LoRA learns less of that new material but preserves more of what the base model already knows.

The authors suggest hybrid approaches that combine full finetuning for core knowledge updates with LoRA-based adapters for specialization and task-specific tweaks. Such combinations could maximize benefits while mitigating respective downsides.
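
One way such a hybrid setup can look in practice is sketched below using PEFT: a base model that has already been fully finetuned on core domain data is extended with swappable, task-specific LoRA adapters. The checkpoint paths and adapter names are hypothetical.

```python
from transformers import AutoModelForCausalLM
from peft import PeftModel

# 1) Start from a checkpoint that was fully finetuned on core domain knowledge (hypothetical path).
base = AutoModelForCausalLM.from_pretrained("./core-finetuned-base")

# 2) Layer task-specific LoRA adapters on top without modifying the base weights.
model = PeftModel.from_pretrained(base, "./adapters/code-assist", adapter_name="code_assist")
model.load_adapter("./adapters/sql-generation", adapter_name="sql")

# 3) Switch specializations at inference time by activating the relevant adapter.
model.set_adapter("sql")
```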

Conclusion

“LoRA Learns Less and Forgets Less” provides a rigorous empirical foundation for understanding the capabilities and limitations of LoRA finetuning. For AI researchers and practitioners, balancing these trade-offs is key to deploying scalable, adaptable, and memory-efficient LLMs.

As LLMs continue evolving, insights from this paper will help shape finetuning strategies that optimize both knowledge retention and acquisition according to task demands and computational budgets.

References

1. Biderman et al., “LoRA Learns Less and Forgets Less,” 2024. Empirical study on finetuning strategies for LLMs.

2. Sebastian Raschka, “Noteworthy LLM Research Papers of 2024,” May 2025. Summary and analysis of the LoRA paper and other key LLM research.