Title: 2405.17484v3.pdf

URL Source: https://arxiv.org/pdf/2405.17484

Published Time: Mon, 18 Nov 2024 01:23:30 GMT

Number of Pages: 34

Markdown Content:

## Bridging The Gap between Low-rank and Orthogonal Adaptation via Householder Reflection Adaptation

Shen Yuan¹, Haotian Liu¹∗, Hongteng Xu¹,²†

¹ Gaoling School of Artificial Intelligence, Renmin University of China

² Beijing Key Laboratory of Big Data Management and Analysis Methods

{shenyuan721, haotianliu, hongtengxu}@ruc.edu.cn

Abstract

While following different technical routes, both low-rank and orthogonal adaptation techniques can efficiently adapt large-scale pre-trained models to specific tasks or domains using a small number of trainable parameters. In this study, we bridge the gap between these two techniques, proposing a simple but effective adaptation method based on Householder reflections. Given a pre-trained model, our method fine-tunes its layers by multiplying each frozen weight matrix with an orthogonal matrix constructed by a chain of learnable Householder reflections (HRs). This HR-based orthogonal fine-tuning is equivalent to an adaptive low-rank adaptation. Moreover, we show that the orthogonality of the reflection planes corresponding to the HRs impacts the model capacity and regularity. This analysis motivates us to regularize the orthogonality of the HRs, leading to different implementations of the proposed Householder reflection adaptation (HRA) method. Compared with state-of-the-art methods, HRA achieves superior performance with fewer learnable parameters when adapting large language models and conditional image generators. The code of the experiments is available at https://github.com/DaShenZi721/HRA, and the method has been merged into the PEFT package.
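
The two claims in the abstract (the chain of reflections is orthogonal, and the resulting update is low-rank) can be checked numerically on a toy example. The following is a minimal sketch, not the authors' implementation: it uses only the Python standard library, the names `d_out`, `d`, `r`, `U`, and `Q` are hypothetical, the sizes are arbitrary, and it assumes the orthogonal matrix is applied by right-multiplication, with each reflection written as $H = I - 2uu^\top / (u^\top u)$.

```python
import random


def matmul(a, b):
    """Multiply two matrices stored as lists of rows."""
    cols = list(zip(*b))
    return [[sum(x * y for x, y in zip(row, col)) for col in cols] for row in a]


def identity(n):
    return [[1.0 if i == j else 0.0 for j in range(n)] for i in range(n)]


def householder(u):
    """Reflection H = I - 2 u u^T / (u^T u) for a nonzero vector u."""
    norm_sq = sum(x * x for x in u)
    n = len(u)
    return [[(1.0 if i == j else 0.0) - 2.0 * u[i] * u[j] / norm_sq
             for j in range(n)] for i in range(n)]


def numerical_rank(a, tol=1e-9):
    """Rank via Gaussian elimination with partial pivoting."""
    m = [row[:] for row in a]
    rows, cols = len(m), len(m[0])
    rank = 0
    for c in range(cols):
        if rank == rows:
            break
        pivot = max(range(rank, rows), key=lambda i: abs(m[i][c]))
        if abs(m[pivot][c]) < tol:
            continue  # no usable pivot in this column
        m[rank], m[pivot] = m[pivot], m[rank]
        for i in range(rank + 1, rows):
            factor = m[i][c] / m[rank][c]
            for j in range(c, cols):
                m[i][j] -= factor * m[rank][j]
        rank += 1
    return rank


random.seed(0)
d_out, d, r = 5, 6, 2  # toy sizes (assumption): r reflections on a d_out x d weight

# Frozen "pre-trained" weight and r reflection vectors (random stand-ins
# for the learnable parameters).
W = [[random.gauss(0.0, 1.0) for _ in range(d)] for _ in range(d_out)]
U = [[random.gauss(0.0, 1.0) for _ in range(d)] for _ in range(r)]

# Chain the reflections: Q = H_1 H_2 ... H_r.
Q = identity(d)
for u in U:
    Q = matmul(Q, householder(u))

# Adapted weight, assuming right-multiplication by the orthogonal matrix.
W_adapted = matmul(W, Q)

# Orthogonality check: largest entry of |Q^T Q - I|.
QtQ = matmul([list(col) for col in zip(*Q)], Q)
orth_err = max(abs(QtQ[i][j] - (1.0 if i == j else 0.0))
               for i in range(d) for j in range(d))

# Low-rank check: the update W_adapted - W has rank at most r.
delta = [[wa - w for wa, w in zip(row_a, row_w)]
         for row_a, row_w in zip(W_adapted, W)]
delta_rank = numerical_rank(delta)

print("orthogonality error:", orth_err)
print("rank of W_adapted - W:", delta_rank, "with r =", r)
```

The rank bound follows because each reflection differs from the identity by a rank-one term, so a product of `r` reflections differs from the identity by a matrix of rank at most `r`. Forming each `d × d` reflection explicitly is done here only for readability.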

1 Introduction

In the recent competition among large foundation models, “Scaling Laws” [22, 18, 39] motivate researchers to increase model size continuously, which brings significantly improved model capabilities in understanding, generation, reasoning, and generalization, but at increasingly unbearable model adaptation costs. For instance, the GPU memory for fine-tuning a LLaMA-65B model with 16-bit precision exceeds 780GB [9]. The adaptation of image generative models (as done by ControlNet [60]) may suffer from the same issue when applying large vision foundation models as backbones. Consequently, fine-tuning large foundation models efficiently to adapt them to various downstream tasks has become a challenge in practice.

To overcome the above challenge, Parameter-Efficient Fine-Tuning (PEFT) methods [53] provide promising solutions, which aim to reduce the trainable parameters and memory consumption of fine-tuning while maintaining or even improving model adaptation performance. Among various PEFT methods, adapter-based fine-tuning [20, 37] attracts much attention because it inserts only a limited number of trainable parameters into existing models during fine-tuning, without adding extra complexity or overhead in the inference phase.

Currently, given a parameter matrix of a pre-trained model, i.e., $W \in \mathbb{R}^{d_{out} \times d}$, there are roughly two strategies for implementing adapter-based fine-tuning. The mainstream strategy is applying Low-Rank Adaptation (LoRA) [20] and its variants [9, 21, 25, 31, 33, 35, 48, 54, 60,

∗ A part of this work was done when Haotian Liu was affiliated with the Beijing Institute of Technology.

† Corresponding author

38th Conference on Neural Information Processing Systems (NeurIPS 2024).

arXiv:2405.17484v3 [cs.LG] 15 Nov 2024

[Figure: diagram with labels $u_1$, $u_2$, …, $u_r$, the symbol $\times$, and $I$.]