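When training a large matrix M with W x H parameters is expensive, factor it into the product of two smaller matrices instead. For example, let matrix A have size (W x 3) and matrix B have size (3 x H), so that A * B = M gives back a matrix of W x H dimensions. Since W * 3 + 3 * H < W * H once W and H are large enough (for W = H, this holds when W > 6), fewer parameters are required. The trade-off is that A * B can only represent matrices of rank at most 3, so the product is a low-rank approximation rather than an arbitrary W x H matrix.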
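As a rough illustration of the parameter savings, here is a minimal sketch in plain Python. The sizes W = 512 and H = 64, the random initialization, and the helper names `param_counts` and `matmul` are assumptions made up for this example; they do not come from the video or from any library.

```python
import random


def param_counts(w, h, r=3):
    """Return (full, factored) parameter counts for a w x h matrix and rank r."""
    return w * h, w * r + r * h


def matmul(a, b):
    """Multiply two matrices given as lists of rows."""
    cols_b = list(zip(*b))
    return [[sum(x * y for x, y in zip(row, col)) for col in cols_b] for row in a]


# Hypothetical sizes, chosen only for illustration.
W, H, R = 512, 64, 3

random.seed(0)
A = [[random.gauss(0, 1) for _ in range(R)] for _ in range(W)]  # W x 3
B = [[random.gauss(0, 1) for _ in range(H)] for _ in range(R)]  # 3 x H
M = matmul(A, B)                                                # W x H

full, factored = param_counts(W, H, R)
print(len(M), len(M[0]))  # 512 64
print(full, factored)     # 32768 1728
```

With these sizes the two small matrices hold 1728 values instead of 32768. For very small matrices the inequality flips: `param_counts(4, 4)` returns `(16, 24)`, so the factorization only pays off when W and H are much larger than the inner dimension.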
This technique is mentioned in both of the following videos:
https://www.coursera.org/learn/generative-ai-with-llms/lecture/NZOVw/peft-techniques-1-lora