The advent of LLMs such as GPT-3, ChatGPT, and GPT-4 has brought about a paradigm shift in AI. However, these models are not open-source, and users can access them only through user interfaces or APIs. This lack of access to model weights and source code has impeded innovation and slowed progress on the state of the art.
The recent release of several open-access LLMs, such as LLaMA, StableLM, and BLOOM, has accelerated research, but the enormous size of these models makes adapting them to specific downstream tasks through training on task-specific datasets prohibitively expensive, which limits their use in production.
To address this limitation, adapter-based parameter-efficient fine-tuning (PEFT) methods such as LoRA, BitFit, S-Adapter, and P-Adapter have emerged as a promising direction. PEFT fine-tunes only a small number of external parameters instead of the entire LLM, while achieving comparable or even better performance. In addition, knowledge distillation from an ensemble of teacher LLMs into smaller student LLMs can further boost performance, making LLMs practical especially in resource-constrained and high-throughput scenarios.
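To make the adapter idea concrete, the sketch below shows a minimal LoRA-style linear layer in NumPy. It is an illustration under common LoRA conventions (rank `r`, scaling `alpha / r`, zero-initialized `B` so the adapter starts as a no-op), not the implementation of any particular library: the pretrained weight `W` stays frozen, and only the low-rank factors `A` and `B` would be trained.

```python
import numpy as np

class LoRALinear:
    """Minimal LoRA-style linear layer (illustrative sketch).

    Computes y = x W^T + (alpha / r) * x (B A)^T. The pretrained weight W
    is frozen; only the low-rank factors A (r x d_in) and B (d_out x r)
    are trainable, i.e. r * (d_in + d_out) parameters instead of
    d_in * d_out for full fine-tuning.
    """

    def __init__(self, W, r=4, alpha=8, seed=0):
        rng = np.random.default_rng(seed)
        self.W = W                          # frozen pretrained weight, shape (d_out, d_in)
        d_out, d_in = W.shape
        self.A = rng.normal(0.0, 0.01, size=(r, d_in))  # trainable, small random init
        self.B = np.zeros((d_out, r))       # trainable, zero init -> adapter starts as identity
        self.scale = alpha / r

    def __call__(self, x):
        # x: (batch, d_in); the low-rank path adds a learned correction to the frozen path
        return x @ self.W.T + (x @ self.A.T) @ self.B.T * self.scale

    def trainable_params(self):
        return self.A.size + self.B.size
```

For a 1024-by-1024 layer, full fine-tuning would update about one million weights, whereas this adapter trains only `r * (d_in + d_out) = 4 * 2048 = 8192` parameters, under one percent, which is why PEFT makes adapting large models affordable.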