Abstract
Deploying large language models (LLMs) is challenging because of their intensive computational and memory requirements. Our research examines vocabulary trimming (VT), inspired by restricting embedding entries to the language of interest, as a way to improve time and memory efficiency. While such techniques have proven effective in tasks like machine translation, tailoring them to LLMs demands specific modifications given the diverse nature of LLM applications. We apply two language heuristics to trim the full vocabulary (Unicode-based script filtering and corpus-based selection) to different LLM families and sizes. The methods are straightforward, interpretable, and easy to implement. We find that VT reduces the memory usage of small models by nearly 50% and improves generation speed by at most 25%. Yet we also reveal the limitations of these methods: they do not perform consistently well across languages, and their returns diminish for larger models.
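As an illustration only, the two heuristics named in the abstract could be sketched roughly as follows. This is not the authors' implementation: the function names, the use of Unicode character names as a stand-in for proper script detection, and the `min_count` threshold are all assumptions made for this example.

```python
import unicodedata
from collections import Counter


def is_allowed(token, scripts=("LATIN",)):
    """Return True if every alphabetic character in `token` has a Unicode
    name starting with one of `scripts` (an approximation of script
    membership, assumed here for simplicity)."""
    for ch in token:
        if not ch.isalpha():
            # Digits, punctuation and word-boundary markers are treated
            # as script-neutral and never cause a rejection.
            continue
        if not unicodedata.name(ch, "").startswith(scripts):
            return False
    return True


def trim_by_script(vocab, scripts=("LATIN",)):
    """Unicode-based filtering: keep tokens written in the allowed scripts."""
    return [tok for tok in vocab if is_allowed(tok, scripts)]


def trim_by_corpus(vocab, tokenized_corpus, min_count=1):
    """Corpus-based selection: keep tokens seen at least `min_count` times
    in a tokenized corpus of the target language."""
    counts = Counter(tok for sentence in tokenized_corpus for tok in sentence)
    return [tok for tok in vocab if counts[tok] >= min_count]


# Hypothetical toy vocabulary and corpus, for demonstration only.
vocab = ["hello", "▁the", "привет", "中文", "123", "café"]
kept_by_script = trim_by_script(vocab)
kept_by_corpus = trim_by_corpus(vocab, [["hello", "123"], ["hello"]], min_count=2)
```

In a real model, the kept tokens would also have to be mapped back to their rows in the embedding and output matrices so that those matrices can be shrunk accordingly; that step is omitted here.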
| Original language | English |
|---|---|
| Title of host publication | Proceedings of the Fifth Workshop on Insights from Negative Results in NLP |
| Editors | Shabnam Tafreshi, Arjun Akula, João Sedoc, Aleksandr Drozd, Anna Rogers, Anna Rumshisky |
| Place of Publication | Mexico City, Mexico |
| Publisher | Association for Computational Linguistics |
| Pages | 148-153 |
| Number of pages | 6 |
| ISBN (Electronic) | 9798891761025 |
| Publication status | Published - 01 Jun 2024 |
| Externally published | Yes |
Title: The ups and downs of large language model inference with vocabulary trimming by language heuristics
Prizes
- Best Paper Award at the Fifth Workshop on Insights from Negative Results in NLP
Bogoychev, N. (Recipient), Chen, P. (Recipient), Haddow, B. (Recipient) & Birch, A. (Recipient), 2024
Prize: Prize (including medals and awards)