
The ups and downs of large language model inference with vocabulary trimming by language heuristics

  • Nikolay Bogoychev
  • Pinzhen Chen
  • Barry Haddow
  • Alexandra Birch

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution

Abstract

Deploying large language models (LLMs) encounters challenges due to intensive computational and memory requirements. Our research examines vocabulary trimming (VT) inspired by restricting embedding entries to the language of interest to bolster time and memory efficiency. While such modifications have been proven effective in tasks like machine translation, tailoring them to LLMs demands specific modifications given the diverse nature of LLM applications. We apply two language heuristics to trim the full vocabulary (Unicode-based script filtering and corpus-based selection) to different LLM families and sizes. The methods are straightforward, interpretable, and easy to implement. It is found that VT reduces the memory usage of small models by nearly 50% and has an upper bound of 25% improvement in generation speed. Yet, we reveal the limitations of these methods in that they do not perform consistently well for each language with diminishing returns in larger models.
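The two heuristics named in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the function names (`ids_by_script`, `ids_by_corpus`, `trim_embeddings`), the toy vocabulary, and the rule that non-letter characters count as script-neutral are all assumptions made for illustration. A real tokenizer would also need its byte-level or word-boundary markers handled.

```python
import unicodedata
from collections import Counter

def is_allowed_char(ch, scripts):
    # Assumption: digits, punctuation and symbols are treated as script-neutral.
    if not ch.isalpha():
        return True
    # Unicode character names start with the script name, e.g. "LATIN SMALL LETTER A".
    return unicodedata.name(ch, "").startswith(scripts)

def ids_by_script(vocab, scripts=("LATIN",)):
    # Unicode-based script filtering: keep tokens made only of allowed characters.
    return sorted(i for tok, i in vocab.items()
                  if all(is_allowed_char(c, scripts) for c in tok))

def ids_by_corpus(tokenized_corpus, min_count=1, always_keep=()):
    # Corpus-based selection: keep token ids seen at least min_count times
    # in an already-tokenized corpus, plus ids that must never be dropped.
    counts = Counter(i for sent in tokenized_corpus for i in sent)
    kept = {i for i, n in counts.items() if n >= min_count}
    return sorted(kept | set(always_keep))

def trim_embeddings(embeddings, kept_ids):
    # Select the surviving embedding rows and build the old-id -> new-id map.
    old_to_new = {old: new for new, old in enumerate(kept_ids)}
    return [embeddings[old] for old in kept_ids], old_to_new

# Hypothetical toy vocabulary (token -> id) and one-dimensional embeddings.
vocab = {"<s>": 0, "hello": 1, "мир": 2, "世界": 3, "42": 4, "world": 5}
embeddings = [[float(i)] for i in range(len(vocab))]

latin_ids = ids_by_script(vocab)
trimmed, old_to_new = trim_embeddings(embeddings, latin_ids)
```

In this sketch the trimmed embedding table has one row per kept token, so any memory saving scales with the share of the vocabulary that is removed. Token ids produced by the original tokenizer would have to be passed through `old_to_new` before lookup.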
Original language: English
Title of host publication: Proceedings of the Fifth Workshop on Insights from Negative Results in NLP
Editors: Shabnam Tafreshi, Arjun Akula, João Sedoc, Aleksandr Drozd, Anna Rogers, Anna Rumshisky
Place of Publication: Mexico City, Mexico
Publisher: Association for Computational Linguistics
Pages: 148-153
Number of pages: 6
ISBN (Electronic): 9798891761025
DOIs
Publication status: Published - 01 Jun 2024
Externally published: Yes
