Mixture of experts-enabled parallel scheduling and processing for vehicular generative AI services

  • Gaochang Xie
  • Zehui Xiong
  • Renchao Xie*
  • Xiumei Deng
  • Song Guo
  • Mohsen Guizani
  • Zhu Han

*Corresponding author for this work

Research output: Contribution to journal › Article › peer-review

1 Citation (Scopus)

Abstract

Foundation models (FMs) have revolutionized the generative AI (GAI) lifecycle with their pre-trained intelligence capabilities. While the recent success of web-based models such as GPT-4 has spurred interest in extending FMs to edge scenarios like the Internet of Vehicles (IoV), challenges such as data privacy, network congestion, and limited edge resource awareness hinder the direct application of cloud-based FMs. To address these challenges, this paper reconfigures the mixture of experts (MoE) architecture to deploy Transformer-based FM experts across vehicles for distributed GAI inference. Specifically, we propose an MoE-empowered vehicular FM system with two key innovations: a physical gating network that dynamically adapts to wireless environments, and a vehicle-to-vehicle (V2V) communication-based expert parallelism mechanism that enhances efficiency and resource utilization. We then formulate the communication, computation, and memory models for analyzing inference latency. Furthermore, to optimize computational resource allocation and enhance inference performance, we establish a Stackelberg game and propose the Gradient Ascent and Evolutionary Optimization-based Competitive Pricing and Allocation (GECPA) algorithm, which balances resource allocation and usage costs by combining the rapid convergence of gradient ascent with the broader exploration capability of evolutionary optimization. Simulation results demonstrate the superior parallel processing efficiency and reduced latency of the proposed MoE-based FM inference scheme. Compared with the best-performing benchmark algorithm, GECPA improves the average utility of infrastructure vehicles by up to 2.01 times, reduces inference latency by 16.58%, and increases the successful execution rate of GAI tasks by 10.66%, thereby achieving a better balance between efficiency and incentive compatibility in dynamic IoV environments.
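The idea of a gating network that accounts for the wireless environment can be illustrated with a minimal sketch. The code below is a hypothetical top-k MoE router in which each expert's semantic score is blended with a channel-quality bonus for the V2V link to the vehicle hosting that expert; the function name, score blending, and log-channel bonus are illustrative assumptions, not the paper's actual gating design.

```python
import numpy as np

def physical_gating(token, expert_weights, channel_gains, k=2):
    """Illustrative "physical gating" sketch: top-k MoE routing where each
    expert's affinity score is boosted by the quality of the V2V link to
    the vehicle hosting it. (Hypothetical form, not the paper's model.)"""
    logits = expert_weights @ token               # semantic affinity per expert
    link_bonus = np.log(1.0 + channel_gains)      # crude channel-quality term
    scores = logits + link_bonus                  # blend semantics and channel
    top_k = np.argsort(scores)[-k:][::-1]         # indices of k best experts
    probs = np.exp(scores[top_k] - scores[top_k].max())
    probs /= probs.sum()                          # softmax over selected experts
    return top_k, probs

rng = np.random.default_rng(0)
experts, weights = physical_gating(
    rng.normal(size=8),                 # token embedding
    rng.normal(size=(4, 8)),            # gating weights for 4 experts
    rng.uniform(0.1, 2.0, size=4),      # per-expert V2V channel gains
)
```

In this sketch a strong channel can promote a slightly less relevant expert, which captures the intuition that routing in vehicular networks should weigh communication cost alongside expert fitness.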

Original language: English
Number of pages: 18
Journal: IEEE Transactions on Cognitive Communications and Networking
Early online date: 20 May 2025
DOIs
Publication status: Early online date - 20 May 2025
Externally published: Yes

Bibliographical note

Publisher Copyright:
© 2015 IEEE.

Keywords

  • expert parallelism
  • foundation models (FMs)
  • generative AI (GAI)
  • Internet of Vehicles (IoV)
  • mixture of experts (MoE)
  • resource allocation

ASJC Scopus subject areas

  • Hardware and Architecture
  • Computer Networks and Communications
  • Artificial Intelligence
