MMFed: a multimodal federated learning framework for heterogeneous devices

Gang Wang, Yanfeng Zhang, Chenhao Ying, Qinnan Zhang, Zehui Xiong, Jiakang Wang, Ge Yu

Research output: Contribution to journal › Article › peer-review

Abstract

Existing federated learning frameworks are primarily designed for single-modal data, yet real-world scenarios require processing multimodal data on heterogeneous devices. This gap makes multimodal training on heterogeneous devices difficult and significantly reduces training efficiency. To address these issues, we propose MMFed, a multimodal federated learning framework that integrates a multimodal algorithm with a semi-synchronous training method. The multimodal algorithm trains local autoencoders on different data modalities. By leveraging the similarity of encodings across modalities that share the same data labels, we further train and aggregate these local autoencoders into a global autoencoder, which is then deployed on the blockchain to perform downstream classification tasks. In the semi-synchronous training method, each device updates its parameters independently during a round; at the end of each round, a global aggregation combines the updates from all devices. We conduct an empirical evaluation of our framework on several multimodal datasets, including the Opportunity (Opp) Challenge, mHealth, and UR Fall Detection datasets. Experimental results demonstrate that MMFed outperforms state-of-the-art multimodal frameworks on all three datasets, achieving an average accuracy improvement of 9.07%. Furthermore, in terms of training speed, MMFed is markedly faster than synchronous strategies when scaled to a large number of clients.
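The two mechanisms the abstract describes — a cross-modal alignment signal between encodings of same-label samples, and an end-of-round weighted aggregation of independently updated client parameters — can be illustrated with a minimal sketch. This is not the paper's implementation; the linear encoder, the pairwise MSE alignment loss, and the FedAvg-style weighted average are all simplifying assumptions chosen for illustration.

```python
import numpy as np

def encode(x, W):
    """Toy linear encoder standing in for a per-modality autoencoder's encoder."""
    return np.tanh(x @ W)

def alignment_loss(z_a, z_b, labels_a, labels_b):
    """Mean squared distance between encodings of same-label samples from two
    modalities — a stand-in for the cross-modal similarity signal the abstract
    describes. Averages over all same-label pairs."""
    loss, pairs = 0.0, 0
    for i, la in enumerate(labels_a):
        for j, lb in enumerate(labels_b):
            if la == lb:
                loss += float(np.mean((z_a[i] - z_b[j]) ** 2))
                pairs += 1
    return loss / max(pairs, 1)

def aggregate(client_weights, client_sizes):
    """End-of-round global aggregation: a data-size-weighted average of client
    parameter matrices (FedAvg-style; the paper's exact rule may differ).
    In a semi-synchronous round, each client may have taken a different
    number of local steps before this point."""
    total = sum(client_sizes)
    return sum(w * (n / total) for w, n in zip(client_weights, client_sizes))

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    # Two clients train independently during the round, then aggregate.
    w1 = rng.normal(size=(4, 2))          # fast device: more local updates
    w2 = rng.normal(size=(4, 2))          # slow device: fewer local updates
    global_w = aggregate([w1, w2], [300, 100])
    # Identical encodings of same-label samples incur zero alignment loss.
    z = encode(rng.normal(size=(1, 4)), global_w)
    print(alignment_loss(z, z, [0], [0]))  # 0.0
```

The size-weighted average lets stragglers contribute whatever progress they made in the round instead of forcing the fastest devices to wait, which is the scaling advantage over fully synchronous strategies mentioned in the abstract.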
Original language: English
Journal: IEEE Internet of Things Journal
Early online date: 02 Jul 2025
Publication status: Early online date - 02 Jul 2025
Externally published: Yes

