TY - JOUR
T1 - MMFed: a multimodal federated learning framework for heterogeneous devices
AU - Wang, Gang
AU - Zhang, Yanfeng
AU - Ying, Chenhao
AU - Zhang, Qinnan
AU - Xiong, Zehui
AU - Wang, Jiakang
AU - Yu, Ge
PY - 2025/7/2
Y1 - 2025/7/2
N2 - Existing federated learning frameworks are primarily designed for single-modality data, whereas real-world scenarios require processing multimodal data on heterogeneous devices. This gap makes it challenging to process multimodal data on heterogeneous devices and significantly impairs model training efficiency. To address these issues, we propose MMFed, a multimodal federated learning framework that integrates a multimodal algorithm with a semi-synchronous training method. The multimodal algorithm trains local autoencoders on different data modalities. By leveraging the similarity of encodings across modalities that share the same data labels, we further train and aggregate these local autoencoders into a global autoencoder, which is then deployed on the blockchain to perform downstream classification tasks. In the semi-synchronous training method, each device updates its parameters independently during a round, and a global aggregation at the end of each round combines the updates from all devices. We conduct an empirical evaluation of our framework on several multimodal datasets: the Opportunity (Opp) Challenge, mHealth, and UR Fall Detection datasets. Experimental results demonstrate that our framework outperforms state-of-the-art multimodal frameworks on all three datasets, achieving an average accuracy improvement of 9.07%. Furthermore, in terms of training speed, MMFed clearly outperforms synchronous strategies when scaled to a large number of clients.
AB - Existing federated learning frameworks are primarily designed for single-modality data, whereas real-world scenarios require processing multimodal data on heterogeneous devices. This gap makes it challenging to process multimodal data on heterogeneous devices and significantly impairs model training efficiency. To address these issues, we propose MMFed, a multimodal federated learning framework that integrates a multimodal algorithm with a semi-synchronous training method. The multimodal algorithm trains local autoencoders on different data modalities. By leveraging the similarity of encodings across modalities that share the same data labels, we further train and aggregate these local autoencoders into a global autoencoder, which is then deployed on the blockchain to perform downstream classification tasks. In the semi-synchronous training method, each device updates its parameters independently during a round, and a global aggregation at the end of each round combines the updates from all devices. We conduct an empirical evaluation of our framework on several multimodal datasets: the Opportunity (Opp) Challenge, mHealth, and UR Fall Detection datasets. Experimental results demonstrate that our framework outperforms state-of-the-art multimodal frameworks on all three datasets, achieving an average accuracy improvement of 9.07%. Furthermore, in terms of training speed, MMFed clearly outperforms synchronous strategies when scaled to a large number of clients.
U2 - 10.1109/JIOT.2025.3579858
DO - 10.1109/JIOT.2025.3579858
M3 - Article
SN - 2327-4662
JO - IEEE Internet of Things Journal
JF - IEEE Internet of Things Journal
ER -