MS-BioGraphs MS



This dataset contains an edge-weighted graph that represents the similarity between protein sequences of the Metaclust dataset. The LAST sequence alignment algorithm was used to match the sequences, and the graph is compressed in WebGraph format.

This dataset is the whole graph with 1.7 billion vertices and 2.5 trillion edges.
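
To give a sense of scale, the following back-of-the-envelope sketch derives the average degree from the rounded figures above and estimates the size of an uncompressed CSR-style representation. The byte widths (4 bytes per neighbour ID, 4 bytes per edge weight, 8 bytes per offset) are illustrative assumptions, not taken from the dataset documentation, and the result is only an order-of-magnitude estimate.

```python
# Rounded figures quoted in the dataset description.
NUM_VERTICES = 1.7e9
NUM_EDGES = 2.5e12

# Assumed storage widths (hypothetical, for illustration only).
BYTES_PER_NEIGHBOUR = 4  # 1.7 billion vertex IDs fit in an unsigned 32-bit integer
BYTES_PER_WEIGHT = 4
BYTES_PER_OFFSET = 8


def average_degree(num_vertices, num_edges):
    """Mean number of edges per vertex."""
    return num_edges / num_vertices


def uncompressed_csr_bytes(num_vertices, num_edges):
    """Rough size of an offsets array plus neighbour and weight arrays."""
    offsets = (num_vertices + 1) * BYTES_PER_OFFSET
    edges = num_edges * (BYTES_PER_NEIGHBOUR + BYTES_PER_WEIGHT)
    return offsets + edges


if __name__ == "__main__":
    degree = average_degree(NUM_VERTICES, NUM_EDGES)
    size_tb = uncompressed_csr_bytes(NUM_VERTICES, NUM_EDGES) / 1e12
    print(f"average degree: about {degree:.0f}")
    print(f"uncompressed size under these assumptions: about {size_tb:.0f} TB")
```

Under these assumptions the uncompressed graph would occupy tens of terabytes, which suggests why a compressed format is used for distribution.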

For more information about the dataset, its features, and downloading, please visit:

For more information about creation of dataset, please visit:

For sample code for loading and validating the dataset, please refer to:

The following link contains the other datasets of this family:
Date made available: 10 Aug 2023
Publisher: Queen's University Belfast
Date of data production: 2022
