Graph Dataset Partitioning Methods

ali@fuzzywireless.com
Mar 4, 2022

Graph data models a network of interacting, related nodes, such as social networks and web graphs (Chen, Weng, He, Choi, & Yang, 2014). The massive interconnection of nodes makes the structure very complex and poses significant storage, communication, and scalability challenges. Some of the popular cloud-based graph processing platforms are Pregel, Pegasus, Hadi, Surfer, Trinity, and GraphLab (Chen et al., 2014).

Uneven bandwidth between the nodes of a cloud poses serious performance challenges at the massive scale of a distributed environment (Chen et al., 2014). One reason is the network environment itself, which is shaped by the network switches and network adaptors used to connect machines. For instance, the Amazon cloud has been found to have varying network bandwidth across machine pairs (Chen et al., 2014). Another reason is virtualization: users do not have administrative rights to the physical machines, so several tasks end up competing for network bandwidth, which leads to uneven bandwidth (Chen et al., 2014).

Processing a graph on distributed cloud nodes connected by links of varying network bandwidth can significantly hamper performance (Chen et al., 2014). To improve performance, the idea is to partition, store, and process the graph according to the number of cross-partition edges, so that partitions sharing a large number of cross-partition edges are stored on machines connected by high network bandwidth, which facilitates fast data transfer. There are several methods, such as:

Machine Graph – without knowledge of the network topology, 8 MB data chunks are sent between machines and timed to estimate the network bandwidth among the N virtual machines. The result is a machine graph in which each edge is weighted by the network bandwidth between the two machines it connects.

Partition Sketch – an ideal partition sketch balances the partitioning time and quality by minimizing cross-partition edges.

Multilevel Graph Partitioning – makes use of the ideal partition sketch and the machine graph (Chen et al., 2014).
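To make the idea concrete, here is a minimal Python sketch of how a machine graph and cross-partition edge counts could be combined to decide where partitions go. It is an illustration under stated assumptions, not the algorithm from Chen et al. (2014): the function names, the toy probe timings, and the cost estimate (cross-partition edges divided by link bandwidth) are all hypothetical, and a brute-force search over placements stands in for multilevel partitioning, so it only suits very small inputs.

```python
from collections import Counter
from itertools import permutations

CHUNK_MB = 8  # probe size mentioned in the text


def build_machine_graph(probe_seconds):
    """Turn probe timings into a machine graph.

    probe_seconds maps (machine_a, machine_b) to the seconds needed to send
    one 8 MB chunk. The result maps each machine pair to MB/s.
    """
    return {frozenset(pair): CHUNK_MB / secs for pair, secs in probe_seconds.items()}


def count_cross_edges(edges, partition_of):
    """Count edges whose endpoints fall in different partitions."""
    counts = Counter()
    for u, v in edges:
        pu, pv = partition_of[u], partition_of[v]
        if pu != pv:
            counts[frozenset((pu, pv))] += 1
    return counts


def transfer_cost(placement, cross_counts, machine_graph):
    """Assumed cost model: cross-partition edges divided by link bandwidth."""
    total = 0.0
    for pair, n_edges in cross_counts.items():
        p, q = tuple(pair)
        link = frozenset((placement[p], placement[q]))
        total += n_edges / machine_graph[link]
    return total


def best_placement(partitions, machines, cross_counts, machine_graph):
    """Brute-force search for the cheapest partition-to-machine assignment."""
    best_cost, best = None, None
    for perm in permutations(machines, len(partitions)):
        placement = dict(zip(partitions, perm))
        cost = transfer_cost(placement, cross_counts, machine_graph)
        if best_cost is None or cost < best_cost:
            best_cost, best = cost, placement
    return best


# Toy data (hypothetical): three machines, three partitions.
probe_seconds = {("m1", "m2"): 0.08, ("m1", "m3"): 0.8, ("m2", "m3"): 0.4}
partition_of = {"a1": "A", "a2": "A", "b1": "B", "b2": "B", "c1": "C"}
edges = [("a1", "b1"), ("a1", "b2"), ("a2", "b1"), ("a1", "a2"), ("b2", "c1")]

machine_graph = build_machine_graph(probe_seconds)
cross_counts = count_cross_edges(edges, partition_of)
placement = best_placement(["A", "B", "C"], ["m1", "m2", "m3"], cross_counts, machine_graph)
print(placement)
```

In this toy input, partitions A and B share the most cross-partition edges, so the search places them on the machine pair with the fastest link. A real system would replace the exhaustive search with a scalable heuristic.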

Reference:

Chen, R., Weng, X., He, B., Choi, B., & Yang, M. (2014). Network performance aware graph partitioning for large graph processing systems in the cloud. In S. Sakr & M. M. Gaber (Eds.), Large scale and big data: Processing and management (pp. 229-254). Boca Raton, FL: CRC Press.
