Imperial College London

ProfessorJulieMcCann

Faculty of EngineeringDepartment of Computing

Vice-Dean (Research) for the Faculty of Engineering
 
 
 
//

Contact

 

+44 (0)20 7594 8375j.mccann Website

 
 
//

Assistant

 

Miss Teresa Ng +44 (0)20 7594 8300

 
//

Location

 

260ACE ExtensionSouth Kensington Campus

//

Summary

 

Publications

Citation

BibTex format

@article{Yu:2019:10.1007/s00778-018-0536-3,
author = {Yu, W and Lin, X and Zhang, W and Pei, J and McCann, JA},
doi = {10.1007/s00778-018-0536-3},
journal = {VLDB Journal},
pages = {401--426},
title = {SimRank*: effective and scalable pairwise similarity search based on graph topology},
url = {http://dx.doi.org/10.1007/s00778-018-0536-3},
volume = {28},
year = {2019}
}

RIS format (EndNote, RefMan)

TY  - JOUR
AB - Given a graph, how can we quantify similarity between two nodes in an effective and scalable way? SimRank is an attractive measure of pairwise similarity based on graph topologies. Its underpinning philosophy that “two nodes are similar if they are pointed to (have incoming edges) from similar nodes” can be regarded as an aggregation of similarities based on incoming paths. Despite its popularity in various applications (e.g., web search and social networks), SimRank has an undesirable trait, i.e., “zero-similarity”: it accommodates only the paths of equal length from a common “center” node, whereas a large portion of other paths are fully ignored. In this paper, we propose an effective and scalable similarity model, SimRank, to remedy this problem. (1) We first provide a sufficient and necessary condition of the “zero-similarity” problem that exists in Jeh and Widom’s SimRank model, Li et al. ’s SimRank model, Random Walk with Restart (RWR), and ASCOS++. (2) We next present our treatment, SimRank, which can resolve this issue while inheriting the merit of the simple SimRank philosophy. (3) We reduce the series form of SimRank to a closed form, which looks simpler than SimRank but which enriches semantics without suffering from increased computational overhead. This leads to an iterative form of SimRank, which requires O(Knm) time and O(n2) memory for computing all (n2) pairs of similarities on a graph of n nodes and m edges for K iterations. (4) To improve the computational time of SimRank further, we leverage a novel clustering strategy via edge concentration. Due to its NP-hardness, we devise an efficient heuristic to speed up all-pairs SimRank computation to O(Knm~) time, where m~ is generally much smaller than m. (5) To scale SimRank on billion-edge graphs, we propose two memory-efficient single-source algorithms, i.e., ss-gSR for geometric SimRank, and ss-eSR for exp
AU - Yu,W
AU - Lin,X
AU - Zhang,W
AU - Pei,J
AU - McCann,JA
DO - 10.1007/s00778-018-0536-3
EP - 426
PY - 2019///
SN - 1066-8888
SP - 401
TI - SimRank: effective and scalable pairwise similarity search based on graph topology
T2 - VLDB Journal
UR - http://dx.doi.org/10.1007/s00778-018-0536-3
UR - http://hdl.handle.net/10044/1/67836
VL - 28
ER -