This book lays out the theoretical groundwork for personalized search and reputation management, both on the Web and in peer-to-peer and social networks. Representing much of the foundational research in this field, the book develops scalable algorithms that exploit the graphlike properties underlying personalized search and reputation management, and delves into realistic scenarios regarding Web-scale data.
Sep Kamvar focuses on eigenvector-based techniques in Web search, introducing a personalized variant of Google's PageRank algorithm, and he outlines algorithms, such as the now-famous quadratic extrapolation technique, that speed up computation, making personalized PageRank feasible. Kamvar suggests that Power Method-related techniques ultimately should be the basis for improving the PageRank algorithm, and he presents algorithms that exploit the convergence behavior of individual components of the PageRank vector. Kamvar then extends the ideas of reputation management and personalized search to distributed networks like peer-to-peer and social networks. He highlights locality and computational considerations related to the structure of the network, and considers such unique issues as malicious peers. He describes the EigenTrust algorithm and applies various PageRank concepts to P2P settings. Discussion chapters summarizing results conclude the book's two main sections.
Clear and thorough, this book provides an authoritative look at central innovations in search for all of those interested in the subject.
PART I: WORLD WIDE WEB 5
Chapter 3: The Second Eigenvalue of the Google Matrix 15
Chapter 4: The Condition Number of the PageRank Problem 20
Chapter 5: Extrapolation Algorithms 23
Chapter 6: Adaptive PageRank 42
Chapter 7: BlockRank 51
PART II: P2P NETWORKS 73
Chapter 9: EigenTrust 84
Chapter 10: Adaptive P2P Topologies 108
Figures xi
Acknowledgments xv
Chapter 1: Introduction 1
1.1 World Wide Web 1
1.2 P2P Networks 2
1.3 Contributions 2
Chapter 2: PageRank 7
2.1 PageRank Basics 7
2.2 Notation and Mathematical Preliminaries 9
2.3 Power Method 10
2.3.1 Formulation 10
2.3.2 Operation Count 12
2.3.3 Convergence 12
2.4 Experimental Setup 13
2.5 Related Work 13
2.5.1 Fast Eigenvector Computation 13
2.5.2 PageRank 14
3.1 Introduction 15
3.2 Theorems 15
3.3 Proof of Theorem 1 15
3.4 Proof of Theorem 2 17
3.5 Implications 18
3.6 Theorems Used 19
4.1 Theorem 6 20
4.2 Proof of Theorem 6 20
4.3 Implications 21
5.1 Introduction 23
5.2 Aitken Extrapolation 23
5.2.1 Formulation 23
5.2.2 Operation Count 25
5.2.3 Experimental Results 26
5.2.4 Discussion 26
5.3 Quadratic Extrapolation 27
5.3.1 Formulation 27
5.3.2 Operation Count 30
5.3.3 Experimental Results 30
5.3.4 Discussion 34
5.4 Power Extrapolation 35
5.4.1 Simple Power Extrapolation 35
5.4.2 A^2 Extrapolation 35
5.4.3 A^d Extrapolation 37
5.5 Measures of Convergence 40
6.1 Introduction 42
6.2 Distribution of Convergence Rates 42
6.3 Adaptive PageRank Algorithm 44
6.3.1 Algorithm Intuition 45
6.3.2 Filter-based Adaptive PageRank 46
6.4 Experimental Results 48
6.5 Extensions 48
6.5.1 Further Reducing Redundant Computation 48
6.5.2 Using the Matrix Ordering from the Previous Computation 50
6.6 Discussion 50
7.1 Block Structure of the Web 51
7.1.1 Block Sizes 54
7.1.2 The GeoCities Effect 55
7.2 BlockRank Algorithm 55
7.2.1 Overview of BlockRank Algorithm 56
7.2.2 Computing Local PageRanks 57
7.2.3 Estimating the Relative Importance of Each Block 60
7.2.4 Approximating Global PageRank Using Local PageRank and BlockRank 61
7.2.5 Using This Estimate as a Start Vector 62
7.3 Advantages of BlockRank 63
7.4 Experimental Results 64
7.5 Discussion 67
7.6 Personalized PageRank 67
7.6.1 Inducing Random Jump Probabilities over Pages 68
7.6.2 Using "Better" Local PageRanks 68
7.6.3 Experiments 69
7.6.4 Topic-Sensitive PageRank 70
7.6.5 Pure BlockRank 71
Chapter 8: Query-Cycle Simulator 75
8.1 Challenges in Empirical Evaluation of P2P Algorithms 75
8.2 The Query-Cycle Model 75
8.3 Basic Properties 76
8.3.1 Network Topology 76
8.3.2 Joining the Network 76
8.3.3 Query Propagation 76
8.4 Peer-Level Properties 77
8.5 Content Distribution Model 78
8.5.1 Data Volume 78
8.5.2 Content Type 78
8.6 Peer Behavior Model 80
8.6.1 Uptime and Session Duration 80
8.6.2 Query Activity 81
8.6.3 Queries 81
8.6.4 Query Responses 81
8.6.5 Downloads 82
8.7 Network Parameters 82
8.7.1 Topology 82
8.7.2 Bandwidth 82
8.8 Discussion 83
9.1 Design Considerations 84
9.2 Reputation Systems 85
9.3 EigenTrust 86
9.3.1 Normalizing Local Trust Values 86
9.3.2 Aggregating Local Trust Values 87
9.3.3 Probabilistic Interpretation 87
9.3.4 Basic EigenTrust 87
9.3.5 Practical Issues 88
9.3.6 Distributed EigenTrust 89
9.3.7 Algorithm Complexity 90
9.4 Secure EigenTrust 91
9.4.1 Algorithm Description 92
9.4.2 Discussion 93
9.5 Using Global Trust Values 94
9.6 Experiments 95
9.6.1 Load Distribution in a Trust-based Network 95
9.6.2 Threat Models 98
9.7 Related Work 106
9.8 Discussion 106
10.1 Introduction 108
10.2 Interaction Topologies 109
10.3 Adaptive P2P Topologies 109
10.3.1 Local Trust Scores 109
10.3.2 Protocol 110
10.3.3 Practical Issues 112
10.4 Empirical Results 115
10.4.1 Malicious Peers Move to Fringe 115
10.4.2 Freeriders Move to Fringe 118
10.4.3 Active Peers Are Rewarded 119
10.4.4 Efficient Topology 120
10.5 Threat Scenarios 126
10.5.1 Threat Model A 126
10.5.2 Threat Model B 128
10.5.3 Threat Model C 130
10.6 Related Work 131
10.7 Discussion 132
Chapter 11: Conclusion 133
Bibliography 135