Knowledge Vault 6 /12 - ICML 2016
Mining Large Graphs: Patterns, Anomalies, and Fraud Detection
Christos Faloutsos
< Resume Image >

Concept Graph & Resume using Claude 3.5 Sonnet | Chat GPT4o | Llama 3:

graph LR classDef main fill:#f9d4d4, font-weight:bold, font-size:14px classDef intro fill:#d4f9d4, font-weight:bold, font-size:14px classDef patterns fill:#d4d4f9, font-weight:bold, font-size:14px classDef methods fill:#f9f9d4, font-weight:bold, font-size:14px classDef applications fill:#f9d4f9, font-weight:bold, font-size:14px classDef future fill:#d4f9f9, font-weight:bold, font-size:14px Main[Mining Large Graphs:
Patterns, Anomalies, and
Fraud Detection] Main --> A[Introduction to Graph Mining] A --> A1[Faloutsos: data mining, graphs,
networks researcher 1] A --> A2[Graphs prevalent in various domains 2] A --> A3[Patterns and anomalies interrelated
in graphs 3] A --> A4[Real graphs exhibit non-random
patterns 4] A --> A5[Average degree smaller than
maximum degree 5] A --> A6[Skewed distributions impact computational
resources 6] Main --> B[Graph Patterns and Properties] B --> B1[Eigenvalues follow power law
distribution 7] B --> B2[Triangle participation grows with
node degree 8] B --> B3[Eigenvalues approximate total triangle
count 9] B --> B4[Real graphs follow power
law patterns 13] B --> B5[Pattern awareness enables fast
computations 14] B --> B6[Biological networks exhibit many
cliques 24] Main --> C[Methods and Techniques] C --> C1[Belief propagation uncovers network
fraud 11] C --> C2[Graph algorithms relate via
matrix equations 12] C --> C3[Time-evolving graphs represented as
tensors 16] C --> C4[Tensor decomposition uncovers latent
components 17] C --> C5[Scalable methods analyze large
tensors 19] C --> C6[Tensor analysis shows coordinated
activity 21] Main --> D[Applications and Discoveries] D --> D1[Twitter mining revealed suspicious
nodes 10] D --> D2[Decomposition revealed unusual calling
pattern 18] D --> D3[Big data reveals previously
hidden patterns 20] D --> D4[Tensor methods may detect
insurance fraud 25] D --> D5[Real-world applications leverage these
techniques 28] D --> D6[Graph mining impacts various
applications 30] Main --> E[Challenges and Future Directions] E --> E1[Anomaly detection requires evolving
tools 15] E --> E2[Fraudsters adapt to detection
methods 22] E --> E3[Skewed distributions challenge anomaly
detection 23] E --> E4[Cross-disciplinarity key to graph
mining 26] E --> E5[Overlapping communities remain challenging 27] E --> E6[Open problems in graph mining 29] class Main main class A,A1,A2,A3,A4,A5,A6 intro class B,B1,B2,B3,B4,B5,B6 patterns class C,C1,C2,C3,C4,C5,C6 methods class D,D1,D2,D3,D4,D5,D6 applications class E,E1,E2,E3,E4,E5,E6 future

Resume:

1.- Christos Faloutsos' research focuses on data mining and databases, including mining large graphs, streams, networks, fractals and multimedia databases.

2.- Graphs are prevalent in many domains including the web, social networks, computer networks, food webs, blog networks, computer security, and recommendation systems.

3.- Patterns and anomalies in graphs go hand-in-hand - noticing a pattern allows you to identify points that don't follow it as anomalies.

4.- Real-world graphs are not random and exhibit many patterns such as power-law degree distributions, skewed eigenvalue distributions, and many triangles.

5.- The average degree is often much smaller than the maximum degree in real graphs, unlike what a Gaussian distribution would suggest.

6.- Failing to account for skewed degree distributions can lead to drastic underestimation of computational resources needed, e.g. for friend-of-friend calculations.

7.- Eigenvalues of real graphs also follow a power law distribution, with the top few eigenvalues being much larger than the rest.

8.- In social networks, the number of triangles a node participates in grows as the 1.5 power of the node's degree.

9.- The skewed eigenvalue distribution can be leveraged to quickly approximate the total number of triangles in a graph to 99% accuracy.

10.- Mining Twitter revealed suspicious nodes with low degree but participating in an unusually high number of triangles, which were adult advertisers.

11.- Belief propagation and careful compatibility matrix design can uncover fraud in graphs like eBay's buyer-seller network.

12.- Fast belief propagation, random walks with restarts and semi-supervised learning on graphs are all related via similar matrix equations.

13.- There are many patterns in real-world graphs, and most follow power laws. Ignoring this risks falling into the "Gaussian trap."

14.- Noticing patterns enables you to do very fast computations. Ignoring them can require petabytes of storage for basic operations.

15.- Anomaly detection will never have a final answer and requires a growing list of tools. Patterns and anomalies go hand-in-hand.

16.- Time-evolving graphs can be represented as tensors, with caller-receiver, author-keyword-date, or subject-verb-object as typical modes.

17.- Tensor decomposition is an analog of matrix SVD and can uncover meaningful latent components in multi-mode tensors.

18.- Tensor decomposition of a who-calls-whom network over time uncovered an unusual "godfather calling subordinates" pattern.

19.- Scalable tensor decomposition methods like GigaTensor and HAT-10-2 enable analysis of very large tensors on Hadoop.

20.- Big data helps find patterns that would be missed in small samples, like small clusters of adult advertisers on Twitter.

21.- Tensor analysis of time-evolving graphs can reveal subtle patterns like coordinated periodic activity among a small group of nodes.

22.- Fraudsters and anomaly detection are engaged in an arms race - smarter detection forces fraudsters to adopt more sophisticated strategies.

23.- Skewed value distributions, like power laws for transaction amounts, make probabilistic arguments for anomaly detection more challenging.

24.- Biological networks exhibit many cliques due to e.g. groups of proteins jointly participating in a chain of reactions.

25.- Insurance fraud can manifest as cliques of corrupt doctors/pharmacists filing fake claims for elderly patients. Tensor methods may spot this.

26.- Cross-disciplinarity, combining domain expertise, algorithms, and systems, is key to mining subtle patterns and anomalies from big graph data.

27.- Overlapping communities and subtle fraudulent patterns remain challenging and may require human-in-the-loop inspection of detected patterns.

28.- Real-world applications leverage these techniques, e.g. Twitter and Facebook for fraud detection, police software for crime analysis.

29.- Open problems include extending techniques to weighted graphs, addressing overlapping patterns, and proving game-theoretic optimality of detection.

30.- Graph and tensor mining power applications from online fraud to crime forensics, highlighting the broad potential impact of these techniques.

Knowledge Vault built byDavid Vivancos 2024