Mining Large Graphs: Patterns, Anomalies, and Fraud Detection

Christos Faloutsos

**Concept Graph & Resume using Claude 3.5 Sonnet | Chat GPT4o | Llama 3:**

graph LR
classDef main fill:#f9d4d4, font-weight:bold, font-size:14px
classDef intro fill:#d4f9d4, font-weight:bold, font-size:14px
classDef patterns fill:#d4d4f9, font-weight:bold, font-size:14px
classDef methods fill:#f9f9d4, font-weight:bold, font-size:14px
classDef applications fill:#f9d4f9, font-weight:bold, font-size:14px
classDef future fill:#d4f9f9, font-weight:bold, font-size:14px
Main[Mining Large Graphs:

Patterns, Anomalies, and

Fraud Detection] Main --> A[Introduction to Graph Mining] A --> A1[Faloutsos: data mining, graphs,

networks researcher 1] A --> A2[Graphs prevalent in various domains 2] A --> A3[Patterns and anomalies interrelated

in graphs 3] A --> A4[Real graphs exhibit non-random

patterns 4] A --> A5[Average degree smaller than

maximum degree 5] A --> A6[Skewed distributions impact computational

resources 6] Main --> B[Graph Patterns and Properties] B --> B1[Eigenvalues follow power law

distribution 7] B --> B2[Triangle participation grows with

node degree 8] B --> B3[Eigenvalues approximate total triangle

count 9] B --> B4[Real graphs follow power

law patterns 13] B --> B5[Pattern awareness enables fast

computations 14] B --> B6[Biological networks exhibit many

cliques 24] Main --> C[Methods and Techniques] C --> C1[Belief propagation uncovers network

fraud 11] C --> C2[Graph algorithms relate via

matrix equations 12] C --> C3[Time-evolving graphs represented as

tensors 16] C --> C4[Tensor decomposition uncovers latent

components 17] C --> C5[Scalable methods analyze large

tensors 19] C --> C6[Tensor analysis shows coordinated

activity 21] Main --> D[Applications and Discoveries] D --> D1[Twitter mining revealed suspicious

nodes 10] D --> D2[Decomposition revealed unusual calling

pattern 18] D --> D3[Big data reveals previously

hidden patterns 20] D --> D4[Tensor methods may detect

insurance fraud 25] D --> D5[Real-world applications leverage these

techniques 28] D --> D6[Graph mining impacts various

applications 30] Main --> E[Challenges and Future Directions] E --> E1[Anomaly detection requires evolving

tools 15] E --> E2[Fraudsters adapt to detection

methods 22] E --> E3[Skewed distributions challenge anomaly

detection 23] E --> E4[Cross-disciplinarity key to graph

mining 26] E --> E5[Overlapping communities remain challenging 27] E --> E6[Open problems in graph mining 29] class Main main class A,A1,A2,A3,A4,A5,A6 intro class B,B1,B2,B3,B4,B5,B6 patterns class C,C1,C2,C3,C4,C5,C6 methods class D,D1,D2,D3,D4,D5,D6 applications class E,E1,E2,E3,E4,E5,E6 future

Patterns, Anomalies, and

Fraud Detection] Main --> A[Introduction to Graph Mining] A --> A1[Faloutsos: data mining, graphs,

networks researcher 1] A --> A2[Graphs prevalent in various domains 2] A --> A3[Patterns and anomalies interrelated

in graphs 3] A --> A4[Real graphs exhibit non-random

patterns 4] A --> A5[Average degree smaller than

maximum degree 5] A --> A6[Skewed distributions impact computational

resources 6] Main --> B[Graph Patterns and Properties] B --> B1[Eigenvalues follow power law

distribution 7] B --> B2[Triangle participation grows with

node degree 8] B --> B3[Eigenvalues approximate total triangle

count 9] B --> B4[Real graphs follow power

law patterns 13] B --> B5[Pattern awareness enables fast

computations 14] B --> B6[Biological networks exhibit many

cliques 24] Main --> C[Methods and Techniques] C --> C1[Belief propagation uncovers network

fraud 11] C --> C2[Graph algorithms relate via

matrix equations 12] C --> C3[Time-evolving graphs represented as

tensors 16] C --> C4[Tensor decomposition uncovers latent

components 17] C --> C5[Scalable methods analyze large

tensors 19] C --> C6[Tensor analysis shows coordinated

activity 21] Main --> D[Applications and Discoveries] D --> D1[Twitter mining revealed suspicious

nodes 10] D --> D2[Decomposition revealed unusual calling

pattern 18] D --> D3[Big data reveals previously

hidden patterns 20] D --> D4[Tensor methods may detect

insurance fraud 25] D --> D5[Real-world applications leverage these

techniques 28] D --> D6[Graph mining impacts various

applications 30] Main --> E[Challenges and Future Directions] E --> E1[Anomaly detection requires evolving

tools 15] E --> E2[Fraudsters adapt to detection

methods 22] E --> E3[Skewed distributions challenge anomaly

detection 23] E --> E4[Cross-disciplinarity key to graph

mining 26] E --> E5[Overlapping communities remain challenging 27] E --> E6[Open problems in graph mining 29] class Main main class A,A1,A2,A3,A4,A5,A6 intro class B,B1,B2,B3,B4,B5,B6 patterns class C,C1,C2,C3,C4,C5,C6 methods class D,D1,D2,D3,D4,D5,D6 applications class E,E1,E2,E3,E4,E5,E6 future

**Resume: **

**1.-** Christos Faloutsos' research focuses on data mining and databases, including mining large graphs, streams, networks, fractals and multimedia databases.

**2.-** Graphs are prevalent in many domains including the web, social networks, computer networks, food webs, blog networks, computer security, and recommendation systems.

**3.-** Patterns and anomalies in graphs go hand-in-hand - noticing a pattern allows you to identify points that don't follow it as anomalies.

**4.-** Real-world graphs are not random and exhibit many patterns such as power-law degree distributions, skewed eigenvalue distributions, and many triangles.

**5.-** The average degree is often much smaller than the maximum degree in real graphs, unlike what a Gaussian distribution would suggest.

**6.-** Failing to account for skewed degree distributions can lead to drastic underestimation of computational resources needed, e.g. for friend-of-friend calculations.

**7.-** Eigenvalues of real graphs also follow a power law distribution, with the top few eigenvalues being much larger than the rest.

**8.-** In social networks, the number of triangles a node participates in grows as the 1.5 power of the node's degree.

**9.-** The skewed eigenvalue distribution can be leveraged to quickly approximate the total number of triangles in a graph to 99% accuracy.

**10.-** Mining Twitter revealed suspicious nodes with low degree but participating in an unusually high number of triangles, which were adult advertisers.

**11.-** Belief propagation and careful compatibility matrix design can uncover fraud in graphs like eBay's buyer-seller network.

**12.-** Fast belief propagation, random walks with restarts and semi-supervised learning on graphs are all related via similar matrix equations.

**13.-** There are many patterns in real-world graphs, and most follow power laws. Ignoring this risks falling into the "Gaussian trap."

**14.-** Noticing patterns enables you to do very fast computations. Ignoring them can require petabytes of storage for basic operations.

**15.-** Anomaly detection will never have a final answer and requires a growing list of tools. Patterns and anomalies go hand-in-hand.

**16.-** Time-evolving graphs can be represented as tensors, with caller-receiver, author-keyword-date, or subject-verb-object as typical modes.

**17.-** Tensor decomposition is an analog of matrix SVD and can uncover meaningful latent components in multi-mode tensors.

**18.-** Tensor decomposition of a who-calls-whom network over time uncovered an unusual "godfather calling subordinates" pattern.

**19.-** Scalable tensor decomposition methods like GigaTensor and HAT-10-2 enable analysis of very large tensors on Hadoop.

**20.-** Big data helps find patterns that would be missed in small samples, like small clusters of adult advertisers on Twitter.

**21.-** Tensor analysis of time-evolving graphs can reveal subtle patterns like coordinated periodic activity among a small group of nodes.

**22.-** Fraudsters and anomaly detection are engaged in an arms race - smarter detection forces fraudsters to adopt more sophisticated strategies.

**23.-** Skewed value distributions, like power laws for transaction amounts, make probabilistic arguments for anomaly detection more challenging.

**24.-** Biological networks exhibit many cliques due to e.g. groups of proteins jointly participating in a chain of reactions.

**25.-** Insurance fraud can manifest as cliques of corrupt doctors/pharmacists filing fake claims for elderly patients. Tensor methods may spot this.

**26.-** Cross-disciplinarity, combining domain expertise, algorithms, and systems, is key to mining subtle patterns and anomalies from big graph data.

**27.-** Overlapping communities and subtle fraudulent patterns remain challenging and may require human-in-the-loop inspection of detected patterns.

**28.-** Real-world applications leverage these techniques, e.g. Twitter and Facebook for fraud detection, police software for crime analysis.

**29.-** Open problems include extending techniques to weighted graphs, addressing overlapping patterns, and proving game-theoretic optimality of detection.

**30.-** Graph and tensor mining power applications from online fraud to crime forensics, highlighting the broad potential impact of these techniques.

Knowledge Vault built byDavid Vivancos 2024