Concept Graph & Summary using Claude 3.5 Sonnet | ChatGPT-4o | Llama 3:
Summary:
1.- Watermarking for language models: Embedding signals into generated text that are invisible to humans but algorithmically detectable.
2.- Green/red list: Randomly partitioning vocabulary into "green" (allowed) and "red" (discouraged) tokens for each generation step.
3.- Soft watermarking: Adding a constant δ to logits of green list tokens, adaptively enforcing watermark based on text entropy.
4.- Detection via z-statistic: Using proportion of green tokens to detect watermark with interpretable p-values.
5.- Spike entropy: Measure of distribution spread, useful for analyzing watermark strength.
6.- Watermark strength vs text quality tradeoff: Stronger watermarks may distort generated text.
7.- Beam search synergy: Using beam search amplifies watermark while maintaining text quality.
8.- Public/private watermarking: Allows transparency and independent verification while maintaining stronger private detection.
9.- Watermark robustness: Difficult to remove without significantly modifying text or degrading quality.
10.- Low entropy challenges: Watermark less effective on highly deterministic text sequences.
11.- Multiple watermarks: Applying several watermarks simultaneously for flexibility and stronger detection.
12.- Selective watermarking: Activating watermark in response to suspicious API usage.
13.- Paraphrasing attacks: Attempts to remove watermark through manual or automated rephrasing.
14.- Tokenization attacks: Modifying text to change sub-word tokenization and impact hash computation.
15.- Homoglyph attacks: Substituting Unicode characters that look identical to standard ones in order to alter tokenization.
16.- Generative attacks: Prompting model to change output in predictable, reversible ways (e.g. emoji insertion).
17.- Canonicalization: Normalizing text before watermark testing to defend against certain attacks.
18.- Impact on factuality: Soft watermarking has minimal effect on model's factual accuracy.
19.- Watermark discovery: Difficulty of detecting watermark presence solely through text analysis.
20.- Perplexity impact: Theoretical bound on how watermarking affects model perplexity.
21.- Private mode: Using secret random key for watermarking, hosted behind secure API.
22.- False positive/negative tradeoffs: Balancing watermark detection accuracy and error rates.
23.- Watermark parameters: Effects of green list size (γ) and logit boost (δ) on watermark strength.
24.- Sequence length impact: Longer sequences allow for stronger watermark detection.
25.- Entropy-based adaptation: Watermark strength varies based on text predictability.
26.- API cost considerations: Some attacks increase token usage, raising costs for attackers.
27.- Negative example training: Potential defense against certain attacks through model fine-tuning.
28.- Repeated n-gram handling: Ignoring repeated phrases to improve watermark sensitivity.
29.- Oracle model evaluation: Using larger model to assess perplexity of watermarked text.
30.- Theoretical analysis: Mathematical framework for understanding watermark behavior and detection sensitivity.
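The green/red partition and soft watermark (items 2-3 above) can be sketched as follows. This is a minimal illustration, not the published implementation: the hash scheme, function names, and the choice of seeding on only the single previous token are assumptions made for clarity.

```python
import hashlib
import random

def green_list(prev_token_id, vocab_size, gamma=0.5):
    """Seed an RNG on the previous token and partition the vocabulary.

    The first gamma fraction of the shuffled vocabulary forms the 'green'
    (favored) list; the remainder is 'red'. Illustrative sketch only:
    the real scheme's hash and context window may differ.
    """
    seed = int(hashlib.sha256(str(prev_token_id).encode()).hexdigest(), 16) % (2**32)
    rng = random.Random(seed)
    ids = list(range(vocab_size))
    rng.shuffle(ids)
    return set(ids[: int(gamma * vocab_size)])

def soft_watermark_logits(logits, prev_token_id, delta=2.0, gamma=0.5):
    """Soft watermark: add a constant delta to the logits of green tokens.

    High-entropy steps are nudged toward green tokens, while a strongly
    peaked (low-entropy) distribution still wins, which is what makes the
    rule adaptive (items 3 and 25)."""
    greens = green_list(prev_token_id, len(logits), gamma)
    return [l + delta if i in greens else l for i, l in enumerate(logits)]
```

Because the green list is recomputed from the preceding token at every step, a detector that knows the seeding rule can replay the partition without access to the model.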
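Detection via the z-statistic (item 4) is a one-proportion z-test: under the null hypothesis of unwatermarked text, each token lands in the green list with probability γ. A minimal sketch, assuming T scored tokens:

```python
import math

def watermark_z_score(green_count, total_tokens, gamma=0.5):
    """z = (green_count - gamma*T) / sqrt(T * gamma * (1 - gamma)).

    Under H0 (no watermark) green_count ~ Binomial(T, gamma), so large z
    means far more green tokens than chance predicts."""
    expected = gamma * total_tokens
    std = math.sqrt(total_tokens * gamma * (1 - gamma))
    return (green_count - expected) / std

def one_sided_p_value(z):
    """P(Z >= z) for a standard normal, via the complementary error function."""
    return 0.5 * math.erfc(z / math.sqrt(2))
```

A threshold like z > 4 corresponds to a one-sided false-positive rate of roughly 3e-5, which is what makes the p-values interpretable; longer sequences (item 24) accumulate more green tokens and push z higher.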
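Spike entropy (item 5) measures how spread out the next-token distribution is. A sketch of the quantity used in the paper's analysis, S(p, z) = Σ_k p_k / (1 + z·p_k), where the scalar z here is a modulus parameter (an overloaded symbol, unrelated to the detection z-statistic):

```python
def spike_entropy(probs, modulus):
    """Spike entropy: sum over k of p_k / (1 + modulus * p_k).

    Maximal for a uniform distribution and minimal for a one-hot (spiky)
    one, so low values flag the deterministic, low-entropy text where the
    soft watermark is weakest (item 10)."""
    return sum(p / (1 + modulus * p) for p in probs)

# A uniform distribution over N tokens scores N / (N + modulus), while a
# fully deterministic (one-hot) distribution scores 1 / (1 + modulus).
```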
Knowledge Vault built by David Vivancos 2024