Anomaly detection on metrics (not logs) — detects performance degradation like slow response
Key idea: cluster anomalous metric patterns into “sketches”, so when a similar pattern appears again you immediately know what type of issue it is
Interpretable: instead of just “anomaly detected”, tells you which pattern group it matches — engineers can map patterns to known root causes
Adaptive: online learning algorithm discovers new patterns as services evolve, no need to retrain from scratch
Deployed at Huawei Cloud in production
For LAPP: the “pattern sketching” concept could apply to log anomaly patterns in Phase 2 — cluster anomalous log sequences into recognizable patterns, build a library of known failure signatures