LogPilot: Intent-aware and Scalable Alert Diagnosis
- Venue: ASE’25
- Authors: Zhihan Jiang, Jinyang Liu et al. (same group as L4)
- Paper: https://arxiv.org/abs/2509.25874
- Deployed at: Volcano Engine Cloud (ByteDance)
- Status: Read
Takeaway
- Alert diagnosis, not log parsing: given a firing alert (e.g., a PromQL rule), automatically find the root cause from logs
- Intent-aware: reads the alert definition (PromQL query) to understand what the alert cares about, then scopes which logs/requests are relevant
- Builds “spatiotemporal log chains” per request (trace-like reconstruction from logs), then clusters similar chains to find patterns
- Clustering keeps LLM input compact: send representative samples instead of all logs
- Results: roughly 50% more useful root cause summaries and roughly 55% higher exact localization accuracy than baselines
- Fast and cheap: under 1 min per alert, $0.074 per diagnosis
- For LAPP: the intent-aware scoping idea is interesting — if we know what the user cares about (alert definition), we can narrow down which logs to analyze instead of boiling the ocean
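To make the pipeline in the takeaways concrete, here is a minimal sketch of the three steps (scope from the alert definition, per-request chains, clustering with one representative per cluster). It is not the paper's implementation. The log record shape, the sample data, and all function names (`alert_scope`, `build_chains`, `cluster_chains`, `representatives`) are hypothetical. Clustering by exact chain signature stands in for whatever similarity measure LogPilot actually uses, and the regex only picks up equality label matchers from the PromQL string.

```python
import re
from collections import defaultdict

# Hypothetical log records: (timestamp, service, request_id, log_template).
LOGS = [
    (1.0, "gateway", "r1", "recv request"),
    (1.2, "checkout", "r1", "db timeout"),
    (2.0, "gateway", "r2", "recv request"),
    (2.3, "checkout", "r2", "db timeout"),
    (3.0, "gateway", "r3", "recv request"),
    (3.1, "search", "r3", "ok"),
    (4.0, "gateway", "r4", "recv request"),
    (4.4, "checkout", "r4", "ok"),
]


def alert_scope(promql: str) -> dict:
    """Extract equality label matchers (label="value") from a PromQL string."""
    return dict(re.findall(r'(\w+)\s*=\s*"([^"]*)"', promql))


def build_chains(logs, scope):
    """Group logs by request, order them by time, keep requests in scope."""
    by_request = defaultdict(list)
    for ts, service, req, template in logs:
        by_request[req].append((ts, service, template))

    chains = {}
    for req, events in by_request.items():
        events.sort()  # temporal order within the request
        wanted = scope.get("service")
        # Assumption: a request is relevant if it touches the alerted service.
        if wanted and all(service != wanted for _, service, _ in events):
            continue
        chains[req] = tuple((service, template) for _, service, template in events)
    return chains


def cluster_chains(chains):
    """Cluster requests whose chains are identical (a deliberate simplification)."""
    clusters = defaultdict(list)
    for req, chain in chains.items():
        clusters[chain].append(req)
    return clusters


def representatives(clusters):
    """One (request_id, cluster_size, chain) per cluster, largest cluster first."""
    reps = [(members[0], len(members), chain) for chain, members in clusters.items()]
    return sorted(reps, key=lambda rep: -rep[1])


if __name__ == "__main__":
    scope = alert_scope('sum(rate(http_errors_total{service="checkout"}[5m])) > 5')
    chains = build_chains(LOGS, scope)
    for req, size, chain in representatives(cluster_chains(chains)):
        print(req, size, chain)
```

Only the representatives (here two chains instead of eight raw log lines) would go into the LLM prompt, which is the compaction effect the notes describe. For LAPP, the same `alert_scope` step could be driven by any user-supplied statement of intent, not only an alert definition.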