awesome-LLM-AIOps - Curated Paper List

Takeaway

Maintained by the CUHK group (same authors as L4, LogPilot, iKnow), which makes it an authoritative list
Organized by: Incident Management (RCA, mitigation, postmortem), Log Analysis (parsing, anomaly detection, log generation), Infrastructure Management
Notable papers we haven’t covered that could be relevant:
- AIOpsLab (MLSys’25): benchmark framework for evaluating AI agents in cloud ops — could be useful for LAPP evaluation
- RCAgent (CIKM’24): tool-augmented LLM agent for cloud RCA — the “agent with tools” pattern is what LAPP Phase 2 could evolve into
- D-Bot (VLDB’24): database diagnosis with Tree-of-Thought prompting — structured reasoning for diagnosis
- COCA (ICSE’25): RCA using code knowledge — ties back to LogImprover/SCLogger idea of connecting logs to source code
Log Parsing section lists: LILAC, LogBatcher, DivLog, LogParser-LLM — confirms our reading list covers the key ones
Log Anomaly Detection section is a potential goldmine for LAPP Phase 2, but not a priority now
For LAPP: good reference to check periodically for new papers. The “agent for RCA” trend (RCAgent, FLASH, LLexus) suggests LAPP Phase 2 could be an agentic system that uses tools (log parser, metrics fetcher, trace viewer) to do RCA
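To make the "agent with tools" idea concrete, here is a minimal sketch of the loop such a system might run. Everything in it is an assumption for illustration: the tool names (`log_parser`, `metrics_fetcher`, `trace_viewer`) mirror the ones listed above, but the stub functions, the canned strings they return, and the fixed-order `choose_next_tool` policy are hypothetical. They are not taken from RCAgent, FLASH, LLexus, or any LAPP design. In a real agent, an LLM call would replace `choose_next_tool` and would decide the next tool from the evidence gathered so far.

```python
from typing import Callable, Dict, List, Optional, Tuple

# Hypothetical tool stubs. A real system would query log, metric, and trace
# backends; the canned strings here are placeholders, not real output.
def log_parser(service: str) -> str:
    return f"{service}: log templates extracted"

def metrics_fetcher(service: str) -> str:
    return f"{service}: metrics fetched"

def trace_viewer(service: str) -> str:
    return f"{service}: traces inspected"

Tool = Callable[[str], str]

TOOLS: Dict[str, Tool] = {
    "log_parser": log_parser,
    "metrics_fetcher": metrics_fetcher,
    "trace_viewer": trace_viewer,
}

def choose_next_tool(used: List[str], tools: Dict[str, Tool]) -> Optional[str]:
    """Stand-in for the LLM's decision: return the first unused tool, or None."""
    for name in tools:
        if name not in used:
            return name
    return None

def run_rca(service: str, tools: Dict[str, Tool], max_steps: int = 5) -> List[Tuple[str, str]]:
    """Call tools one at a time and collect (tool name, observation) pairs."""
    evidence: List[Tuple[str, str]] = []
    for _step in range(max_steps):
        used = [name for name, _obs in evidence]
        name = choose_next_tool(used, tools)
        if name is None:  # nothing left to try, so stop early
            break
        evidence.append((name, tools[name](service)))
    return evidence

if __name__ == "__main__":
    for name, observation in run_rca("checkout", TOOLS):
        print(f"[{name}] {observation}")
```

Keeping the tools behind a plain name-to-function dictionary means the decision policy can be swapped (fixed order here, an LLM later) without touching the tools themselves, and `max_steps` bounds how long the agent can run.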