Results: 87.3% F1 for failure-indicating log identification, 80% top-5 accuracy for faulty node detection
Not directly applicable to LAPP Phase 1 (log parsing), but the cross-job filtering idea (comparing against known-good baselines) is relevant for Phase 2 anomaly detection
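A rough sketch of what cross-job filtering could look like for Phase 2, assuming logs have already been parsed into templates (the Phase 1 output). This is not the paper's implementation: the function names, the `max_ratio` threshold, and the sample templates are all hypothetical, chosen only to illustrate "drop anything that also shows up in known-good jobs".

```python
from collections import Counter


def baseline_frequency(baseline_jobs):
    """Fraction of known-good jobs in which each log template appears.

    baseline_jobs: list of jobs, each a list of template strings (hypothetical format).
    """
    counts = Counter()
    for job in baseline_jobs:
        # Count each template once per job, regardless of how often it repeats.
        counts.update(set(job))
    n = len(baseline_jobs)
    return {tpl: c / n for tpl, c in counts.items()} if n else {}


def filter_against_baseline(candidate_job, baseline_jobs, max_ratio=0.0):
    """Keep templates from a suspect job that are rare in known-good jobs.

    max_ratio is an assumed tuning knob: 0.0 keeps only templates never
    seen in any baseline job; higher values tolerate occasional noise.
    """
    freq = baseline_frequency(baseline_jobs)
    return [tpl for tpl in candidate_job if freq.get(tpl, 0.0) <= max_ratio]


# Made-up templates for illustration only.
baseline = [
    ["init ok", "step done", "warn: slow io"],
    ["init ok", "step done"],
]
failed = ["init ok", "warn: slow io", "NCCL timeout on rank <*>", "step done"]

strict = filter_against_baseline(failed, baseline)
lenient = filter_against_baseline(failed, baseline, max_ratio=0.5)
print(strict)
print(lenient)
```

With `max_ratio=0.0` only templates absent from every baseline job survive; raising it keeps templates that appear in some healthy jobs, which may matter if benign warnings are common.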
Details
Platform-X: Huawei production AI platform; jobs average 72.8B params and 941 accelerators