Loghub: Large Collection of System Log Datasets
Takeaway
- The original loghub: 19 real-world log datasets (distributed systems, supercomputers, operating systems, mobile systems, server applications, standalone software)
- 90K+ downloads, used by hundreds of orgs — the de facto standard benchmark for log analysis research
- This is v1; Loghub-2.0 (ref 03) builds on it with 14 much larger annotated datasets averaging 3.6M lines each
- For LAPP: use Loghub-2.0 for benchmarking, but loghub v1 (2K samples per dataset) is handy for quick smoke tests during development
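A smoke test on a 2K sample can be as small as "parse every line, then score the grouping against the ground-truth labels." The sketch below makes several assumptions. It assumes the structured CSV has `Content` and `EventId` columns, as the loghub `*_2k.log_structured.csv` files appear to. It uses grouping accuracy, the metric commonly reported in log-parsing benchmarks: a line counts as correct only if its predicted group has exactly the same members as its ground-truth group. `naive_template` is a hypothetical stand-in for whatever parser LAPP is testing. The `SAMPLE` rows are made-up HDFS-style lines, not real loghub data.

```python
import csv
import io
import re
from collections import defaultdict

# Made-up rows mimicking the (assumed) loghub 2k structured-CSV layout.
SAMPLE = """LineId,Content,EventId
1,Receiving block blk_1 src: /10.0.0.1:500,E1
2,Receiving block blk_2 src: /10.0.0.2:501,E1
3,PacketResponder 1 for block blk_1 terminating,E2
4,PacketResponder 0 for block blk_2 terminating,E2
"""


def load_structured(fp):
    """Return (contents, ground-truth event ids) from a structured CSV file object."""
    rows = list(csv.DictReader(fp))
    return [r["Content"] for r in rows], [r["EventId"] for r in rows]


def naive_template(content):
    """Hypothetical baseline parser: mask every token that contains a digit."""
    return " ".join("<*>" if re.search(r"\d", tok) else tok for tok in content.split())


def grouping_accuracy(truth, pred):
    """Fraction of lines whose predicted group equals their ground-truth group exactly."""
    if len(truth) != len(pred):
        raise ValueError("truth and pred must have the same length")
    truth_groups = defaultdict(set)
    pred_groups = defaultdict(set)
    for i, (t, p) in enumerate(zip(truth, pred)):
        truth_groups[t].add(i)
        pred_groups[p].add(i)
    correct = 0
    for members in pred_groups.values():
        # A predicted group is correct only if it matches one truth group exactly.
        first = next(iter(members))
        if truth_groups[truth[first]] == members:
            correct += len(members)
    return correct / len(truth) if truth else 0.0


if __name__ == "__main__":
    contents, truth = load_structured(io.StringIO(SAMPLE))
    pred = [naive_template(c) for c in contents]
    print(f"GA = {grouping_accuracy(truth, pred):.3f}")
```

To run it on a real sample, swap `io.StringIO(SAMPLE)` for `open(path, newline="")` pointing at a downloaded 2K file. Check the exact filename and column headers against the loghub repo first. Scoring by exact group membership, rather than by template strings, keeps the check independent of how a parser spells its wildcards.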