Loghub: Large Collection of System Log Datasets

Takeaway

  • The original loghub: 19 real-world log datasets (distributed systems, supercomputers, OS, mobile, server apps, standalone)
  • 90K+ downloads, used by hundreds of orgs — the de facto standard benchmark for log analysis research
  • This is v1; Loghub-2.0 (ref 03) expanded it to 14 larger datasets averaging 3.6M log lines each
  • For LAPP: use Loghub-2.0 for benchmarking, but Loghub v1's 2K-line labeled sample per dataset is handy for quick smoke tests during development
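A smoke test on a 2K sample usually means comparing a parser's per-line event IDs against the ground-truth labels. A minimal sketch follows. It assumes each sample ships as a structured CSV with an `EventId` column (the `_2k.log_structured.csv` files in the repo, as far as we know; check the header before relying on it), and it uses grouping accuracy (GA), the metric from the LogPAI parsing benchmarks. A line counts as correct only if its predicted group contains exactly the same lines as its ground-truth group. `load_event_ids` and `grouping_accuracy` are hypothetical helper names, not part of any Loghub tooling.

```python
import csv
from collections import defaultdict
from typing import Sequence


def load_event_ids(path: str, column: str = "EventId") -> list[str]:
    """Read one label column from a structured CSV (column name is an assumption)."""
    with open(path, newline="", encoding="utf-8") as f:
        return [row[column] for row in csv.DictReader(f)]


def grouping_accuracy(truth: Sequence[str], predicted: Sequence[str]) -> float:
    """Fraction of lines whose predicted group exactly equals their ground-truth group."""
    if len(truth) != len(predicted):
        raise ValueError("truth and predicted must have the same length")
    if not truth:
        return 0.0

    # Map each label to the set of line indices that carry it.
    truth_groups: dict[str, set[int]] = defaultdict(set)
    pred_groups: dict[str, set[int]] = defaultdict(set)
    for i, (t, p) in enumerate(zip(truth, predicted)):
        truth_groups[t].add(i)
        pred_groups[p].add(i)

    correct = 0
    for members in pred_groups.values():
        # A predicted group is correct only if it coincides with one ground-truth group.
        first = next(iter(members))
        if truth_groups[truth[first]] == members:
            correct += len(members)
    return correct / len(truth)
```

Usage (hypothetical path and parser output): `grouping_accuracy(load_event_ids("HDFS/HDFS_2k.log_structured.csv"), my_parser_ids)`. Label names don't have to match. Only the partition of lines into groups is compared, so a parser can emit any template IDs it likes.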