Logzip: Log Compression via Iterative Clustering

Takeaway

  • Log-specific compression: extract templates via iterative clustering, store each template once and its variables separately, then compress the result with a general-purpose compressor (~50% saving over plain gzip)
  • Interesting side effect: the template extraction step is basically log parsing, just used for compression instead of analysis
  • For LAPP: not directly relevant, but it confirms that template extraction (parsing) is the foundation for everything; even compression benefits from it
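A minimal sketch of the "template + variables stored separately" idea, assuming a crude stand-in for Logzip's clustering: any whitespace-separated token containing a digit is treated as a variable and masked as <*>. The names split_line, encode, decode and compressed_sizes are hypothetical, and this is not Logzip's actual algorithm. It only shows why the split helps: repeated constant text is stored once per template rather than once per line.

```python
import gzip
import json
import re

# Hypothetical heuristic standing in for Logzip's iterative clustering:
# a token counts as a variable if it contains at least one digit.
VAR_TOKEN = re.compile(r"\d")


def split_line(line):
    """Return (template, variables) for one log line."""
    template_tokens, variables = [], []
    for tok in line.split():
        if VAR_TOKEN.search(tok):
            template_tokens.append("<*>")
            variables.append(tok)
        else:
            template_tokens.append(tok)
    return " ".join(template_tokens), variables


def encode(lines):
    """Store each distinct template once; each line becomes [template_id, *variables]."""
    template_ids = {}
    rows = []
    for line in lines:
        template, variables = split_line(line)
        tid = template_ids.setdefault(template, len(template_ids))
        rows.append([tid] + variables)
    templates = sorted(template_ids, key=template_ids.get)
    return templates, rows


def decode(templates, rows):
    """Rebuild the lines (whitespace is normalized to single spaces)."""
    out = []
    for tid, *variables in rows:
        it = iter(variables)
        tokens = [next(it) if t == "<*>" else t for t in templates[tid].split(" ")]
        out.append(" ".join(tokens))
    return out


def compressed_sizes(lines):
    """gzip size of the raw text vs. gzip size of the template/variable encoding."""
    raw = "\n".join(lines).encode()
    templates, rows = encode(lines)
    structured = json.dumps({"templates": templates, "rows": rows}).encode()
    return len(gzip.compress(raw)), len(gzip.compress(structured))


if __name__ == "__main__":
    logs = [
        "Connection from 10.0.0.1 closed after 35 ms",
        "Connection from 10.0.0.7 closed after 12 ms",
        "Disk /dev/sda1 usage at 91%",
    ]
    templates, rows = encode(logs)
    print(templates)
    # ['Connection from <*> closed after <*> ms', 'Disk <*> usage at <*>']
    assert decode(templates, rows) == logs
    print(compressed_sizes(logs))
```

On three lines the structured form will not beat gzip, because the savings only show up across many repeats of the same template. A closer approximation to the Logzip layout would store each template's variables column by column, so that similar values (IPs, durations) sit next to each other before compression. The digit rule also misfires on constant tokens that contain digits, and on literal "<*>" tokens in the input. That is exactly the gap a real parser or clustering step is meant to close.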