The new OpenZL SDDL2 (Simple Data Description Language) supports several different floating-point types. It would be worthwhile to contribute some of the FC project's experience to OpenZL. These are the types OpenZL currently supports:
| Type | Size | Endian |
|----------------|---------|--------|
| `Int8` | 1 byte | N/A |
| `UInt8` | 1 byte | N/A |
| `Int16LE/BE` | 2 bytes | Yes |
| `UInt16LE/BE` | 2 bytes | Yes |
| `Int32LE/BE` | 4 bytes | Yes |
| `UInt32LE/BE` | 4 bytes | Yes |
| `Int64LE/BE` | 8 bytes | Yes |
| `UInt64LE/BE` | 8 bytes | Yes |
| `Float16LE/BE` | 2 bytes | Yes |
| `Float32LE/BE` | 4 bytes | Yes |
| `Float64LE/BE` | 8 bytes | Yes |
| `BFloat16LE/BE`| 2 bytes | Yes |
| `Bytes(n)` | n bytes | N/A |
Some links:
- https://github.com/facebook/openzl/releases/tag/v0.2.0
- https://openzl.org/getting-started/introduction/
This is, for lack of a better term, a "metacompressor". It will be interesting to see which of the choices ends up dominating; in my past experience with metacompression, one algorithm is usually consistently ahead.
I understand this is more the primitive you would build such a thing on top of; it's just that the first question I always have for a novel compressor is "how does it do on these example streams of data?"
It is not trying to replace zstd or lz4. The idea is narrower: take blocks of doubles, try a set of float-specific predictors/transforms/coders, and emit whichever representation is smallest for that block.
It is aimed at time-series, scientific, simulation, and analytics data where the numbers often have structure: smooth curves, repeated values, fixed increments, periodic signals, predictable deltas, or low-entropy mantissas.
The API is intentionally small: `fc_enc`, `fc_dec`, a config struct, and a few counters to inspect which modes won. Decode is parallel and meant to be fast; encode spends more CPU searching for a better representation.
Current caveats: x86-64 only for now, tuned for IEEE-754 doubles, research-grade rather than production-hardened.