RE: What are the supported compression codecs in Avro in Diyotta?
The following compression codecs are used in Avro:
- The “deflate” codec writes the data block using the deflate algorithm as specified in RFC 1951, and typically implemented using the zlib library. Note that this format (unlike the “zlib format” in RFC 1950) does not have a checksum.
- The deflate codec can be used with a specific compression level. The compression level should be between 1 and 9.
- The “snappy” codec uses Google’s Snappy compression library. Each compressed block is followed by the 4-byte, big-endian CRC32 checksum of the uncompressed data in the block.
- The Snappy codec from Google provides modest compression ratios, but fast compression and decompression speeds. (In fact, it has the fastest decompression speeds, which makes it highly desirable for data sets that are likely to be queried often.)
- The Snappy codec is integrated into Hadoop Common, a set of common utilities that supports other Hadoop subprojects. You can use Snappy as an add-on for more recent versions of Hadoop that do not yet provide Snappy codec support.
- The “bzip2” codec uses the bzip2 compression library. From a usability standpoint, Bzip2 and Gzip are similar. Bzip2 generates a better compression ratio than Gzip, but it’s much slower. In fact, of all the available compression codecs in Hadoop, Bzip2 is by far the slowest.
- If you’re setting up an archive that you’ll rarely need to query and space is at a high premium, then Bzip2 may be worth considering.
- The “xz” codec uses xz, a lossless compression program and file format that incorporates the LZMA/LZMA2 compression algorithms.
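
To make the format details above more concrete, here is a minimal Python sketch that uses only the standard library. It does not write Avro files and is not how Diyotta or Avro implement these codecs; it only illustrates the underlying formats: a raw deflate stream (RFC 1951) with a chosen level, the 4-byte big-endian CRC32 that follows a snappy block, and a rough size comparison of deflate, bzip2, and xz. The helper names (raw_deflate, raw_inflate, crc32_trailer) and the sample data are made up for this example.

```python
import bz2
import lzma
import struct
import zlib


def raw_deflate(data: bytes, level: int = 6) -> bytes:
    """Compress to a raw deflate stream (RFC 1951): no zlib header, no checksum."""
    if not 1 <= level <= 9:
        raise ValueError("compression level should be between 1 and 9")
    # A negative wbits value tells zlib to emit raw deflate instead of
    # the zlib format (RFC 1950), which adds a header and an Adler-32 checksum.
    compressor = zlib.compressobj(level, zlib.DEFLATED, -15)
    return compressor.compress(data) + compressor.flush()


def raw_inflate(block: bytes) -> bytes:
    """Decompress a raw deflate stream produced by raw_deflate."""
    return zlib.decompress(block, wbits=-15)


def crc32_trailer(uncompressed: bytes) -> bytes:
    """Return the CRC32 of the uncompressed data as 4 big-endian bytes."""
    return struct.pack(">I", zlib.crc32(uncompressed) & 0xFFFFFFFF)


# Hypothetical, highly repetitive sample data (not real Avro block content).
sample = b"id,name,city\n" + b"42,example,Hyderabad\n" * 2000

sizes = {
    "uncompressed": len(sample),
    "deflate (level 1)": len(raw_deflate(sample, 1)),
    "deflate (level 9)": len(raw_deflate(sample, 9)),
    "bzip2": len(bz2.compress(sample)),
    "xz": len(lzma.compress(sample, format=lzma.FORMAT_XZ)),
}

for name, size in sizes.items():
    print(f"{name}: {size} bytes")

print("CRC32 trailer:", crc32_trailer(sample).hex())
```

The sizes printed depend entirely on the input, so treat them as a way to compare codecs on your own data rather than as a benchmark. Snappy itself is not in the Python standard library, so only its checksum step is shown here.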