Google developed the Snappy compression library in C++ (https://code.google.com/p/snappy/), and various third parties have ported it to Java. For more information, please see http://blog.cloudera.com/blog/2011/09/snappy-and-hadoop/.
Hadoop ships with Snappy support (https://code.google.com/p/hadoop-snappy/), which appears to be missing from your Hadoop client classpath. In my Apache Hadoop 2.0.5-alpha, the SnappyCodec class is in hadoop-common-2.0.5-alpha.jar:
[hyan@ronnie hunk-staging]$ jar -J-Xmx512m -tf /mnt/big/hyan/hadoop/hadoop-2.0.5-alpha/share/hadoop/common/hadoop-common-2.0.5-alpha.jar |grep Snappy
org/apache/hadoop/io/compress/snappy/SnappyDecompressor.class
org/apache/hadoop/io/compress/snappy/SnappyCompressor.class
org/apache/hadoop/io/compress/SnappyCodec.class
Here is my hadoop classpath:
[hyan@ronnie hunk-staging]$ /mnt/big/hyan/hadoop/hadoop-2.0.5-alpha/bin/hadoop classpath | sed 's/:/\n/g'
/mnt/big/hyan/hadoop/hadoop-2.0.5-alpha/etc/hadoop
/mnt/big/hyan/hadoop/hadoop-2.0.5-alpha/share/hadoop/common/lib/*
/mnt/big/hyan/hadoop/hadoop-2.0.5-alpha/share/hadoop/common/*
/contrib/capacity-scheduler/*.jar
/mnt/big/hyan/hadoop/hadoop-2.0.5-alpha/share/hadoop/hdfs
/mnt/big/hyan/hadoop/hadoop-2.0.5-alpha/share/hadoop/hdfs/lib/*
/mnt/big/hyan/hadoop/hadoop-2.0.5-alpha/share/hadoop/hdfs/*
/mnt/big/hyan/hadoop/hadoop-2.0.5-alpha/share/hadoop/yarn/lib/*
/mnt/big/hyan/hadoop/hadoop-2.0.5-alpha/share/hadoop/yarn/*
/mnt/big/hyan/hadoop/hadoop-2.0.5-alpha/share/hadoop/mapreduce/lib/*
/mnt/big/hyan/hadoop/hadoop-2.0.5-alpha/share/hadoop/mapreduce/*
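If you are not sure which jar on your own classpath provides the codec, the two checks above can be combined. This is only a rough sketch: the function name find_class_in_classpath is made up for illustration, and it assumes the unzip tool is installed (jar -tf would work as well).

```shell
#!/usr/bin/env bash
# Hypothetical helper (not part of Hadoop or Hunk): print every jar on a
# colon-separated classpath that contains the given class entry.
find_class_in_classpath() {
  local classpath="$1" entry="$2" candidate
  local IFS=':'
  # Unquoted on purpose: split on ':' and let the shell expand '*' wildcards.
  for candidate in $classpath; do
    case "$candidate" in
      *.jar) ;;            # only jar files can contain the class entry
      *) continue ;;
    esac
    [ -f "$candidate" ] || continue
    # Assumes unzip is available; list the jar and look for the entry.
    if unzip -l "$candidate" 2>/dev/null | grep -qF -- "$entry"; then
      printf '%s\n' "$candidate"
    fi
  done
}
```

For example: find_class_in_classpath "$($HADOOP_HOME/bin/hadoop classpath)" org/apache/hadoop/io/compress/SnappyCodec.class. If nothing is printed, the codec class is not on the client classpath.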
The Snappy Java code uses the native Snappy library, so the native libraries need to be on the native library path (set by the 'java.library.path' Java system property). This system property is set by one of the Hadoop config files before the bin/hadoop script is called, and the user can override it. The default location seems to differ between Hadoop versions and distributions. For Hadoop 1.0.3, as Ledion showed, it is under $HADOOP_HOME/lib/native/ /. For Hadoop 2.0.5, it is under $HADOOP_HOME/lib/native/. You can reset the path in Hunk by adding the following line to indexes.conf:
vix.env.HADOOP_CLIENT_OPTS = -Djava.library.path=/mnt/big/hyan/local/lib/
Note: if you installed your Hadoop client on your search head from a Hadoop tar file, the native Hadoop library ($HADOOP_HOME/lib/native/libhadoop.so) most likely does not work with Snappy, because Snappy is an optional component. You may need to rebuild the Hadoop native library with the Snappy component included.
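One rough way to guess whether a given libhadoop.so was built with Snappy is to look at its exported symbols. This is a sketch under an assumption I have not verified for every version: that a Snappy-enabled build exports JNI symbols whose names contain SnappyCompressor. The function name libhadoop_has_snappy is made up, and nm (from binutils) must be installed.

```shell
#!/usr/bin/env bash
# Hypothetical check: succeed only if the library exports symbols that
# mention SnappyCompressor. ASSUMPTION: Snappy-enabled builds of
# libhadoop.so export such JNI symbols; treat the result as a hint.
libhadoop_has_snappy() {
  local lib="$1"
  [ -f "$lib" ] || return 1
  # nm -D lists the dynamic symbol table of a shared object.
  nm -D "$lib" 2>/dev/null | grep -q 'SnappyCompressor'
}
```

For example: libhadoop_has_snappy "$HADOOP_HOME/lib/native/libhadoop.so" && echo "looks snappy-enabled". A failure here does not prove anything by itself, but it is a good hint that a rebuild is needed.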
Here is my native lib directory:
[hyan@ronnie hunk-staging]$ ls -l /mnt/big/hyan/local/lib/
total 66028
-rw-r--r-- 1 hyan games 124788 Oct 3 16:42 libcontainer.a
-rw-r--r-- 1 hyan games 753980 Oct 3 16:42 libhadoop.a
-rw-r--r-- 1 hyan games 1482396 Oct 3 16:42 libhadooppipes.a
-rwxr-xr-x 1 hyan games 412640 Oct 3 16:42 libhadoop.so
-rwxr-xr-x 1 hyan games 412640 Oct 3 16:42 libhadoop.so.1.0.0
-rw-r--r-- 1 hyan games 579600 Oct 3 16:42 libhadooputils.a
-rw-r--r-- 1 hyan games 268948 Oct 3 16:42 libhdfs.a
-rwxr-xr-x 1 hyan games 177913 Oct 3 16:42 libhdfs.so
-rwxr-xr-x 1 hyan games 177913 Oct 3 16:42 libhdfs.so.0.0.0
-rw-r--r-- 1 hyan games 44066 Oct 3 16:42 libnative_mini_dfs.a
-rw-r--r-- 1 hyan games 17230 Oct 3 16:42 libposix_util.a
-rw-r--r-- 1 hyan games 17648118 Oct 2 15:34 libprotobuf.a
-rwxr-xr-x 1 hyan games 1003 Oct 2 15:34 libprotobuf.la
-rw-r--r-- 1 hyan games 1947834 Oct 2 15:34 libprotobuf-lite.a
-rwxr-xr-x 1 hyan games 1038 Oct 2 15:34 libprotobuf-lite.la
lrwxrwxrwx 1 hyan games 25 Oct 2 15:34 libprotobuf-lite.so -> libprotobuf-lite.so.7.0.0
lrwxrwxrwx 1 hyan games 25 Oct 2 15:34 libprotobuf-lite.so.7 -> libprotobuf-lite.so.7.0.0
-rwxr-xr-x 1 hyan games 893019 Oct 2 15:34 libprotobuf-lite.so.7.0.0
lrwxrwxrwx 1 hyan games 20 Oct 2 15:34 libprotobuf.so -> libprotobuf.so.7.0.0
lrwxrwxrwx 1 hyan games 20 Oct 2 15:34 libprotobuf.so.7 -> libprotobuf.so.7.0.0
-rwxr-xr-x 1 hyan games 7324725 Oct 2 15:34 libprotobuf.so.7.0.0
-rw-r--r-- 1 hyan games 25882136 Oct 2 15:34 libprotoc.a
-rwxr-xr-x 1 hyan games 1028 Oct 2 15:34 libprotoc.la
lrwxrwxrwx 1 hyan games 18 Oct 2 15:34 libprotoc.so -> libprotoc.so.7.0.0
lrwxrwxrwx 1 hyan games 18 Oct 2 15:34 libprotoc.so.7 -> libprotoc.so.7.0.0
-rwxr-xr-x 1 hyan games 9071244 Oct 2 15:34 libprotoc.so.7.0.0
-rw-r--r-- 1 hyan games 207892 Oct 3 16:42 libsnappy.a
-rwxr-xr-x 1 hyan games 962 Oct 3 16:42 libsnappy.la
lrwxrwxrwx 1 hyan games 18 Oct 2 15:42 libsnappy.so -> libsnappy.so.1.1.4
lrwxrwxrwx 1 hyan games 18 Oct 2 15:42 libsnappy.so.1 -> libsnappy.so.1.1.4
-rwxr-xr-x 1 hyan games 128527 Oct 3 16:42 libsnappy.so.1.1.4
drwxr-xr-x 2 hyan games 4096 Oct 2 15:34 pkgconfig
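The two files that matter in that listing are libhadoop.so and libsnappy.so. A small sketch to check a directory for both before pointing java.library.path at it (check_native_dir is a made-up name, not a Hadoop or Hunk command):

```shell
#!/usr/bin/env bash
# Hypothetical helper: report whether a directory holds both native
# libraries; returns non-zero if either one is absent.
check_native_dir() {
  local dir="$1" lib missing=0
  for lib in libhadoop.so libsnappy.so; do
    # -e follows symlinks, so a dangling libsnappy.so link counts as missing.
    if [ -e "$dir/$lib" ]; then
      echo "found:   $dir/$lib"
    else
      echo "missing: $dir/$lib"
      missing=1
    fi
  done
  return "$missing"
}
```

For example: check_native_dir /mnt/big/hyan/local/lib/ (use whatever directory you set in vix.env.HADOOP_CLIENT_OPTS).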
You can verify whether Snappy already works with your Hadoop client by running:
$HADOOP_HOME/bin/hadoop fs -text s3n://AKIAI5CIAYX6LNOC6XMQ:4PqCDiaAdPoOR2puiq9cjHv7PtxoMF3cIW0GRsqO@wp-dw-source/omni/site=washpostcom/dt=20090616/a088a0a4-3a18-4dae-a551-ecf25a855367_000389.snappy