I'm working on pushing out Hadoop data roll for archived data to our index cluster. The buckets are rolling as expected and I have buckets in hadoop but I'm not able to search the archived indexes in Hadoop from Splunk. I'm not seeing anything in the Splunk logs, it just returns no events found. Am I missing something on my SH cluster? I've done this in the past on my test box but I can't for the life of me remember what I changed outside of the indexes.conf on the indexer. running a simple index=proxy_archive for all time returns nothing.
Same Indexes.conf on the indexers:
[proxy]
homePath = $SPLUNK_DB/$_index_name/db
coldPath = $SPLUNK_DB/$_index_name/colddb
thawedPath = $SPLUNK_DB/$_index_name/thaweddb
maxDataSize = auto_high_volume
maxTotalDataSizeMB = 3250000
repFactor = auto
disabled = false
[proxy_archive]
vix.output.buckets.from.indexes = proxy
vix.output.buckets.older.than = 7776000
vix.output.buckets.path = hdfs://ns/projects/csdc_splunk/cold/proxy_archive
vix.smart.search.cutoff_sec = 7775000
vix.provider = hadoop-lake2
[provider-family:hadoop]
vix.mode = report
vix.command = $SPLUNK_HOME/bin/jars/sudobash
vix.command.arg.1 = $HADOOP_HOME/bin/hadoop
vix.command.arg.2 = jar
vix.command.arg.3 = $SPLUNK_HOME/bin/jars/SplunkMR-h1.jar
vix.command.arg.4 = com.splunk.mr.SplunkMR
vix.env.MAPREDUCE_USER =
vix.env.HADOOP_HEAPSIZE = 512
vix.env.HADOOP_CLIENT_OPTS = -Xmx4096m -XX:ParallelGCThreads=4 -XX:+UseParallelGC -XX:+DisplayVMOutputToStderr
vix.env.HUNK_THIRDPARTY_JARS = $SPLUNK_HOME/bin/jars/thirdparty/common/avro-1.7.7.jar,$SPLUNK_HOME/bin/jars/thirdparty/common/avro-mapred-1.7.7.jar,$SPLUNK_HOME/bin/jars/thirdparty/common/commons-compress-1.10.jar,$SPLUNK_HOME/bin/jars/thirdparty/common/commons-io-2.4.jar,$SPLUNK_HOME/bin/jars/thirdparty/common/libfb303-0.9.2.jar,$SPLUNK_HOME/bin/jars/thirdparty/common/parquet-hive-bundle-1.6.0.jar,$SPLUNK_HOME/bin/jars/thirdparty/common/snappy-java-1.1.1.7.jar,$SPLUNK_HOME/bin/jars/thirdparty/hive/hive-exec-0.12.0.jar,$SPLUNK_HOME/bin/jars/thirdparty/hive/hive-metastore-0.12.0.jar,$SPLUNK_HOME/bin/jars/thirdparty/hive/hive-serde-0.12.0.jar
vix.mapred.job.reuse.jvm.num.tasks = 100
vix.mapred.child.java.opts = -server -Xmx512m -XX:ParallelGCThreads=4 -XX:+UseParallelGC -XX:+DisplayVMOutputToStderr
vix.mapred.reduce.tasks = 0
vix.mapred.job.map.memory.mb = 2048
vix.mapred.job.reduce.memory.mb = 512
vix.mapred.job.queue.name = default
vix.mapreduce.job.jvm.numtasks = 100
vix.mapreduce.map.java.opts = -server -Xmx512m -XX:ParallelGCThreads=4 -XX:+UseParallelGC -XX:+DisplayVMOutputToStderr
vix.mapreduce.reduce.java.opts = -server -Xmx512m -XX:ParallelGCThreads=4 -XX:+UseParallelGC -XX:+DisplayVMOutputToStderr
vix.mapreduce.job.reduces = 0
vix.mapreduce.map.memory.mb = 2048
vix.mapreduce.reduce.memory.mb = 512
vix.mapreduce.job.queuename = default
vix.splunk.search.column.filter = 1
vix.splunk.search.mixedmode = 1
vix.splunk.search.debug = 0
vix.splunk.search.mr.maxsplits = 10000
vix.splunk.search.mr.minsplits = 100
vix.splunk.search.mr.splits.multiplier = 10
vix.splunk.search.mr.poll = 2000
vix.splunk.search.recordreader = SplunkJournalRecordReader,ValueAvroRecordReader,SimpleCSVRecordReader,SequenceFileRecordReader
vix.splunk.search.recordreader.avro.regex = \.avro$
vix.splunk.search.recordreader.csv.regex = \.([tc]sv)(?:\.(?:gz|bz2|snappy))?$
vix.splunk.search.recordreader.sequence.regex = \.seq$
vix.splunk.home.datanode = /tmp/splunk/$SPLUNK_SERVER_NAME/
vix.splunk.heartbeat = 1
vix.splunk.heartbeat.threshold = 60
vix.splunk.heartbeat.interval = 1000
vix.splunk.setup.onsearch = 1
vix.splunk.setup.package = current
[provider:hadoop-lake2]
vix.command.arg.3 = $SPLUNK_HOME/bin/jars/SplunkMR-hy2.jar
vix.env.HADOOP_HOME = /opt/hadoop/current
vix.env.HUNK_THIRDPARTY_JARS = $SPLUNK_HOME/bin/jars/thirdparty/common/avro-1.7.7.jar,$SPLUNK_HOME/bin/jars/thirdparty/common/avro-mapred-1.7.7.jar,$SPLUNK_HOME/bin/jars/thirdparty/common/commons-compress-1.10.jar,$SPLUNK_HOME/bin/jars/thirdparty/common/commons-io-2.4.jar,$SPLUNK_HOME/bin/jars/thirdparty/common/libfb303-0.9.2.jar,$SPLUNK_HOME/bin/jars/thirdparty/common/parquet-hive-bundle-1.6.0.jar,$SPLUNK_HOME/bin/jars/thirdparty/common/snappy-java-1.1.1.7.jar,$SPLUNK_HOME/bin/jars/thirdparty/hive_1_2/hive-exec-1.2.1.jar,$SPLUNK_HOME/bin/jars/thirdparty/hive_1_2/hive-metastore-1.2.1.jar,$SPLUNK_HOME/bin/jars/thirdparty/hive_1_2/hive-serde-1.2.1.jar
vix.env.JAVA_HOME = /usr/openv/java/jre
vix.family = hadoop
vix.fs.default.name = hdfs://ns/projects/csdc_splunk
vix.mapreduce.framework.name = yarn
vix.output.buckets.max.network.bandwidth = 0
vix.splunk.home.hdfs = /projects/csdc_splunk/workdir
vix.splunk.impersonation = 0
vix.yarn.resourcemanager.address = lake2-rm.hdp.deere.com:8050
vix.yarn.resourcemanager.scheduler.address = lake2-rm.hdp.deere.com:8030
vix.splunk.home.datanode = /tmp/splunk/$SPLUNK_SERVER_NAME/
vix.splunk.setup.package.setup.timelimit = 10000
Hadoop Client 2.6.0. We were using 2.7 but ran into issues with Splunk sending 2+GB packet sizes and were advised by Splunk to downgrade to 2.6 or lower until HDP corrected the issue.
... View more