
hunk search-time field extraction not working

Builder

I'm running Hunk v6.2.2 connected to Hortonworks Hadoop v2.2.4.2. My search-time field extraction for client_host is inconsistent: it returns too few results or none at all. For example, if I search "index=hadoop client_host=10.0.0.10" over the last 4 hours (at 4pm Eastern time), I get no results. Can someone help troubleshoot?

Raw logs in /myprovider/syslogs/2015/2015-06-10_datacollector2.txt contain:

2015-06-10T20:13:33Z syslog.tcp {"message":"<14>Jun 10 16:07:03 WIN-VQCJADNQOGL MSWinEventLog\t1\tMicrosoft-Windows-LanguagePackSetup/Operational\t71\tWed Jun 10 16:07:03 2015\t4001\tMicrosoft-Windows-LanguagePackSetup\tSYSTEM\tUser\tInformation\tWIN-VQCJADNQOGL\tLanguage Pack cleanup functionality\t\tLPRemove terminating.\t16\r","client_host":"10.0.0.10"}
2015-06-10T20:13:33Z syslog.tcp {"message":"<14>Jun 10 16:07:03 WIN-VQCJADNQOGL MSWinEventLog\t1\tMicrosoft-Windows-MUI/Operational\t72\tWed Jun 10 16:07:03 2015\t3003\tMicrosoft-Windows-MUI\tSYSTEM\tUser\tInformation\tWIN-VQCJADNQOGL\tMUI resource cache builder\t\tMUI resource cache builder has been called with the following parameters: (null).\t29\r","client_host":"10.0.0.10"}
2015-06-10T20:13:45Z syslog.tcp {"message":"<14>Jun 10 16:07:13 WIN-VQCJADNQOGL MSWinEventLog\t1\tMicrosoft-Windows-MUI/Operational\t73\tWed Jun 10 16:07:13 2015\t3007\tMicrosoft-Windows-MUI\tSYSTEM\tUser\tInformation\tWIN-VQCJADNQOGL\tMUI resource cache builder\t\tNew resource cache built and installed on system. New cache index is 5, live cache index is 5 and config is set to 3.\t30\r","client_host":"10.0.0.10"}

My Hunk config:

indexes.conf

[provider:myprovider]
vix.command.arg.3 = $SPLUNK_HOME/bin/jars/SplunkMR-s6.0-hy2.0.jar
vix.env.HADOOP_HOME = /usr/hdp/2.2.4.2-2/hadoop
vix.env.JAVA_HOME = /usr/lib/jvm/java-7-openjdk-amd64
vix.family = hadoop
vix.fs.default.name = hdfs://hadoop-namenode1.internal:8020
vix.mapreduce.framework.name = yarn
vix.mapred.child.java.opts = -server -Xmx1024m
vix.output.buckets.max.network.bandwidth = 0
vix.splunk.home.hdfs = /user/splunk/myprovider
vix.yarn.resourcemanager.address = hadoop-namenode2.internal:8050
vix.yarn.resourcemanager.scheduler.address = hadoop-namenode2.internal:8030
vix.yarn.application.classpath = /etc/hadoop/conf,/usr/hdp/2.2.4.2-2/hadoop/client/*,/usr/hdp/2.2.4.2-2/hadoop/lib/*,/usr/hdp/2.2.4.2-2/hadoop-hdfs/*,/usr/hdp/2.2.4.2-2/hadoop-hdfs/lib/*,/usr/hdp/2.2.4.2-2/hadoop-yarn/*,/usr/hdp/2.2.4.2-2/hadoop-yarn/lib/*
vix.splunk.home.datanode = /user/splunk/splunk-search1/
vix.splunk.setup.package = /opt/hunk/hunk-6.2.2-257696-linux-2.6-x86_64.tgz

[hadoop]
vix.input.1.path = /myprovider/syslogs/...
vix.provider = myprovider
vix.input.1.accept = \.txt$
vix.input.1.et.format = yyyyMMdd
vix.input.1.et.offset = 3600
vix.input.1.et.regex = /myprovider/syslogs/(\d+)/\d+-(\d+)-(\d+)_\w+\.txt
vix.input.1.lt.format = yyyyMMdd
vix.input.1.lt.offset = 86400
vix.input.1.lt.regex = /myprovider/syslogs/(\d+)/\d+-(\d+)-(\d+)_\w+\.txt
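
For readers unfamiliar with the et/lt settings above: as far as I understand the Hunk documentation (this is an assumption, not something confirmed in this thread), the capture groups of the regex are concatenated, parsed with the format string, and then shifted by the offset in seconds, giving the time range a file is assumed to cover. The Python sketch below mimics that for the sample file path. The function name path_time_range and the choice of UTC are hypothetical and only for illustration; Hunk itself uses Java date formats, and `%Y%m%d` is the Python equivalent of `yyyyMMdd`.

```python
import re
from datetime import datetime, timedelta, timezone

# Same pattern as vix.input.1.et.regex / vix.input.1.lt.regex in the config above.
PATH_REGEX = r"/myprovider/syslogs/(\d+)/\d+-(\d+)-(\d+)_\w+\.txt"


def path_time_range(path, et_offset=3600, lt_offset=86400):
    """Return an (earliest, latest) pair derived from a file path, or None.

    Assumption: capture groups are concatenated and parsed as yyyyMMdd,
    then the offsets (in seconds) are added. UTC is assumed here.
    """
    match = re.search(PATH_REGEX, path)
    if match is None:
        return None
    stamp = "".join(match.groups())  # e.g. "2015" + "06" + "10"
    base = datetime.strptime(stamp, "%Y%m%d").replace(tzinfo=timezone.utc)
    earliest = base + timedelta(seconds=et_offset)
    latest = base + timedelta(seconds=lt_offset)
    return earliest, latest


if __name__ == "__main__":
    sample = "/myprovider/syslogs/2015/2015-06-10_datacollector2.txt"
    print(path_time_range(sample))
```

If this reading is right, a file is only considered for a search whose time window overlaps the computed range, so it is worth checking that the offsets cover the events actually stored in each file.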

props.conf

[source::/myprovider/syslogs/*/*]
EXTRACT-clienthost = client_host\"\:\"(?<client_host>\d{1,3}\.\d{1,3}\.\d{1,3}\.\d{1,3})\"

sourcetype = hadoop
priority = 100
ANNOTATE_PUNCT = false
SHOULD_LINEMERGE = false
MAX_TIMESTAMP_LOOKAHEAD = 30
TIME_PREFIX = ^
TIME_FORMAT = %Y-%m-%dT%H:%M:%SZ
TZ=UTC

Re: hunk search-time field extraction not working

Builder

Looks like this page formatted the escape characters. Here's my original question: http://pastebin.ca/3023980


Re: hunk search-time field extraction not working

Splunk Employee

Is your data being sourcetyped correctly? That is, does the sourcetype field return a value of hadoop for these events? If so, I would add a field extraction to the hadoop sourcetype stanza in props.conf on your search head:

$SPLUNK_HOME/etc/system/local/props.conf
(or props.conf in the app of your choice: $SPLUNK_HOME/etc/apps/appofyourchoice/local/props.conf)

[hadoop]

EXTRACT-client_host = (?m)client_host":"(?<client_host>\d{1,3}\.\d{1,3}\.\d{1,3}\.\d{1,3})"}

Then restart Splunk on your search head:
cd $SPLUNK_HOME/bin
./splunk restart

To validate, you can run a search like this:
index=* sourcetype=hadoop | stats count by client_host

If not, and you want to key the extraction on the source instead, this should work:

On your search head:
$SPLUNK_HOME/etc/system/local/props.conf
(or props.conf in the app of your choice: $SPLUNK_HOME/etc/apps/appofyourchoice/local/props.conf)

[source::/myprovider/syslogs/...]

EXTRACT-client_host = (?m)client_host":"(?<client_host>\d{1,3}\.\d{1,3}\.\d{1,3}\.\d{1,3})"}

Re: hunk search-time field extraction not working

Splunk Employee

Can you try (a) replacing the stanza name and, more importantly, (b) removing the unnecessary backslashes before the " characters in the extraction regex? If that works, then since the data looks partially like JSON, I'd also recommend allowing optional whitespace in the regex between the : and the " characters.

[source::/myprovider/syslogs/...]
EXTRACT-clienthost = client_host":"(?<client_host>\d{1,3}\.\d{1,3}\.\d{1,3}\.\d{1,3})"
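
To illustrate the optional-whitespace suggestion, here is a minimal Python sketch that checks such a pattern against abbreviated sample events. It is only an approximation of what Splunk does: Python spells named groups `(?P<name>...)`, whereas the props.conf regex uses `(?<name>...)`, so the equivalent Splunk pattern would be `client_host"\s*:\s*"(?<client_host>\d{1,3}\.\d{1,3}\.\d{1,3}\.\d{1,3})"`. The helper name extract_client_host and the shortened sample strings are hypothetical.

```python
import re

# Tolerates optional whitespace around the colon, as JSON allows.
CLIENT_HOST = re.compile(
    r'client_host"\s*:\s*"(?P<client_host>\d{1,3}\.\d{1,3}\.\d{1,3}\.\d{1,3})"'
)


def extract_client_host(event):
    """Return the client_host value from a raw event, or None if absent."""
    match = CLIENT_HOST.search(event)
    return match.group("client_host") if match else None


# Abbreviated stand-ins for the raw events in the question.
samples = [
    '{"message":"...","client_host":"10.0.0.10"}',      # compact JSON
    '{"message":"...", "client_host" : "10.0.0.10"}',   # spaced JSON
    '{"message":"..."}',                                # field missing
]

if __name__ == "__main__":
    for sample in samples:
        print(extract_client_host(sample))
```

The first two samples should yield the address and the third should yield nothing, which is the behaviour you would want from the search-time extraction as well.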


Re: hunk search-time field extraction not working

Splunk Employee

The regex and stanza in my post are displayed correctly (i.e., the formatting was not mangled).


Re: hunk search-time field extraction not working

Builder

Thanks! Looks like my regex was off.
