Indexing hostname by segment issue

Contributor

Splunk v4.3.1 on SLES Linux.
I have a source that is a file in a dynamic path, and the source is configured to use segment #4 of the path to assign the hostname to the indexed event.

The configured path is /logs/syslog/linux/.../log
The real path is /logs/syslog/linux/$HOSTNAME/$YEAR/$MONTH/log
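
To make "segment #4" concrete, the counting can be sketched in Python. This only illustrates how the segments of the path above are numbered; it is not Splunk's implementation. The function name host_from_segment and the sample values in the path (host01x03, 2012, 04) are assumptions made for the example.

```python
def host_from_segment(path: str, segment: int) -> str:
    """Return the 1-based segment of a slash-separated path."""
    # Drop empty pieces so the leading "/" does not count as a segment.
    parts = [p for p in path.split("/") if p]
    return parts[segment - 1]


# Hypothetical expansion of /logs/syslog/linux/$HOSTNAME/$YEAR/$MONTH/log
sample_path = "/logs/syslog/linux/host01x03/2012/04/log"
print(host_from_segment(sample_path, 4))
```

With this numbering, segment 1 is "logs", segment 3 is "linux", and segment 4 is the directory that holds the hostname.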

I went to the Search app, Dashboards & Views, Summary, and looked at the Hosts list. Weirdly, the list contains hosts named after the abbreviated weekdays: "Mon", "Tue", "Wed", "Thu", "Fri", "Sat", "Sun". I don't have any hosts or paths with these names. It's all the same real host in this case, one of my Linux boxes. The dates of the events match the weekday: for example, host=Tue has events dated 4/3/2012 and 3/27/2012, host=Mon has 4/2 and 3/26, etc.

So where and why is Splunk indexing events as host=Tue or host=Mon?

Re: Indexing hostname by segment issue

Builder

Can you please tell us the sourcetype and paste a couple of lines of the log?

I suspect it is similar enough to syslog that Splunk is trying to extract the host field from the data, but where syslog would normally have a host, your data has the weekday. If so, setting the sourcetype to something else should fix it.

Bob

Re: Indexing hostname by segment issue

Contributor

OK, the data is syslog data and it does have the abbreviated weekday as a field. However, that is not how the data source is defined: the hostname is supposed to be assigned by path segment. So perhaps this is a bug in Splunk?

Specify which segment of the source path to set as the Host field.
For example: 3 (sets to 'hostname' for the path /var/log/hostname/)

My syslog-ng data gets written with template("$DATE $TZ $WEEKDAY $ISODATE $HOST $FACILITY [$LEVEL] $MSG\n")

Example raw syslog entry:
Apr 1 00:10:01 -04:00 Sun 2012-04-01T00:10:01-04:00 host01x03 cron [info] crond[21399]: (root) CMD (/usr/lib/sa/sa1 1 1)

Re: Indexing hostname by segment issue

Contributor

Maybe the issue is my defined path of /logs/syslog/linux/.../log
Perhaps Splunk is not expanding this before assigning the hostname from the path, so it tries to extract it from the raw data? If so, this is not documented. I would expect it to expand the path before extracting the hostname from the path of the log file it reads. I rely on my syslog-ng config to store host data in the correct location regardless of how the raw data is formatted: the raw data may carry the wrong hostname, but syslog-ng still puts it in the correctly defined path. And why just this one Linux host and not all my data?

Re: Indexing hostname by segment issue

Builder

This is the default action for data with the syslog sourcetype: it overwrites the host with whatever it finds after the timestamp. You can get around this either by changing the sourcetype to something else or by turning off the default processing for syslog data.
To stop the default processing, add the following two lines to a local/props.conf file.

[syslog]

TRANSFORMS
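
The host override described above can be reproduced outside Splunk with a short Python sketch. The regular expression is an assumption: it approximates the REGEX of the default syslog-host transform as commonly shipped in Splunk's default transforms.conf, so compare it with the file on your own install before relying on it. The name extract_host is hypothetical; the event is the raw entry quoted earlier in this thread.

```python
import re

# Assumption: approximation of the default [syslog-host] REGEX in
# Splunk's transforms.conf. Verify against your own installation.
SYSLOG_HOST_RE = re.compile(
    r":\d\d\s+(?:\d+\s+|(?:user|daemon|local.?)\.\w+\s+)*\[?(\w[\w\.\-]{2,})\]?\s"
)


def extract_host(raw: str):
    """Return the first token the pattern treats as a host, or None."""
    match = SYSLOG_HOST_RE.search(raw)
    return match.group(1) if match else None


event = (
    "Apr 1 00:10:01 -04:00 Sun 2012-04-01T00:10:01-04:00 host01x03 "
    "cron [info] crond[21399]: (root) CMD (/usr/lib/sa/sa1 1 1)"
)
print(extract_host(event))  # prints: Sun
```

The pattern looks for ":NN" followed by whitespace and then a word. After ":01 " comes "-04:00", which does not start with a word character, so the first place it succeeds is the ":00 " at the end of the timezone offset, and the next word is the weekday.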

Re: Indexing hostname by segment issue

Contributor

Perhaps that explains what I am seeing, but it is completely confusing: the source page gives the option to define the hostname by path segment, and offers no explanation or option to override the default "syslog" behavior you describe. This falls into my book of kludginess.

Re: Indexing hostname by segment issue

Contributor

I added those two lines to /opt/splunk/etc/system/local/props.conf and bounced the service. It is still logging host=Tue.

Re: Indexing hostname by segment issue

Builder

Oops. That last line should have been:

TRANSFORMS =

Re: Indexing hostname by segment issue

Contributor

I changed my syslog-ng template to ("$WEEKDAY $DATE $TZ $HOST $FACILITY [$LEVEL] $MSG\n")
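
As a rough check of why the reordered template helps, the sketch below renders both templates from the same field values and applies a host pattern to each line. Assumptions: the regular expression approximates Splunk's default syslog-host transform and should be checked against your own transforms.conf; Python's string.Template stands in for syslog-ng macro expansion only because both use $NAME placeholders; the field values come from the raw entry quoted earlier in the thread.

```python
import re
from string import Template

# Assumption: approximation of Splunk's default [syslog-host] REGEX.
SYSLOG_HOST_RE = re.compile(
    r":\d\d\s+(?:\d+\s+|(?:user|daemon|local.?)\.\w+\s+)*\[?(\w[\w\.\-]{2,})\]?\s"
)

# Field values taken from the example raw entry in this thread.
fields = {
    "DATE": "Apr 1 00:10:01",
    "TZ": "-04:00",
    "WEEKDAY": "Sun",
    "ISODATE": "2012-04-01T00:10:01-04:00",
    "HOST": "host01x03",
    "FACILITY": "cron",
    "LEVEL": "info",
    "MSG": "crond[21399]: (root) CMD (/usr/lib/sa/sa1 1 1)",
}

old_template = Template("$DATE $TZ $WEEKDAY $ISODATE $HOST $FACILITY [$LEVEL] $MSG")
new_template = Template("$WEEKDAY $DATE $TZ $HOST $FACILITY [$LEVEL] $MSG")


def extract_host(raw: str):
    """Return the first token the pattern treats as a host, or None."""
    match = SYSLOG_HOST_RE.search(raw)
    return match.group(1) if match else None


old_line = old_template.substitute(fields)
new_line = new_template.substitute(fields)
print("old:", extract_host(old_line))
print("new:", extract_host(new_line))
```

With the weekday moved in front of the date, the first word after the timezone offset is the real hostname, so under these assumptions the pattern picks up host01x03 instead of Sun.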
