Why is host_regex not setting the correct hostname?

vzzbrs
Explorer

I'm trying to set hostnames by extracting them from filenames.
I'm using host_regex with this regex:

host_regex = (myserver[1-2].mydomain.com)\W+\w+\.s$

The paths are in this form:

/path/to/files/mail.text.myserver1.mydomain.com.@20141009T084808.s
/path/to/files/mail.text.myserver2.mydomain.com.@20141009T104107.s

I have tried several regexes, all of which correctly match "myserver[1-2].mydomain.com", but in Splunk the host is always set to the default value (the Splunk server's hostname).
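A quick way to confirm that the regex itself is fine is to run it against the sample paths outside Splunk. The sketch below uses Python's re module as a stand-in for Splunk's PCRE engine (an assumption, though the two behave the same for this simple pattern). It also tries a stricter variant with escaped dots, since an unescaped "." matches any character. The names extract_host and STRICT_HOST_REGEX are illustrative only, not Splunk settings.

```python
import re

# The host_regex from the question, copied verbatim.
HOST_REGEX = r"(myserver[1-2].mydomain.com)\W+\w+\.s$"

# Hypothetical stricter variant: literal dots instead of "any character".
STRICT_HOST_REGEX = r"(myserver[12]\.mydomain\.com)\W+\w+\.s$"

# Sample paths from the question.
PATHS = [
    "/path/to/files/mail.text.myserver1.mydomain.com.@20141009T084808.s",
    "/path/to/files/mail.text.myserver2.mydomain.com.@20141009T104107.s",
]


def extract_host(path, pattern=HOST_REGEX):
    """Return the first capture group of pattern found in path, or None."""
    match = re.search(pattern, path)
    return match.group(1) if match else None


for p in PATHS:
    # Both patterns should yield the same hostname for these paths.
    print(extract_host(p), extract_host(p, STRICT_HOST_REGEX))
```

If both columns print myserver1.mydomain.com and myserver2.mydomain.com, the pattern is not the problem, and the cause has to be elsewhere in the input configuration.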

Does anyone have any ideas on how to get this working?

Thanks

Accepted solution:

jrodman
Splunk Employee

Okay, the problem is now clear with your provided input stanza.

From the inputs.conf.spec file:

host_regex = <regular expression>
* If specified, <regular expression> extracts host from the path to the file for each input file.
    * Detail: This feature examines the source key, so if source is set
      explicitly in the stanza, that string will be matched, not the original filename.
* Specifically, the first group of the regex is used as the host.
* If the regex fails to match, the default "host =" attribute is used.
* If host_regex and host_segment are both set, host_regex will be ignored.
* Defaults to unset.
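The documented behavior above can be modeled in a few lines. This is a simplified sketch, not Splunk's implementation: resolve_host and the "splunk-server" default are hypothetical names, the regex is searched (not anchored) against whatever the source key holds, and the host_segment precedence rule is deliberately left out.

```python
import re
from typing import Optional


def resolve_host(source: str, host_regex: Optional[str], default_host: str) -> str:
    """Simplified model of the host_regex rules quoted from inputs.conf.spec.

    * The regex is applied to the source value, whatever it has been set to.
    * The first capture group becomes the host.
    * If the regex is unset or fails to match, the default host is used.
    host_segment (which would take precedence) is not modeled here.
    """
    if host_regex:
        match = re.search(host_regex, source)
        # The spec uses "the first group", so require the pattern to have one.
        if match and match.re.groups >= 1:
            return match.group(1)
    return default_host


regex = r"(myserver[1-2].mydomain.com)\W+\w+\.s$"
default = "splunk-server"  # hypothetical placeholder for the configured default host

# A source that matches: the first group becomes the host.
print(resolve_host("/path/to/files/mail.text.myserver2.mydomain.com.@20141009T104107.s", regex, default))
# A source that does not match: falls back to the default host.
print(resolve_host("/path/to/files/other.log", regex, default))
```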

host_regex extracts the host from the source value. However, by setting source=cisco:esa you're overriding the value the file-input code would otherwise provide (the file path), which removes the information host_regex needs. You should probably just remove the source=cisco:esa line.
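To make the effect of the override concrete, here is a rough Python sketch that treats the input stanza as a dict. It assumes the monitor would otherwise record the full file path as source, and it uses Python's re module in place of Splunk's regex engine. The names effective_source and host_for, and the DEFAULT_HOST value, are hypothetical and only for illustration.

```python
import re

HOST_REGEX = r"(myserver[1-2].mydomain.com)\W+\w+\.s$"
# Hypothetical stand-in for the Splunk server's own hostname (the default host).
DEFAULT_HOST = "splunk-indexer"


def effective_source(stanza: dict, monitored_file: str) -> str:
    # An explicit "source" in the stanza replaces the file path the monitor would record.
    return stanza.get("source", monitored_file)


def host_for(stanza: dict, monitored_file: str) -> str:
    # host_regex only ever sees the effective source, never the raw path directly.
    source = effective_source(stanza, monitored_file)
    match = re.search(stanza["host_regex"], source)
    return match.group(1) if match else DEFAULT_HOST


monitored_file = "/path/to/files/mail.text.myserver1.mydomain.com.@20141009T084808.s"

with_override = {"source": "cisco:esa", "sourcetype": "cisco:esa", "host_regex": HOST_REGEX}
without_override = {"sourcetype": "cisco:esa", "host_regex": HOST_REGEX}

print(host_for(with_override, monitored_file))     # splunk-indexer
print(host_for(without_override, monitored_file))  # myserver1.mydomain.com
```

With the override, the regex is matched against the literal string "cisco:esa", finds nothing, and the default host wins; without it, the file path is matched and the server name is extracted.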

The spec also recommends against overriding source:

source = <string>
* Sets the source key/field for events from this input.
* NOTE: Overriding the source key is generally not recommended.  Typically, the
  input layer will provide a more accurate string to aid in problem
  analysis and investigation, accurately recording the file from which the data
  was retrieved.  Please consider use of source types, tagging, and search
  wildcards before overriding this value.

vzzbrs
Explorer

Thanks jrodman,
removing source=cisco:esa solved the issue.
It was there because I took the starting inputs.conf from the app's README file.

jrodman
Splunk Employee

Thanks, I'll try to contact the app author.

jrodman
Splunk Employee

I'm a bit confused by \W+\w+\.s$. I guess \W matches the .@ and \w matches the timestamp?

I suppose a regex tester shows a successful match anyway, so this is probably not a regex problem.
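One way to check what each piece of the tail consumes is to wrap it in named groups and inspect the match. This sketch uses Python's re module (assumed to behave like PCRE for this pattern); the group names sep and stamp are purely illustrative and do not change what the pattern matches.

```python
import re

filename = "mail.text.myserver1.mydomain.com.@20141009T084808.s"

# Same pattern as in the question, with the two tail pieces wrapped in
# named groups only so we can see what each one consumes.
annotated = r"(myserver[1-2].mydomain.com)(?P<sep>\W+)(?P<stamp>\w+)\.s$"

m = re.search(annotated, filename)
print(m.group(1))        # myserver1.mydomain.com
print(m.group("sep"))    # .@
print(m.group("stamp"))  # 20141009T084808
```

\W+ stops at the first digit because digits are word characters, and \w+ stops at the final "." because it is not, which leaves \.s$ to match the extension.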

Could you show the whole input stanza and tell us which sourcetype is being applied to the data?

vzzbrs
Explorer

Yes, you're correct: \W matches the .@, \w matches the timestamp, and a regex tester shows a successful match.
The sourcetype is cisco:esa because I'm using the "Add-on for Cisco ESA" app.
This is the input stanza I'm using:

[monitor:///path/to/file]
source = cisco:esa
sourcetype = cisco:esa
host_regex = (myserver[1-2].mydomain.com)\W+\w+\.s$
disabled = false
host =

I'm not sure about that "host =" line, but it was added by the web GUI. I already tried removing it from the inputs.conf file, but nothing changed.
