Splunk Search

Why is host_regex not setting the correct hostname?

vzzbrs
Explorer

I'm trying to set hostnames by extracting them from filenames.
I'm using host_regex with this regex:

host_regex = (myserver[1-2].mydomain.com)\W+\w+\.s$

The paths are in this form:

/path/to/files/mail.text.myserver1.mydomain.com.@20141009T084808.s
/path/to/files/mail.text.myserver2.mydomain.com.@20141009T104107.s

I have tried several different regexes, all of which correctly match "myserver[1-2].mydomain.com", but in Splunk the hostname is always set to the default value (the Splunk server hostname).

Does anyone have ideas on how to get this working?

Thanks

1 Solution

jrodman
Splunk Employee

Okay, the problem is now clear with your provided input stanza.

From the inputs.conf.spec file:

host_regex = <regular expression>
* If specified, <regular expression> extracts host from the path to the file for each input file.
    * Detail: This feature examines the source key, so if source is set
      explicitly in the stanza, that string will be matched, not the original filename.
* Specifically, the first group of the regex is used as the host.
* If the regex fails to match, the default "host =" attribute is used.
* If host_regex and host_segment are both set, host_regex will be ignored.
* Defaults to unset.

host_regex extracts the host from the source value. However, you're overriding the value that the file-input code would provide with source=cisco:esa, which removes the file path that host_regex needs to match against. You should probably just remove the source=cisco:esa line.

The spec also recommends against overriding source:

source = <string>
* Sets the source key/field for events from this input.
* NOTE: Overriding the source key is generally not recommended.  Typically, the
  input layer will provide a more accurate string to aid in problem
  analysis and investigation, accurately recording the file from which the data
  was retrieved.  Please consider use of source types, tagging, and search
  wildcards before overriding this value.
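
As a rough sketch of that fix, here is the stanza posted in this thread with the source line removed, so that host_regex is matched against the monitored file path. The empty "host =" line is also left out here, which the poster reported made no difference. The escaped-dot variant in the comment is an optional tightening and an assumption, not something tested in this thread; in the original regex each unescaped . matches any character.

```ini
[monitor:///path/to/file]
# No explicit "source =" here: the file input then sets source to the
# file path, which is the string host_regex is matched against.
sourcetype = cisco:esa
host_regex = (myserver[1-2].mydomain.com)\W+\w+\.s$
# Optional stricter form (assumption, untested here), with literal dots:
# host_regex = (myserver[1-2]\.mydomain\.com)\W+\w+\.s$
disabled = false
```

With a path such as /path/to/files/mail.text.myserver1.mydomain.com.@20141009T084808.s, the first capture group should then yield myserver1.mydomain.com as the host.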

vzzbrs
Explorer

Thanks jrodman,
removing source=cisco:esa solved the issue.
It was there because I took the starting inputs.conf from the app's README file.


jrodman
Splunk Employee

Thanks, I'll try to contact the app author.


jrodman
Splunk Employee

I'm kind of confused about \W+\w+\.s$. I guess \W will match the .@ and \w will match the timestamp?

I suppose a regex tester shows a successful match anyway, so this is probably not a regex problem.

Please show the whole input stanza, and let us know which sourcetype is being applied to the data.


vzzbrs
Explorer

Yes, you're correct: \W is for matching .@ and \w for the timestamp, and a regex tester shows a successful match.
The sourcetype is cisco-esa because I'm using the app "Add-on for Cisco ESA".
This is the input stanza I'm using:

[monitor:///path/to/file]
source = cisco:esa
sourcetype = cisco:esa
host_regex = (myserver[1-2].mydomain.com)\W+\w+\.s$
disabled = false
host =

I'm not sure about that "host =" line, but it was added by the web GUI. I already tried removing it from the inputs.conf file, but nothing changed.
