Splunk Search

Why is host_regex not setting the correct hostname?

vzzbrs
Explorer

I'm trying to set the host for each event by extracting it from the file name.
I'm using host_regex with this regex:

host_regex = (myserver[1-2].mydomain.com)\W+\w+\.s$

The paths are in this form:

/path/to/files/mail.text.myserver1.mydomain.com.@20141009T084808.s
/path/to/files/mail.text.myserver2.mydomain.com.@20141009T104107.s

I have tried several regexes, all of which correctly match "myserver[1-2].mydomain.com", but in Splunk the host is always set to the default value (the Splunk server's hostname).

Does anyone have ideas on how to get this working?

Thanks

1 Solution

jrodman
Splunk Employee

Okay, the problem is clear now that you've provided the input stanza.

From the inputs.conf.spec file:

host_regex = <regular expression>
* If specified, <regular expression> extracts host from the path to the file for each input file.
    * Detail: This feature examines the source key, so if source is set
      explicitly in the stanza, that string will be matched, not the original filename.
* Specifically, the first group of the regex is used as the host.
* If the regex fails to match, the default "host =" attribute is used.
* If host_regex and host_segment are both set, host_regex will be ignored.
* Defaults to unset.

host_regex extracts the host from the source value. However, by setting source=cisco:esa you're overriding the value that the file-input code would otherwise provide (the file path), which removes the information you want host_regex to use. You should probably just remove the source=cisco:esa line.

The spec also recommends against overriding source:

source = <string>
* Sets the source key/field for events from this input.
* NOTE: Overriding the source key is generally not recommended.  Typically, the
  input layer will provide a more accurate string to aid in problem
  analysis and investigation, accurately recording the file from which the data
  was retrieved.  Please consider use of source types, tagging, and search
  wildcards before overriding this value.
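
The effect of the override can be reproduced outside Splunk. The following is a minimal sketch, not Splunk's actual implementation: it assumes Python's re module behaves like Splunk's regex engine for this pattern and that the pattern is searched for anywhere in the source string. The extract_host helper and the "splunk-server" default are hypothetical names used only for illustration.

```python
import re

# The pattern from the question, unchanged. Python's re module stands in for
# Splunk's regex engine here; that equivalence is an assumption.
HOST_REGEX = re.compile(r"(myserver[1-2].mydomain.com)\W+\w+\.s$")


def extract_host(source, default_host):
    """Return the first capture group, or default_host when nothing matches."""
    match = HOST_REGEX.search(source)
    return match.group(1) if match else default_host


file_path = "/path/to/files/mail.text.myserver1.mydomain.com.@20141009T084808.s"

# Source left alone: the monitored file path is the string being matched.
host_from_path = extract_host(file_path, "splunk-server")

# Source overridden in the stanza: only this literal string is available.
host_from_override = extract_host("cisco:esa", "splunk-server")

print(host_from_path)      # myserver1.mydomain.com
print(host_from_override)  # splunk-server
```

With the override in place, the regex never sees the file path, so it cannot match and the default host is used.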

vzzbrs
Explorer

Thanks jrodman,
removing source=cisco:esa solved the issue.
It was there because I took the starting inputs.conf from the app's README file.


jrodman
Splunk Employee

Thanks, I'll try to contact the app author.


jrodman
Splunk Employee

I'm kind of confused about \W+\w+\.s$. I guess \W will match the .@ and \w will match the timestamp?

I suppose a regex tester shows a successful match anyway, so this is probably not a regex problem.

Please show the whole input stanza, and let us know which sourcetype is being applied to the data.
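
For anyone who wants to check the tail of the pattern without a regex tester, here is a rough sketch. It assumes Python's re module is close enough to Splunk's engine for this pattern; the extra capture groups around the separator and the timestamp are added purely for illustration and are not part of the original host_regex.

```python
import re

# Same pattern as in the question, with two extra groups (illustration only)
# around the separator and the timestamp so each piece can be inspected.
pattern = re.compile(r"(myserver[1-2].mydomain.com)(\W+)(\w+)\.s$")

path = "/path/to/files/mail.text.myserver2.mydomain.com.@20141009T104107.s"
match = pattern.search(path)

host, separator, timestamp = match.groups()
print(host)       # myserver2.mydomain.com
print(separator)  # .@
print(timestamp)  # 20141009T104107
```

Note that the dots inside the first group are unescaped, so they match any character; that makes the pattern looser than intended but does not stop it from matching these paths.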


vzzbrs
Explorer

Yes, you're correct: \W is for matching the .@ and \w is for the timestamp, and a regex tester shows a successful match.
The sourcetype is cisco-esa because I'm using the app "Add-on for Cisco ESA".
This is the input stanza I'm using:

[monitor:///path/to/file]
source = cisco:esa
sourcetype = cisco:esa
host_regex = (myserver[1-2].mydomain.com)\W+\w+\.s$
disabled = false
host =

I'm not sure about that "host =" line, but it is added by the web GUI. I already tried removing it from the inputs.conf file, but nothing changed.
