Host name extraction from syslog repository

Engager

Greetings,

I have inherited a Splunk 4.1 infrastructure and while I am getting up to speed on Splunk, I need assistance with an issue that I'm having.

I have a log file that Splunk is monitoring that is a repository of syslog output from many machines. An index-time transform extracts the remote host name from each event and uses it for the host field. I am trying to change the regex in that transform so that it also matches events whose formatting is slightly different. The problem is that every change I've made has been ineffective. I've read through numerous questions here on this topic and checked my setup, and from what I can see, this should be working.

Below are the relevant bits of the config as I inherited them. These are all located in etc/apps/search/local.

inputs.conf

[monitor:///logs/syslog/central_syslog_repository.log]
disabled = false
host = splunk-indexer
sourcetype = syslog

props.conf

[source::/logs/syslog/central_syslog_repository.log]
REPORT-hostip = custom_ip_field
sourcetype = syslog_ng
TRANSFORMS = custom_host_indexing

[source::.../central_syslog_repository.log*]
REPORT-hostip = custom_ip_field
sourcetype = syslog_ng
TRANSFORMS = custom_host_indexing

Side note: How would precedence apply to the above two props stanzas considering the source?

transforms.conf

[custom_host_indexing]
DEST_KEY = MetaData:Host
REGEX = ^\w+\s+\d+\s+\d\d:\d\d:\d\d\s+-\s+([\w-\.]+)\s+-
FORMAT = host::$1

[custom_ip_field]
REGEX = ^\w+\s+\d+\s+\d\d:\d\d:\d\d\s+-\s+[\w-\.]+\s+-\s+([\d.]+)\s+-
FORMAT = host_ip::$1

The following is how I changed the custom_host_indexing transform...

[custom_host_indexing]
DEST_KEY = MetaData:Host
REGEX = ^\w+\s+\d+\s+\d\d:\d\d:\d\d\s+(-\s+)?([\w-\.]+),?\s+-?
FORMAT = host::$2

I tested the modified regex in a perl script and it extracts host names as expected.

The following is a sampling of the log events I am trying to extract the host names from. The first two lines match successfully, but the remaining lines do not.

Jun 29 14:49:15 - R2.X.X.X - 1.1.1.1 - 17639518: Jun 29 21:49:14 GMT: %SEC-6-IPACCESSLOGDP: list 128 denied icmp 2.2.2.2 -> 3.3.3.3 (8/0), 1 packet
Jun 29 14:49:16 - R1.X.X.X - 1.1.1.1 - 4511942: Jun 29 21:49:15 GMT: %SEC-6-IPACCESSLOGDP: list 104 denied icmp 2.2.2.2 -> 3.3.3.3 (8/0), 1 packet

Jun 29 21:44:16 r1.Y.Y.Y.Y 591148: Jun 29 21:44:14 GMT: %SEC-6-IPACCESSLOGRL: access-list logging rate-limited or missed 1 packet
Jun 29 21:44:19 r1.Y.Y.Y.Y 342487: Jun 29 21:44:18 UTC: %SEC-6-IPACCESSLOGRL: access-list logging rate-limited or missed 3 packets
Jun 29 21:44:19 r1.Y.Y.Y.Y 342487: Jun 29 21:44:18 UTC: %SEC-6-IPACCESSLOGRL: access-list logging rate-limited or missed 3 packets
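
For anyone who wants to repeat the check outside Splunk, here is a minimal sketch in Python (not the perl script mentioned above) that runs the original and the modified pattern over one sample of each format. One assumption to note: Python's re module rejects the character class [\w-\.] as a bad range, so it is written [\w.-] here, which matches the same characters (word characters, dot, hyphen). The helper name extract_host is made up for illustration.

```python
import re

# Patterns copied from the transforms above, except that the character
# class is written [\w.-] because Python's re rejects [\w-\.].
ORIGINAL = re.compile(r"^\w+\s+\d+\s+\d\d:\d\d:\d\d\s+-\s+([\w.-]+)\s+-")
MODIFIED = re.compile(r"^\w+\s+\d+\s+\d\d:\d\d:\d\d\s+(-\s+)?([\w.-]+),?\s+-?")

# One sample event of each format, taken from the post.
SAMPLES = [
    "Jun 29 14:49:15 - R2.X.X.X - 1.1.1.1 - 17639518: Jun 29 21:49:14 GMT: "
    "%SEC-6-IPACCESSLOGDP: list 128 denied icmp 2.2.2.2 -> 3.3.3.3 (8/0), 1 packet",
    "Jun 29 21:44:16 r1.Y.Y.Y.Y 591148: Jun 29 21:44:14 GMT: "
    "%SEC-6-IPACCESSLOGRL: access-list logging rate-limited or missed 1 packet",
]


def extract_host(pattern, group, event):
    """Return the captured host name, or None when the pattern does not match."""
    match = pattern.search(event)
    return match.group(group) if match else None


original_hosts = [extract_host(ORIGINAL, 1, e) for e in SAMPLES]
modified_hosts = [extract_host(MODIFIED, 2, e) for e in SAMPLES]

print(original_hosts)  # ['R2.X.X.X', None]
print(modified_hosts)  # ['R2.X.X.X', 'r1.Y.Y.Y.Y']
```

The original pattern only matches the dashed format, while the modified one captures the host in both, which is consistent with the regex itself not being the problem.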

I can see through Splunk Web that Splunk is picking up these changes on a restart, but Splunk is still not extracting the host name correctly. It does continue to correctly extract the host name on the first two events above, but not the latter events.

Any thoughts or ideas on how I can proceed to fix or continue debugging this issue?

Thanks!

1 Solution

Engager

Just to follow up with the solution I found: my extraction changes were working from the beginning, but because of a timezone mismatch, new events were being timestamped 7 hours into the future, so they never showed up in my searches and I could not see the effect of the changes. I ultimately stumbled upon the problem while doing a Real-time (All-time) search and saw the events appearing in the future.

Lesson learned: if changes don't appear to do anything, make sure you're actually seeing live data. 🙂
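
The 7-hour figure can be reproduced with a little arithmetic. The sketch below is an illustration under assumptions, not a description of how Splunk parses timestamps: it assumes the indexer's local zone is UTC-7 (inferred from the first sample, whose outer stamp is 14:49:15 while its inner stamp is 21:49:14 GMT), that the second format's outer stamp is really UTC, and it uses a hypothetical year because syslog stamps carry none.

```python
from datetime import datetime, timedelta, timezone

# Assumption: the indexer's local zone is UTC-7, inferred from the samples.
LOCAL = timezone(timedelta(hours=-7))
YEAR = 2010  # hypothetical; the syslog stamps in the post have no year


def misread_offset(stamp):
    """How far into the future a UTC stamp lands when it is read as local time."""
    naive = datetime.strptime(f"{YEAR} {stamp}", "%Y %b %d %H:%M:%S")
    true_time = naive.replace(tzinfo=timezone.utc)  # what the device meant
    misread = naive.replace(tzinfo=LOCAL)           # what a local-time parse assumes
    return misread - true_time


offset = misread_offset("Jun 29 21:44:16")
print(offset)  # 7:00:00
```

An event shifted this way falls outside any search window that ends at "now", which is why only an all-time real-time search showed it.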

Splunk Employee

You should use a single, more flexible regex that matches the host in both of your formats.

REGEX = ^\w+\s+\d+\s+\d\d:\d\d:\d\d(\s-\s|\s)([\w-\.]+)\s
FORMAT = host::$2
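
A quick way to sanity-check this suggestion outside Splunk is a short Python sketch like the one below. As an assumption for Python's sake, the character class is written [\w.-] instead of [\w-\.], since Python's re treats the latter as an invalid range; the set of matched characters is the same. The dictionary keys "dashed" and "plain" are just labels chosen for illustration.

```python
import re

# Suggested pattern, with the class rewritten as [\w.-] for Python's re.
SUGGESTED = re.compile(r"^\w+\s+\d+\s+\d\d:\d\d:\d\d(\s-\s|\s)([\w.-]+)\s")

# One event of each format, taken from the question.
EVENTS = {
    "dashed": "Jun 29 14:49:16 - R1.X.X.X - 1.1.1.1 - 4511942: Jun 29 21:49:15 GMT: "
              "%SEC-6-IPACCESSLOGDP: list 104 denied icmp 2.2.2.2 -> 3.3.3.3 (8/0), 1 packet",
    "plain": "Jun 29 21:44:19 r1.Y.Y.Y.Y 342487: Jun 29 21:44:18 UTC: "
             "%SEC-6-IPACCESSLOGRL: access-list logging rate-limited or missed 3 packets",
}

# Group 2 holds the host, matching FORMAT = host::$2.
hosts = {}
for name, event in EVENTS.items():
    match = SUGGESTED.search(event)
    hosts[name] = match.group(2) if match else None

print(hosts)  # {'dashed': 'R1.X.X.X', 'plain': 'r1.Y.Y.Y.Y'}
```

Note that the alternation uses single \s characters, so this pattern expects exactly one whitespace character around the optional dash; events with extra padding after the time would need \s+ there.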