Is Splunk's "syslog-host" REGEX in $SPLUNK_HOME/etc/system/default/transforms.conf broken?

woodcock
Esteemed Legend

In $SPLUNK_HOME/etc/system/default/ we find this troublesome configuration in transforms.conf:

[syslog-host]
DEST_KEY = MetaData:Host
REGEX = :\d\d\s+(?:\d+\s+|(?:user|daemon|local.?)\.\w+\s+)*\[?(\w[\w\.\-]{2,})\]?\s
FORMAT = host::$1

It is referenced by these stanzas in props.conf:

########## EMAIL ##########

[postfix_syslog]
pulldown_type = 1
MAX_TIMESTAMP_LOOKAHEAD = 32
TIME_FORMAT = %b %d %H:%M:%S
TRANSFORMS-host = syslog-host
REPORT-syslog = syslog-extractions
SHOULD_LINEMERGE = False
category = Email
description = Output produced by the Postfix email server

[sendmail_syslog]
pulldown_type = 1
MAX_TIMESTAMP_LOOKAHEAD = 32
SHOULD_LINEMERGE = False
TIME_FORMAT = %b %d %H:%M:%S
TRANSFORMS = syslog-host
REPORT-syslog = sendmail-extractions
category = Email
description = Output produced by the Sendmail email server

########## OSs ##########

[linux_messages_syslog]
pulldown_type = 1
MAX_TIMESTAMP_LOOKAHEAD = 32
TIME_FORMAT = %b %d %H:%M:%S
TRANSFORMS = syslog-host
REPORT-syslog = syslog-extractions
SHOULD_LINEMERGE = False
category = Operating System
description = Format found within the Linux log file /var/log/messages

[windows_snare_syslog]
pulldown_type = 1
MAX_TIMESTAMP_LOOKAHEAD = 32
TRANSFORMS = syslog-host
REPORT-syslog = syslog-extractions
SHOULD_LINEMERGE = False
TIME_FORMAT = %b %d %H:%M:%S
category = Operating System
description = Output produced by the Snare syslog server on Windows

[syslog]
pulldown_type = true
maxDist = 3
TIME_FORMAT = %b %d %H:%M:%S
MAX_TIMESTAMP_LOOKAHEAD = 32
TRANSFORMS = syslog-host
REPORT-syslog = syslog-extractions
SHOULD_LINEMERGE = False
category = Operating System
description = Output produced by many syslog daemons, as described in RFC3164 by the IETF

########## ROUTERS AND FIREWALLS ##########

[cisco_syslog]
pulldown_type = 0
MAX_TIMESTAMP_LOOKAHEAD = 32
SHOULD_LINEMERGE = False
TIME_FORMAT = %b %d %H:%M:%S
TRANSFORMS = syslog-host
REPORT-syslog = syslog-extractions

This RegEx is far too permissive: because the middle group is quantified with * and may match zero times, at its simplest it reduces to this:

:\d\d\s+\[?(?<capture>\w[\w\.\-]{2,})\]?\s

I am seeing it match logs like these and set the host value to the nonsensical GET and GGG:

<13>2019-07-18T20:31:20.854753+00:00 GET login?hsgid=00000000-0000-0000-0 HTTP/1.1#015
<13>2019-07-18T16:49:09.691477+00:00 GET / HTTP/1.1#015
<13>2019-07-17T20:28:52.087901+00:00 GGG

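For anyone who wants to reproduce this outside Splunk, here is a minimal sketch in Python. It assumes that Python's re module behaves like Splunk's PCRE engine for this pattern (the pattern uses only constructs common to both), and extract_host is a hypothetical helper name, not anything from Splunk.

```python
import re

# The default [syslog-host] REGEX, copied verbatim from transforms.conf.
SYSLOG_HOST = re.compile(
    r":\d\d\s+(?:\d+\s+|(?:user|daemon|local.?)\.\w+\s+)*\[?(\w[\w\.\-]{2,})\]?\s"
)

# The two GET events quoted above.
EVENTS = [
    "<13>2019-07-18T20:31:20.854753+00:00 GET login?hsgid=00000000-0000-0000-0 HTTP/1.1#015",
    "<13>2019-07-18T16:49:09.691477+00:00 GET / HTTP/1.1#015",
]


def extract_host(pattern, event):
    """Return capture group 1 (the would-be host), or None if nothing matches."""
    match = pattern.search(event)
    return match.group(1) if match else None


hosts = [extract_host(SYSLOG_HOST, event) for event in EVENTS]
print(hosts)  # ['GET', 'GET']
```

The match starts at the ":00 " that ends the timezone offset, the optional group matches zero times, and the next word-like token becomes the host.
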
The problem is that I do not have any representative logs to see what it is really supposed to be doing. I suspect that the fix is to change the * to a + so it would be this:

:\d\d\s+(?:\d+\s+|(?:user|daemon|local.?)\.\w+\s+)+\[?(\w[\w\.\-]{2,})\]?\s
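
One way to sanity-check that change is to run both versions against a few lines. The sketch below is only that: it uses Python's re module as a stand-in for PCRE, and the two "Jul 18" lines are hypothetical RFC3164-style events invented for illustration, not real logs. Under those assumptions, the + version stops matching the HTTP event, but it also stops matching a plain line where the host follows the timestamp directly, so it may be too strict on its own.

```python
import re

# Default pattern (group quantified with *) and the proposed variant (with +).
ORIGINAL = re.compile(
    r":\d\d\s+(?:\d+\s+|(?:user|daemon|local.?)\.\w+\s+)*\[?(\w[\w\.\-]{2,})\]?\s"
)
PROPOSED = re.compile(
    r":\d\d\s+(?:\d+\s+|(?:user|daemon|local.?)\.\w+\s+)+\[?(\w[\w\.\-]{2,})\]?\s"
)

SAMPLES = {
    # One of the events quoted in this post.
    "http_event": "<13>2019-07-18T16:49:09.691477+00:00 GET / HTTP/1.1#015",
    # Hypothetical lines, made up for this comparison.
    "plain_syslog": "Jul 18 20:31:20 myhost sshd[123]: session opened",
    "with_facility": "Jul 18 20:31:20 local0.info myhost app: started",
}


def extract_host(pattern, event):
    """Return capture group 1 (the would-be host), or None if nothing matches."""
    match = pattern.search(event)
    return match.group(1) if match else None


# Map each sample name to (host from ORIGINAL, host from PROPOSED).
results = {
    name: (extract_host(ORIGINAL, event), extract_host(PROPOSED, event))
    for name, event in SAMPLES.items()
}

for name, (before, after) in results.items():
    print(f"{name}: original={before!r} proposed={after!r}")
# http_event: original='GET' proposed=None
# plain_syslog: original='myhost' proposed=None
# with_facility: original='myhost' proposed='myhost'
```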

I do realize that the heart of the problem is that we should NOT be using a sourcetype value of syslog. We are working to correct this, but you would not believe how many different things are in that sourcetype, so it is taking a long time.

ww9rivers
Contributor

I don't have an answer, but a question: the logs you quoted seem to be HTTP server logs rather than syslog messages. Is that the case?

I assume that syslog field extraction/transformation rules would not work to parse HTTP server logs.

woodcock
Esteemed Legend

Yes, as I said, the logs should not be there, but they are. That isn't really the point. The point is that this RegEx is so absurdly permissive that it cannot be correct.
