
Transforms.conf regex performance

ricotries
Communicator

I am trying to capture the logging of any martian packets on a Linux system, so I set up a monitor on /var/log/messages and created a transform that sends only messages related to martian packets to the indexQueue. I wrote this regex:

\w{1,4}\s+\d{0,2}\s+[01][0-9]:[0-5][0-9]:[0-5][0-9]\s+[a-z]+\s+kernel:\s+martian\s+source\s+\d{1,3}\.\d{1,3}\.\d{1,3}\.\d{1,3}\s+from\s+\d{1,3}\.\d{1,3}\.\d{1,3}\.\d{1,3},\s+on\s+dev\s\w+\n.+

Is this overkill performance-wise, and would it even work? I have read that the more detailed a regex is, the better it performs. Since that file logs most of the kernel messages (and on this system I don't care about anything but martian packets), I want to make sure the transform won't slow down the receiving indexer.
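A quick way to check whether the regex works at all is to run it against a few sample lines. The sketch below uses Python's re module rather than PCRE, but for plain ASCII input this pattern only uses syntax both engines treat the same way. The log lines and the make_event helper are hypothetical, made up for illustration. Run this way, the regex matches a martian event at 12:34:56 but misses one at 22:34:56, because [01][0-9] only accepts hours 00 to 19, and it also misses a host named web-01, because [a-z]+ rejects digits and hyphens.

```python
import re

# The regex from the question, unchanged.
ASKER_REGEX = re.compile(
    r"\w{1,4}\s+\d{0,2}\s+[01][0-9]:[0-5][0-9]:[0-5][0-9]\s+[a-z]+\s+kernel:\s+"
    r"martian\s+source\s+\d{1,3}\.\d{1,3}\.\d{1,3}\.\d{1,3}\s+from\s+"
    r"\d{1,3}\.\d{1,3}\.\d{1,3}\.\d{1,3},\s+on\s+dev\s\w+\n.+"
)

def make_event(time: str, host: str) -> str:
    """Build a hypothetical two-line martian-packet event (illustrative data only)."""
    return (
        f"Jan 10 {time} {host} kernel: martian source 10.0.0.1 from 192.168.1.5, on dev eth0\n"
        f"Jan 10 {time} {host} kernel: ll header: 00000000: ff ff ff ff ff ff"
    )

# Same event shape, varying only the hour and the hostname.
for time, host in [("12:34:56", "server"), ("22:34:56", "server"), ("12:34:56", "web-01")]:
    matched = ASKER_REGEX.search(make_event(time, host)) is not None
    print(f"{time} {host}: {'match' if matched else 'no match'}")
```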

Thoughts and comments? Thanks!

1 Solution

PavelProstine
Explorer

You can modify your rsyslogd/syslog-ng configuration on the Linux host to write the events you are interested in to a separate file, then monitor that file with the universal forwarder (UF).

Old rsyslogd format (property-based filter):
:msg, contains, "martian source" -/var/log/martian.log

New rsyslogd format (RainerScript):
if $msg contains 'martian source' then /var/log/martian.log

Don't forget to add a logrotate configuration (copy /etc/logrotate.d/syslog to /etc/logrotate.d/martian and modify it accordingly) so that martian.log is rotated and eventually deleted.

dmarling
Builder

The best way to know whether a regex is good is to paste some sample martian-packet events into www.regex101.com along with your regex. It will tell you the number of steps it takes to complete the match. I would then save it on that site and share the link in your question so people have some sample data to work with.


ricotries
Communicator

Doesn't Splunk use Perl? I don't see it as an engine option on that website.


dmarling
Builder

Per their documentation, they use PCRE: Splunk regular expressions are PCRE (Perl Compatible Regular Expressions) and use the PCRE C library.
https://docs.splunk.com/Documentation/Splunk/8.0.1/Knowledge/AboutSplunkregularexpressions

The PCRE (PHP) flavor on regex101.com is traditionally used for troubleshooting Splunk regexes, and it has been extremely accurate in my own use and for other Splunk users on this board.


ricotries
Communicator

This is the link with a very simple regex:
https://regex101.com/r/grB83o/1
If you check the debugger, it takes thousands of steps when many of the log lines don't match the pattern.

This is the link with the regex I posted (with some alterations):
https://regex101.com/r/16yOf7/1

How do I force the regex to skip a line as soon as it fails to match, instead of retrying at every position in that same line to find something that matches? (The question makes more sense if you step through the debugger on lines that do not match the pattern.)
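One common way to stop the engine from retrying mid-line is to anchor the pattern with ^, so a match can only begin at the start of a line. In regex101 you would also enable the m (multiline) flag, since all the sample lines sit in one test string there, whereas a transforms.conf REGEX is applied to each event separately (by default against _raw). A minimal sketch in Python's re module, with hypothetical log lines; it only checks that the anchored and unanchored patterns select the same lines, and does not measure steps, because Python's re does not report them.

```python
import re

# Hypothetical /var/log/messages lines, for illustration only.
LINES = [
    "Jan 10 12:34:56 server kernel: eth0: link up",
    "Jan 10 12:34:57 server kernel: martian source 10.0.0.1 from 192.168.1.5, on dev eth0",
    "Jan 10 12:34:58 server sshd[1234]: Accepted publickey for admin",
]

# Unanchored: a match attempt may start at any character position.
UNANCHORED = re.compile(r"kernel:\s+martian\s+source\s")

# Anchored: with re.MULTILINE, ^ matches only at the start of a line, so a match
# can never begin in the middle of one. (?m) is the equivalent inline flag in PCRE.
ANCHORED = re.compile(
    r"^\w{3}\s+\d{1,2}\s+\d{2}:\d{2}:\d{2}\s+\S+\s+kernel:\s+martian\s+source\s",
    re.MULTILINE,
)

def matching_lines(pattern, lines):
    """Test each line on its own, the way a per-event transform sees it."""
    return [line for line in lines if pattern.search(line)]

# Whole-file view, like a single regex101 test string with the m flag enabled.
blob = "\n".join(LINES)

print(matching_lines(UNANCHORED, LINES) == matching_lines(ANCHORED, LINES))
print(len(ANCHORED.findall(blob)))
```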


richgalloway
SplunkTrust

If all you care about are martian events, then just look for the text that identifies them. Everything else is wasted processing. This pattern, server\skernel:\smartian\ssource\s, takes only 518 steps.
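For a routing transform, that short literal is all the REGEX needs. The sketch below emulates in Python the usual Splunk pattern of sending every event to nullQueue and then sending matching events back to indexQueue. The stanza names setnull and keep_martian are hypothetical, the config in the comments is only an outline of that pattern, and the hostname server from the example is dropped so the pattern works on any host.

```python
import re

# Emulation of "keep specific events, discard the rest". Roughly equivalent config:
#   props.conf:      [source::/var/log/messages]
#                    TRANSFORMS-martian = setnull, keep_martian
#   transforms.conf: [setnull]       REGEX = .  DEST_KEY = queue  FORMAT = nullQueue
#                    [keep_martian]  REGEX = kernel:\smartian\ssource\s
#                                    DEST_KEY = queue  FORMAT = indexQueue
# Transforms run in the listed order, so the last one that matches sets the queue.
TRANSFORMS = [
    (re.compile(r"."), "nullQueue"),
    (re.compile(r"kernel:\smartian\ssource\s"), "indexQueue"),
]

def route(raw_event: str) -> str:
    """Return the queue an event ends up in after all transforms are applied."""
    queue = "indexQueue"  # events are indexed unless a transform changes the queue
    for pattern, dest in TRANSFORMS:
        if pattern.search(raw_event):
            queue = dest
    return queue

print(route("Jan 10 12:34:57 server kernel: martian source 10.0.0.1 from 192.168.1.5, on dev eth0"))
print(route("Jan 10 12:34:56 server kernel: eth0: link up"))
```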

I disagree with the notion that detailed regexes perform better. Here is an example to disprove it.
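One rough way to test that claim on your own data is to time both patterns against a line that does not match, which is what most lines in /var/log/messages will be here. This sketch uses Python's re and timeit, so the numbers will not carry over to Splunk's PCRE library, and the sample line is hypothetical; treat the comparison as indicative only.

```python
import re
import timeit

# The detailed regex from the question and a short literal-style pattern.
DETAILED = re.compile(
    r"\w{1,4}\s+\d{0,2}\s+[01][0-9]:[0-5][0-9]:[0-5][0-9]\s+[a-z]+\s+kernel:\s+"
    r"martian\s+source\s+\d{1,3}\.\d{1,3}\.\d{1,3}\.\d{1,3}\s+from\s+"
    r"\d{1,3}\.\d{1,3}\.\d{1,3}\.\d{1,3},\s+on\s+dev\s\w+\n.+"
)
LITERAL = re.compile(r"kernel:\smartian\ssource\s")

# Hypothetical non-martian kernel line: the common case the filter has to reject.
NON_MATCHING = "Jan 10 12:34:56 server kernel: usb 1-1: new high-speed USB device number 2 using ehci-pci"

def time_pattern(pattern, line, number=100_000):
    """Total seconds spent running pattern.search(line) `number` times."""
    return timeit.timeit(lambda: pattern.search(line), number=number)

for name, pattern in [("detailed", DETAILED), ("literal", LITERAL)]:
    print(f"{name}: {time_pattern(pattern, NON_MATCHING):.3f} s")
```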


ricotries
Communicator

Wouldn't that only extract the segments that match the expression? I am trying to extract the entire line so I can identify timestamps and IP addresses, as well as the following line (which is why I added '\n.+' at the end of the expression).


richgalloway
SplunkTrust

If you're using transforms to route events, there is no extraction happening. All you need to do is identify which events get indexed and which do not.
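To make the distinction concrete: a routing regex only answers yes or no for each event, and the whole event is indexed unchanged, so the timestamp and IP addresses stay available for field extraction at search time. A minimal Python sketch, where index_if_martian is a hypothetical helper standing in for the routing decision and the event text is made up:

```python
import re

ROUTING_REGEX = re.compile(r"kernel:\smartian\ssource\s")

def index_if_martian(raw_event):
    """Return the event unchanged if it should be indexed, or None if it is dropped."""
    # Only whether the pattern matches matters; the matched substring is not
    # what gets stored, so the timestamp and IPs are kept with the event.
    return raw_event if ROUTING_REGEX.search(raw_event) else None

event = "Jan 10 12:34:57 server kernel: martian source 10.0.0.1 from 192.168.1.5, on dev eth0"
print(repr(ROUTING_REGEX.search(event).group(0)))  # only the part the regex matched
print(index_if_martian(event) == event)            # but the whole event is kept
```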


ricotries
Communicator

I did not know that; that is actually very helpful!
