Transforms.conf regex performance

Path Finder

I am trying to capture the logging of any martian packets on a Linux system, so I set up a monitor on /var/log/messages and created a transform that sends only messages related to martian packets to the indexQueue. I wrote this regex:

\w{1,4}\s+\d{0,2}\s+[01][0-9]:[0-5][0-9]:[0-5][0-9]\s+[a-z]+\s+kernel:\s+martian\s+source\s+\d{1,3}\.\d{1,3}\.\d{1,3}\.\d{1,3}\s+from\s+\d{1,3}\.\d{1,3}\.\d{1,3}\.\d{1,3},\s+on\s+dev\s\w+\n.+

Is this overkill for performance purposes, and would it even work? I have read that the more detailed a regex is, the better it performs. Since that file logs the majority of the kernel messages (I don't care about anything but martian packets on this specific system), I want to make sure the filter doesn't slow down the receiving indexer.

Thoughts and comments? Thanks!

1 Solution

Explorer

You can modify the rsyslogd/syslog-ng configuration on the Linux host to write the events you are interested in to a separate file, then monitor that file with a Universal Forwarder (UF).

old rsyslogd format:
:msg, contains, "kernel: martian source" -/var/log/martian.log

new rsyslogd format:
if $msg contains 'kernel: martian source' then /var/log/martian.log

Don't forget to add a logrotate configuration (copy /etc/logrotate.d/syslog to /etc/logrotate.d/martian and modify it accordingly) so that martian.log is rotated and eventually deleted.
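
A minimal sketch of the matching monitor stanza on the UF, assuming the /var/log/martian.log path used above. The sourcetype and index values are placeholders, not taken from this thread:

```ini
# inputs.conf on the Universal Forwarder
[monitor:///var/log/martian.log]
# Assumed values; replace with the sourcetype and index used in your environment.
sourcetype = syslog
index = main
disabled = false
```

With this approach the filtering happens in rsyslog before Splunk reads anything, so no transform is needed on the indexer.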

Builder

The best way to know whether a regex is good is to paste some sample martian packet log lines into www.regex101.com along with your regex. It will tell you the number of steps it takes to find the match. I would then save it on that website and share the link in your question so people have some sample data to work with.

If this comment/answer was helpful, please up vote it. Thank you.

Path Finder

Doesn't Splunk use Perl? I don't see it as an engine option on that website.

Builder

Per their documentation they use PCRE: Splunk regular expressions are PCRE (Perl Compatible Regular Expressions) and use the PCRE C library.
https://docs.splunk.com/Documentation/Splunk/8.0.1/Knowledge/AboutSplunkregularexpressions

The PCRE (PHP) engine on regex101.com is traditionally used for troubleshooting regexes for Splunk. It has been extremely accurate in my own use and for other Splunk users on this board.

Path Finder

This is the link with a very simple regex:
https://regex101.com/r/grB83o/1
If you check the debugger, it runs thousands of steps if there are many logs that don't match the pattern.

This is the link with the regex I posted (with some alterations):
https://regex101.com/r/16yOf7/1

How do I force the regex to skip a line as soon as it fails to match the pattern, instead of looping over the same line trying to find anything that matches? (The question makes more sense if you check the debugger steps for lines that do not match the pattern.)

SplunkTrust

If all you care about are martian events, then just look for the text that identifies them. Everything else is just wasted processing. The pattern server\skernel:\smartian\ssource\s takes only 518 steps.

I disagree with the notion that detailed regexes perform better. Here is an example to disprove it.

---
If this reply helps you, an upvote would be appreciated.

Path Finder

Wouldn't that only extract the segments that match the expression? I am trying to extract the entire line so I can identify timestamps and IP addresses, as well as the following line (which is why I add '\n.+' at the end of the expression).

SplunkTrust

If you're using transforms to route events, there is no extraction happening. All you need to do is identify which events get indexed and which do not.
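
A common way to set up this kind of routing is to send everything from the source to the nullQueue first, then send matching events back to the indexQueue. The following is a sketch only, assuming the input is monitored as /var/log/messages. The stanza names setnull and setparsing and the class name martian are arbitrary labels, and the regex is a shortened variant of the one suggested earlier in this thread:

```ini
# props.conf (on the indexer or heavy forwarder that parses the data)
[source::/var/log/messages]
# Transforms run left to right; the last one that matches decides the queue.
TRANSFORMS-martian = setnull, setparsing

# transforms.conf
[setnull]
# Matches every event and discards it by default.
REGEX = .
DEST_KEY = queue
FORMAT = nullQueue

[setparsing]
# Overrides the discard for events that mention martian packets.
REGEX = kernel:\s+martian\s+source
DEST_KEY = queue
FORMAT = indexQueue
```

The order in TRANSFORMS-martian matters: setnull has to come first so that setparsing can override it. If each log line is broken into its own event, the regex is applied to one line at a time, so a trailing \n.+ would have nothing to match.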

Path Finder

I did not know that. That is actually very helpful!
