Splunk Search

How do you track down inefficient field extractions?

twinspop
Influencer

Just had to support a user with field extraction issues. While working on it, I noticed the report was still taking a LONG time. Like over 5 minutes. I checked with tstats and the raw event count it was running against was only 2000 events. Wtf? So I started disassembling the field extractions. They were all inline regex created via the GUI field extractor. After cleaning them up, the same exact search takes less than 0.2 seconds.

Sample suspect extraction created via FE in the GUI:

(?=[^A]*(?:ARFF_WP_PAYSTATEMENTS:|A.*ARFF_WP_PAYSTATEMENTS:))^(?:[^:\n]*:){4}\s+(?P<ARFF_WP_PAYSTATEMENTS>.+)

Changed to:

ARFF_WP_PAYSTATEMENTS:\s+(?P<ARFF_WP_PAYSTATEMENTS>\d+)

Sample data was:

lots of log data ARFF_WP_PAYSTATEMENTS: 123

My test was to run in fast mode and pipe the results into stats sum(ARFF_WP_PAYSTATEMENTS). With the old regex, it took forever. With the new regex in place, it ran in under a second no matter the time window.
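For anyone who wants to reproduce the comparison outside Splunk, here is a rough Python sketch. The two patterns are copied from the post. Everything else is an assumption made for illustration: the timestamp and host prefix on the matching event (added so the generated pattern's four-colon requirement is met), the non-matching event, and the helper names. Python's `re` engine is not the PCRE engine Splunk uses, so absolute timings will differ. The sketch only shows one plausible reason the generated pattern is expensive: its leading lookahead sits in front of the `^` anchor, so on an event that does not match, it can be re-evaluated at every start position.

```python
import re
import time

# Extraction reportedly generated by the field extractor (copied from the post).
GENERATED = re.compile(
    r"(?=[^A]*(?:ARFF_WP_PAYSTATEMENTS:|A.*ARFF_WP_PAYSTATEMENTS:))"
    r"^(?:[^:\n]*:){4}\s+(?P<ARFF_WP_PAYSTATEMENTS>.+)"
)
# Hand-written replacement (copied from the post).
HANDWRITTEN = re.compile(r"ARFF_WP_PAYSTATEMENTS:\s+(?P<ARFF_WP_PAYSTATEMENTS>\d+)")

# Hypothetical events, not taken from the original data.
MATCHING = "2017-05-01 12:34:56 host app: lots of log data ARFF_WP_PAYSTATEMENTS: 123"
NON_MATCHING = "2017-05-01 12:34:56 host app: " + "AUDIT ok " * 200


def extract(pattern, event):
    """Return the captured field value, or None when the event does not match."""
    match = pattern.search(event)
    return match.group("ARFF_WP_PAYSTATEMENTS") if match else None


def time_searches(pattern, event, repeats=5):
    """Total wall-clock seconds spent running `repeats` searches on one event."""
    start = time.perf_counter()
    for _ in range(repeats):
        pattern.search(event)
    return time.perf_counter() - start


if __name__ == "__main__":
    for name, pattern in (("generated", GENERATED), ("handwritten", HANDWRITTEN)):
        value = extract(pattern, MATCHING)
        seconds = time_searches(pattern, NON_MATCHING)
        print(f"{name}: value={value} non-matching search time={seconds:.4f}s")
```

Both patterns pull the same value out of the matching event. The difference shows up on events that do not contain the field, which in a real index is usually most of them.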

So my question is: Given that field extractions can so severely impact search performance, how do you track down the big offenders?

Thanks,
Jon

cpetterborg
SplunkTrust

This isn't exactly an answer, but I'm just putting in my 2¢ here about regular expressions because I feel it is so important:

If you need efficient, maintainable field extractions, learn regular expressions and don't use the field extractions that come from the UI FET. With a little knowledge about regex you can do incredible stuff. I write regular expressions for people all the time, and it makes a huge difference in the readability, maintainability, usability, and speed.

The Field Extraction Tool is great for people that don't know regular expressions, but I'd never use the final regex that it outputs automatically. You either need to polish it up, or write your own.

Learn regular expressions! 🙂

twinspop
Influencer

Oh I'm aware. I consider myself a seasoned RE veteran. Problem is the 5000 users on the Splunk system here. Adding new ones every day. THEY couldn't care less about RE and use the FE often.

cpetterborg
SplunkTrust

I certainly feel your pain. We have the same problems here.

I hold two "Splunk Roundtables" a month where we talk about Splunk with our users - one directed at beginners and one at more advanced users. There I can teach them about regular expressions and encourage them to let me help them with the field extractions. I regularly get them to come to me for help and I "dazzle" them with regex magic. They become encouraged to do their own regular expressions. That way even the mediocre results are hopefully eliminated.

We are also going through our field extractions with our users to make them CIM compliant, which helps to fix a multitude of sins. That is a big project, but we have so many custom log files (we have more than 250 custom apps that we deploy to the UFs for the custom log files) that we have to do something to bring sanity to the mix.

woodcock
Esteemed Legend

Start with a search for long jobs like this:

|rest /services/search/jobs | sort 0 - performance_command_addinfo_duration_secs

Also check out these apps:
https://splunkbase.splunk.com/app/2632/
https://splunkbase.splunk.com/app/493/
https://splunkbase.splunk.com/app/2967/

Also keep an eye on stuff in the Monitoring Console:
https://docs.splunk.com/Documentation/Splunk/6.5.3/DMC/DMCoverview

adonio
Ultra Champion

Jon,
Here is a related question with no answer yet: https://answers.splunk.com/answers/368018/how-to-track-slow-running-field-extractions.html. I am not aware of a metric being logged that measures this performance. However, there are indicators that a regex is bad, such as length, using greedy .* or +, and such.
Here is a short query that filters all extractions by length as a parameter. Maybe it can help a little:
| rest /services/data/props/extractions | search attribute=EXTRACT* | eval rexLength = len(value) | table attribute author eai:acl.app stanza title value rexLength | where rexLength >
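The indicators mentioned above can also be checked offline, for example against regex strings exported from that endpoint. The following is a minimal Python sketch, not a Splunk feature. The function name, the dictionary keys, and the lookaround check are assumptions added for illustration. The detection is deliberately rough: it does not parse the regex, so a `.*` inside a character class would be miscounted.

```python
import re

# Rough textual heuristics; none of these names come from Splunk.
# An unescaped "." followed by "*" or "+", and not made lazy or possessive.
GREEDY_WILDCARD = re.compile(r"(?<!\\)\.[*+](?![?+])")
# Lookahead or lookbehind openers: (?= (?! (?<= (?<!
LOOKAROUND = re.compile(r"\(\?<?[=!]")

# The two extractions quoted earlier in this thread.
GENERATED_TEXT = (
    r"(?=[^A]*(?:ARFF_WP_PAYSTATEMENTS:|A.*ARFF_WP_PAYSTATEMENTS:))"
    r"^(?:[^:\n]*:){4}\s+(?P<ARFF_WP_PAYSTATEMENTS>.+)"
)
HANDWRITTEN_TEXT = r"ARFF_WP_PAYSTATEMENTS:\s+(?P<ARFF_WP_PAYSTATEMENTS>\d+)"


def regex_indicators(pattern_text):
    """Count rough warning signs in a regex string without executing it."""
    return {
        "length": len(pattern_text),
        "greedy_wildcards": len(GREEDY_WILDCARD.findall(pattern_text)),
        "lookarounds": len(LOOKAROUND.findall(pattern_text)),
    }


if __name__ == "__main__":
    for name, text in (("generated", GENERATED_TEXT), ("handwritten", HANDWRITTEN_TEXT)):
        print(name, regex_indicators(text))
```

A high count is only a hint that an extraction deserves a closer look. It is not a measurement of how long the extraction takes at search time.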

twinspop
Influencer

I guess I should clarify: The user claims to have used the FE to generate it. Whether he actually did, or changed it afterward, is up for argument I suppose. Thanks for the tip on the endpoint. Maybe I'll do some checking.

DalJeanis
Legend

That's a pretty bizarre regex the system generated. On your replacement, did you mean for there to be a colon at the end of the field name?

twinspop
Influencer

No. Fixing. 🙂
