Monitoring Splunk

Tuning SEDCMDs -- how do you measure gains?

twinspop
Influencer

I'm using SEDCMD to clean up (and reduce) IIS logs:

# remove all path info but the (unique) file name
SEDCMD-uritrim = s% /commonurlbase[^ ]*/% ./%
# reduce chrome version (chrome mentions safari, so separate sed needed?)
SEDCMD-chrome = s% Mozilla[^ ]*Chrome.([0-9.]*)[^ ]*% Chrome-\1%
# reduce agent name version
SEDCMD-agents = s% Mozilla[^ ]*(Safari|Firefox|MSIE).([0-9.]*)[^ ]*% \1-\2%
# trim sid query string from referral url
SEDCMD-reftrim = s%.aspx\?s[iI][dD]=[^ ]*%.aspx%
# trim portal sign-on shenanigans from referral
SEDCMD-portalreftrim = s%/\!ut[^ ]*%%

I was thinking of ways to combine (some of) these, and/or maybe try to come up with a more efficient regex on some of them. What are some options for testing the performance effect of the changes? With 200+ GB of logs passing through daily, I want to be sure we're as efficient as we can be -- allowing that the logs need to be 'cleaned' as outlined above.

Thanks,

jon

0 Karma

dwaddle
SplunkTrust

This might be rather difficult to measure. Assuming that your daily indexing volume remains mostly flat from day to day, you might be able to come up with a measurement based on CPU seconds used by the indexing process day over day. The main issue is that these regexes will be firing as events come in, potentially changing the raw value of the event. Each test of "does this regex match?" uses a minuscule amount of CPU time, and each substitution, if it does match, uses only a little more.

Your most accurate bet (which is a lot of work) would be to implement a simple regex profiler. We know that Splunk uses PCRE, which is open source. You could build a test harness to evaluate the use of each of these regexes, over a sample of several hundred thousand events, in a controlled fashion. No, not easy at all - but it would be more accurate than trying to measure it in-situ in a running indexer.
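As a rough starting point for such a harness, here is a minimal sketch in Python. Two caveats: Python's `re` engine is not PCRE, so the absolute timings will not match what an indexer sees and only the relative cost of one regex versus another is indicative; and the names `parse_sed`, `profile`, and `SAMPLE_EVENT` are made up for illustration. The rules are copied from the question and the event is the IIS line quoted in this thread. A real run would load a few hundred thousand representative events instead of repeating one line.

```python
import re
import time

# SEDCMD expressions copied from the question (all use % as the delimiter).
SEDCMDS = {
    "uritrim": r"s% /commonurlbase[^ ]*/% ./%",
    "chrome": r"s% Mozilla[^ ]*Chrome.([0-9.]*)[^ ]*% Chrome-\1%",
    "agents": r"s% Mozilla[^ ]*(Safari|Firefox|MSIE).([0-9.]*)[^ ]*% \1-\2%",
    "reftrim": r"s%.aspx\?s[iI][dD]=[^ ]*%.aspx%",
    "portalreftrim": r"s%/\!ut[^ ]*%%",
}

# The IIS event quoted in this thread, used as stand-in test data.
SAMPLE_EVENT = (
    "2012-02-27 21:57:00 172.20.90.43 POST "
    "/websiterooturi/subfolder/somepage.aspx sID=abcdef1234567890ABCDEF 80 - "
    "172.20.176.20 Mozilla/4.0+(compatible;+MSIE+7.0;+Windows+NT+5.1;"
    "+Trident/4.0;+.NET+CLR+2.0.50727;+.NET+CLR+3.0.4506.2152;"
    "+.NET+CLR+3.5.30729) "
    "https://snazzywebsitedotcom/websiterooturi/subfolder/"
    "referringpage.aspx?sID=abcdef1234567890ABCDEF 200 1146 4870 375"
)


def parse_sed(expr):
    """Split 's<d>pattern<d>replacement<d>flags' into its parts.

    Assumes the delimiter never appears escaped inside the pattern or
    replacement, which holds for the five rules above.
    """
    delim = expr[1]
    _, pattern, replacement, flags = expr.split(delim)
    return re.compile(pattern), replacement, flags


def profile(events, sedcmds=SEDCMDS, repeats=3):
    """Time each rule on its own; return {name: (best_seconds, events_matched)}."""
    results = {}
    for name, expr in sedcmds.items():
        regex, replacement, flags = parse_sed(expr)
        count = 0 if "g" in flags else 1  # for re.subn, 0 means "replace all"
        best = float("inf")
        matched = 0
        for _ in range(repeats):
            matched = 0
            start = time.perf_counter()
            for event in events:
                _, n = regex.subn(replacement, event, count=count)
                if n:
                    matched += 1
            best = min(best, time.perf_counter() - start)
        results[name] = (best, matched)
    return results


if __name__ == "__main__":
    events = [SAMPLE_EVENT] * 100_000
    for name, (seconds, matched) in profile(events).items():
        print(f"{name:15s} {seconds:8.3f}s  matched {matched} of {len(events)}")
```

Taking the best of several repeats reduces noise from other processes. To compare a candidate rewrite, add it to the dictionary under a new name and run both against the same events.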

0 Karma

twinspop
Influencer

Well, I'm not so much interested in specific regex tips as in how to evaluate whether a new regex is helpful or hurtful. Log example:

2012-02-27 21:57:00 172.20.90.43 POST /websiterooturi/subfolder/somepage.aspx sID=abcdef1234567890ABCDEF 80 - 172.20.176.20 Mozilla/4.0+(compatible;+MSIE+7.0;+Windows+NT+5.1;+Trident/4.0;+.NET+CLR+2.0.50727;+.NET+CLR+3.0.4506.2152;+.NET+CLR+3.5.30729) https://snazzywebsitedotcom/websiterooturi/subfolder/referringpage.aspx?sID=abcdef1234567890ABCDEF 200 1146 4870 375
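One side of "helpful or hurtful" that is easy to check offline is whether a changed regex still produces the same cleaned event and how many bytes it saves. The following is a minimal sketch in Python, not Splunk's own processing. It assumes the rules run in the order they appear in the original post and replace only the first match (none uses the `g` flag), and it uses Python's `re` rather than PCRE. The helper names `clean` and `savings` are hypothetical.

```python
import re

# Rules from the original post as (pattern, replacement) pairs.
# Assumption: applied in the order listed, first match only.
RULES = [
    (r" /commonurlbase[^ ]*/", " ./"),
    (r" Mozilla[^ ]*Chrome.([0-9.]*)[^ ]*", r" Chrome-\1"),
    (r" Mozilla[^ ]*(Safari|Firefox|MSIE).([0-9.]*)[^ ]*", r" \1-\2"),
    (r".aspx\?s[iI][dD]=[^ ]*", ".aspx"),
    (r"/\!ut[^ ]*", ""),
]
COMPILED = [(re.compile(pattern), repl) for pattern, repl in RULES]

# The log example from the post above.
SAMPLE_EVENT = (
    "2012-02-27 21:57:00 172.20.90.43 POST "
    "/websiterooturi/subfolder/somepage.aspx sID=abcdef1234567890ABCDEF 80 - "
    "172.20.176.20 Mozilla/4.0+(compatible;+MSIE+7.0;+Windows+NT+5.1;"
    "+Trident/4.0;+.NET+CLR+2.0.50727;+.NET+CLR+3.0.4506.2152;"
    "+.NET+CLR+3.5.30729) "
    "https://snazzywebsitedotcom/websiterooturi/subfolder/"
    "referringpage.aspx?sID=abcdef1234567890ABCDEF 200 1146 4870 375"
)


def clean(event, rules=COMPILED):
    """Apply every rule once, in order, and return the rewritten event."""
    for regex, repl in rules:
        event = regex.sub(repl, event, count=1)
    return event


def savings(events, rules=COMPILED):
    """Return (bytes_before, bytes_after) for a list of events."""
    before = sum(len(e.encode("utf-8")) for e in events)
    after = sum(len(clean(e, rules).encode("utf-8")) for e in events)
    return before, after


if __name__ == "__main__":
    print(clean(SAMPLE_EVENT))
    before, after = savings([SAMPLE_EVENT])
    print(f"{before} bytes -> {after} bytes")
```

To vet a candidate rule set, build a second compiled list and assert that `clean(event, candidate)` equals `clean(event)` for every event in a representative sample before comparing timings.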

0 Karma

Masa
Splunk Employee

We would need sample logs if you are really looking for a better regex.

0 Karma