Splunk Search

Can this query be made faster?

MonkeyK
Builder

I am trying to summarize network traffic logged by our firewall to determine the factors that have made our index usage exceed estimates by about 50%.

The queries that I am running take several days, so I am hoping that there are some optimizations that might help speed them up.
There are a lot of data (hundreds of millions of log records per day), so I am asking a lot of Splunk to summarize
Indexing logs do not go far enough back to let me do analysis on them. I also have a datamodel that does not go far enough back.
So I cannot take advantage of the faster access that these models can provide

I have already run queries to determine 3 points in time (a week each) establishing an increase in log record volume

Here is what I am doing

index=pan_logs
| eval loc=case(match(host,"fw_loc1*"), "Loc1", match(host,"fw_loc2.*"), "Loc1", match(host,"fw_loc3.*"), "Loc2", match(host,"fw_loc4.*"), "Loc3", match(host,"fw_loc5.*"), "Loc4", match(host,"fw_loc6.*"), "Loc5", match(host,"fw_loc7.*"), "Loc5", match(host,"fw_loc8.*"), "Loc5", match(host,"fw_loc9.*"),"test", match(host,"fw_locA.*"),"external", match(host,"fw_locB.*"),"external",  1=1,host)
| eval istraffic=if(match(eventtype,"pan_traffic"), "Y", null), isurl=if(match(eventtype,"pan_url"), "Y", null), isvuln=if(match(log_subtype,"vulnerability"),"Y",null), trafficDir=src_zone."-".dest_zone
| stats count dc(host) values(host) count(istraffic) count(isurl) count(isvuln) dc(trafficDir) by loc

The idea is to try and determine if a group of sources is adding to the volume and then understand why. The trafficDir is a quick measure of whether the increase might be related to increased firewall complexity.

Does anything here look like I could make it more efficient?

Tags (1)
0 Karma

martin_mueller
SplunkTrust
SplunkTrust

There are several options, once DMA is out of the picture.

First, since you have hundreds of millions of events per day and aren't looking for a 100.000% precise answer, you could enable event sampling. A 1:1000 sampling would still look at hundreds of thousands of events per day, giving you a reasonably precise answer - especially when looking for big contributors to volume, they're bound to pop up strongly in sampled events. Ask your friendly neighbourhood statistician about how precise 😛

Second, at least part of your search is based around host - this can be made blazingly fast using tstats:

| tstats prestats=t count dc(host) values(host) where index=pan_logs by host
| eval loc=case(match(host,"fw_loc1"), "Loc1", match(host,"fw_loc2"), "Loc1", match(host,"fw_loc3"), "Loc2", match(host,"fw_loc4.*"), "Loc3", match(host,"fw_loc5"), "Loc4", match(host,"fw_loc6"), "Loc5", match(host,"fw_loc7"), "Loc5", match(host,"fw_loc8"), "Loc5", match(host,"fw_loc9"),"test", match(host,"fw_locA"),"external", match(host,"fw_locB"),"external",  1=1,host)
| stats count dc(host) values(host) by loc

Note, I've removed the .* from the regexes because they contribute no value to match().
The tstats should be fast enough to run without sampling, increasing your confidence in the results you got from the sampled raw data search.

If these two approaches combined are enough, great.
If not, there is a bit more magic you can attempt with tstats - it can match on indexed terms. Depending on the definitions of those eventtypes, deciding whether they match or not might just require looking for a certain indexed term, letting you count by eventtype (sort of) without actually needing the raw data.

0 Karma
Got questions? Get answers!

Join the Splunk Community Slack to learn, troubleshoot, and make connections with fellow Splunk practitioners in real time!

Meet up IRL or virtually!

Join Splunk User Groups to connect and learn in-person by region or remotely by topic or industry.

Get Updates on the Splunk Community!

Agent Mode Engaged! Enchaining Agentic Operations with Splunk AI Assistant 2.0

    Are you ready to transform how your team handles complex data requests? We invite you to our upcoming ...

Announcing Modern Navigation: A New Era of Splunk User Experience

We are excited to introduce the Modern Navigation feature in the Splunk Platform, available to both cloud and ...

Modernize your Splunk Apps – Introducing Python 3.13 in Splunk

We are excited to announce that the upcoming releases of Splunk Enterprise 10.2.x and Splunk Cloud Platform ...