Splunk Search

Why is my stats by command so slow and how can I speed it up for longer time intervals?

timbCFCA
Path Finder

I'm working on some statistics related queries. I'm trying to get the security id, date and count of hosts connected to.

index=wineventlog sourcetype="WinEventLog:Security" 4624 |
fields host,Security_ID,_time  | 
bucket _time span=1d | 
stats dc(host) by Security_ID, _time

They work perfectly until I start adding Security_ID. With no by command or only based on time it's fast.

I also tried to do a dedup Security_ID, _time, host before the stats dc command but it didn't help the overall speed.

It takes well over 10 minutes to complete this search for a week, and I'd like to be able to run this for 30 60 or 90 days. What do I need to do for that to be viable?

lfedak_splunk
Splunk Employee
Splunk Employee

Heya @timbCFCA, if DalJeanis solved your problem, please don't forget to accept an answer! You can upvote posts as well. (Karma points will be awarded for either action.) Happy Splunking!

0 Karma

worshamn
Contributor

Seems puzzling, I do see that Security_ID gets used as a source key several times in transforms.conf in the TA, I wonder if that causes any overhead. Maybe try specifying EventCode=4624 so that it isn't searching through all fields looking for 4624.

0 Karma

DalJeanis
SplunkTrust
SplunkTrust

How often are you running the search? If you are running it fairly often, then you might consider a summary index so you don't have to re-spin the whole search multiple times a day.

Try this -

 index=wineventlog sourcetype="WinEventLog:Security" 4624 
| fields host, Security_ID, _time  
| eval _time=floor(_time/86400)*86400
| dedup host,Security_ID,_time
| stats dc(host) by Security_ID, _time

Explanation -

eval is streaming and distributed, while bin is not. This way, the binning can be done at the individual indexers.

Try it with and without the dedup and see what happens.

timbCFCA
Path Finder

@DalJeanis not really. It did let me save as an accelerated search which helps.

DalJeanis
SplunkTrust
SplunkTrust

@timbCFCA - Well, that's a decent consolation prize.

Did you try it without the dedup, which is redundant with the dc()?

0 Karma

DalJeanis
SplunkTrust
SplunkTrust

@timbCFCA - Did this change the run time at all?

Did you miss .conf21 Virtual?

Good news! The event's keynotes and many of its breakout sessions are now available online, and still totally FREE!