Hello everyone.
I'm new to Splunk and I have one search which is taking a bit longer than others. Are there any suggestions on how to improve this search?
index=mydatasource_* (sourcetype=x_connections OR sourcetype=x_collectors) engine="engine" Src_SubnetName="vpn"
| eval src=if(isnull(src), name, src)
| eval Dates = _time
| eval Src_SubnetName = Src_Sitename
| convert timeformat="%Y-%m-%d" ctime(Dates)
| stats dc(src) by src,Src_SubnetName, Dates
To add to what @efavreau said about identifying words that will end up in every result...
I've had a lot of success using the [Patterns] analysis of search results to identify these words.
Also, [All Fields] and then sorting to see fields with maximum "100% Event Coverage" and "# of Values" can help as well.
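For example, if the Patterns view surfaces a literal string that shows up in every matching event (the term "vpn-gw" below is purely hypothetical), adding it before the first pipe lets the indexers discard non-matching events early:
index=mydatasource_* (sourcetype=x_connections OR sourcetype=x_collectors) "vpn-gw" engine="engine" Src_SubnetName="vpn"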
Your query has extra calculations. How about this?
index=mydatasource_* ((sourcetype=x_connections src=*) OR (sourcetype=x_collectors name=*)) engine="engine" Src_SubnetName="vpn"
| eval src=coalesce(src, name)
| eval Dates=strftime(_time, "%F")
| stats estdc(src) as distinct_src_count by Src_Sitename, Dates
| rename Src_Sitename as Src_SubnetName
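Note that estdc gives an approximate distinct count, which is typically cheaper than an exact one; if you need exact numbers, the same structure works with plain dc at the cost of more memory:
| stats dc(src) as distinct_src_count by Src_Sitename, Dates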
Hello @rafazurc,
Run these searches (use smart mode, and a short period like the last 60 minutes instead of the last 24 hours) and post their search.log files (your search.log screenshot is not complete, so some important information may be missing):
search 1:
index=mydatasource_* (sourcetype=x_connections OR sourcetype=x_collectors) engine="engine" Src_SubnetName="vpn"
| eval src=if(isnull(src), name, src)
| eval Dates = _time
| eval Src_SubnetName = Src_Sitename
| convert timeformat="%Y-%m-%d" ctime(Dates)
| stats dc(src) by src,Src_SubnetName, Dates
search 2:
index=mydatasource_* (sourcetype = x_connections OR sourcetype= x_collectors) engine="engine" Src_SubnetName = "vpn"
search 3:
index=mydatasource_* (sourcetype = x_connections OR sourcetype= x_collectors)
By comparing the durations of the command.search component you'll get an idea of whether your search can be [easily] optimized.
search 4:
index=mydatasource_* sourcetype = x_connections
search 5:
index=mydatasource_* sourcetype= x_collectors
Also check the Splunk documentation and try to find out whether this is a dense, sparse, or rare search.
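As a cheap way to see how many raw events each sourcetype contributes over time (tstats reads only indexed fields, so it returns almost immediately), a sketch like this could also help you judge whether the search is dense or sparse:
| tstats count where index=mydatasource_* (sourcetype=x_connections OR sourcetype=x_collectors) by sourcetype, _time span=1h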
The dc aggregation function can be very expensive. Did your job inspector give any insight into where the time is being spent? I'm also curious what you're ultimately trying to achieve; knowing that may help the community solve your challenge.
See this link for info on dc and how to work around it.
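One common workaround (a sketch only; the linked article may describe a different approach) is to collapse the data to one row per src per group first, then count rows, spreading the work across two cheaper stats passes:
| eval Dates=strftime(_time, "%Y-%m-%d")
| stats count by src, Src_SubnetName, Dates
| stats count as distinct_src by Src_SubnetName, Dates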
Hello @jpolvino. I've added a screenshot of my job inspector results. What I'm trying to achieve is this: I have two sourcetypes, one for connections and the other for collectors. For the first one the field I need is src, and for the second it is name, so I'm checking each event and using name whenever src is null.
After that, I'm formatting _time as a date. Src_SubnetName is a common field for both sourcetypes, and the result I need is the count of distinct src values per network per day.
I really would like to optimize this search to reduce its cost. I'm checking the link you sent for more hints.
Thanks
@rafazurc The more specific you can make a search before the first |, the faster it will be. Do you need blank src values in your results? If not, put in src=* to get rid of the blanks. Do you need all those indexes? Is there any other detail, even a word or two, that will appear in every result? Put all of that up front before the first pipe. Otherwise, it is what it is; the rest of your SPL isn't expensive.
Hello @efavreau. Since I have two sourcetypes, one with src and the other with name, would it work to add (src=* OR name=*) before the first pipe? Thanks
@rafazurc If you need these fields, then adding (src=* OR name=*) is better than not having it.
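With both sourcetypes covered, and assuming src and name are available at search time for their respective sourcetypes, the base search would look something like:
index=mydatasource_* (sourcetype=x_connections OR sourcetype=x_collectors) (src=* OR name=*) engine="engine" Src_SubnetName="vpn"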
How long is "a bit"? How much data is being searched? Searching more data will take more time.
Hello @richgalloway.
Searching the last 24 hours (~200M events) takes around 45 minutes and generates ~80k results.
Thanks.