Hello everyone.
I'm new to Splunk and I have one search which is taking a bit longer than others. Are there any suggestions on how to improve this search?
index=mydatasource_* (sourcetype=x_connections OR sourcetype=x_collectors) engine="engine" Src_SubnetName="vpn"
| eval src=if(isnull(src), name, src)
| eval Dates = _time
| eval Src_SubnetName = Src_Sitename
| convert timeformat="%Y-%m-%d" ctime(Dates)
| stats dc(src) by src,Src_SubnetName, Dates
To add to what @efavreau said about identifying words that will end up in every result...
I've had a lot of success using the [Patterns] analysis of search results to identify these words.
Also, [All Fields] and then sorting to see fields with maximum "100% Event Coverage" and "# of Values" can help as well.
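For example, if the Patterns view surfaces a literal string that shows up in every matching event (the term "vpn-gw" below is purely hypothetical), adding it before the first pipe lets the indexers discard non-matching events early:
index=mydatasource_* (sourcetype=x_connections OR sourcetype=x_collectors) "vpn-gw" engine="engine" Src_SubnetName="vpn"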
Your query has extra calculations. How about this?
index=mydatasource_* ((sourcetype=x_connections src=*) OR (sourcetype=x_collectors name=*)) engine="engine" Src_SubnetName="vpn"
| eval src=coalesce(src, name)
| eval Dates=strftime(_time, "%F")
| stats estdc(src) as distinct_src_count by Src_Sitename, Dates
| rename Src_Sitename as Src_SubnetName
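Note that estdc gives an approximate distinct count, which is typically cheaper than an exact one; if you need exact numbers, the same structure works with plain dc at the cost of more memory:
| stats dc(src) as distinct_src_count by Src_Sitename, Dates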
Hello @rafazurc,
Run these searches (use smart mode, and a short period like the last 60 minutes instead of the last 24 hours) and post their search.log files (your search.log screenshot is not complete, so some important information may be missing):
search 1:
index=mydatasource_* (sourcetype=x_connections OR sourcetype=x_collectors) engine="engine" Src_SubnetName="vpn"
| eval src=if(isnull(src), name, src)
| eval Dates = _time
| eval Src_SubnetName = Src_Sitename
| convert timeformat="%Y-%m-%d" ctime(Dates)
| stats dc(src) by src,Src_SubnetName, Dates
search 2:
index=mydatasource_* (sourcetype = x_connections OR sourcetype= x_collectors) engine="engine" Src_SubnetName = "vpn"
search 3:
index=mydatasource_* (sourcetype = x_connections OR sourcetype= x_collectors)
By comparing the durations of the command.search component you'll get an idea of whether your search can be [easily] optimized.
search 4:
index=mydatasource_* sourcetype = x_connections
search 5:
index=mydatasource_* sourcetype= x_collectors
Also check the Splunk documentation and try to find out whether this is a dense, sparse, or rare search.
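As a cheap way to see how many raw events each sourcetype contributes over time (tstats reads only indexed fields, so it returns almost immediately), a sketch like this could also help you judge whether the search is dense or sparse:
| tstats count where index=mydatasource_* (sourcetype=x_connections OR sourcetype=x_collectors) by sourcetype, _time span=1h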
The dc aggregation function can be very expensive. Did your job inspector give any insight into where the time is being spent? I'm also curious what you're ultimately trying to achieve; knowing that may help the community solve your challenge.
See this link for info on dc and how to work around it.
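One common workaround (a sketch only; the linked article may describe a different approach) is to collapse the data to one row per src per group first, then count rows, spreading the work across two cheaper stats passes:
| eval Dates=strftime(_time, "%Y-%m-%d")
| stats count by src, Src_SubnetName, Dates
| stats count as distinct_src by Src_SubnetName, Dates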
Hello @jpolvino. I've added a screenshot of my job inspector results. What I'm trying to achieve is this: I have two sourcetypes, one for connections and the other for collectors. For the first one the field I need is src, and for the second it is name, so I'm checking each event and using name whenever src is null.
After that, I'm formatting _time as a date. Src_SubnetName is a common field for both sourcetypes, and the result I need is the count of distinct src values per network per day.
I really would like to optimize this search to reduce its cost. I'm checking the link you sent for more hints.
Thanks
@rafazurc The more specific you can make a search before the first |, the faster it will be. Do you need blank src values in your results? If not, put in src=* to get rid of the blanks. Do you need all those indexes? Is there any other detail, even a word or two, that will appear in every result? Put all of that up front before the first pipe. Otherwise, it is what it is; the rest of your SPL isn't expensive.
Hello @efavreau. Since I have two sourcetypes, one with src and the other with name, would it work to add (src=* OR name=*) before the first pipe? Thanks
@rafazurc If you need these fields, then adding (src=* OR name=*) is better than not having it.
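With both sourcetypes covered, and assuming src and name are available at search time for their respective sourcetypes, the base search would look something like:
index=mydatasource_* (sourcetype=x_connections OR sourcetype=x_collectors) (src=* OR name=*) engine="engine" Src_SubnetName="vpn"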
How long is "a bit"? How much data is being searched? Searching more data will take more time.
Hello @richgalloway.
Searching the last 24 hours (~200M events) takes around 45 minutes and generates ~80k results.
Thanks.