I have a query where I use multiple append commands to get the output result.
The query has 1 index and 3 source paths. Initially, I was using inputlookup to get the data, but we have since changed to the source paths. The problem now is that the search takes too long, around 8 seconds, to return results. I need to optimize that.
Below is the query. I was thinking that if we eliminate the appends, we could get faster results. Please suggest improvements, or any alternate ways to do this.
index=csvlookups source="F:\\SplunkMonitor\\lookup_table_sip_pbx_usage.csv" OR source="F:\\SplunkMonitor\\lookup_table_dpt_capacity.csv" OR source="F:\\SplunkMonitor\\csvlookups\\lookup_table_sip_pbx_forecasts.csv"
| eval Date=strftime(strptime(Date,"%m/%d/%Y"),"%Y-%m-%d")
| sort Date, CLLI
| rename CLLI as Office
| search abc
| stats sum(Usage) as Usage by Office, Date
| append
[ search index=csvlookups
| eval Date=strftime(strptime(Date,"%m/%d/%Y"),"%Y-%m-%d")
| reverse
| search Office=abc AND Type="Test-new"
| fields Date NB_RTU
| fields - _raw _time ]
| sort Date
| fillnull value=abc Office
| filldown Usage
| filldown NB_RTU
| fillnull value=0 Usage
| eval _time = strptime(Date, "%Y-%m-%d")
| eval latest_time = if("now" == "now", now(), relative_time(now(), "now"))
| where ((_time >= relative_time(now(), "-3y@h")) AND (_time <= latest_time))
| fields - latest_time Date
| append
[ gentimes start=-1
| eval Date=strftime(mvrange(now(),now()+60*60*24*365*3,"1mon"),"%F")
| mvexpand Date
| fields Date
| append
[ search index=csvlookups
| rename "Expected Date of Addition" as edate
| eval edate=strftime(strptime(edate,"%m/%d/%Y"),"%Y-%m-%d")
| rename edate as "Expected Date of Addition"
| table Contact Customer "Expected Date of Addition" "Number of Channels" Switch
| reverse
| search Customer = "Regular Usage" AND Switch = "$Office$"
| rename "Number of Channels" as val
| return $val ]
| reverse
| filldown search
| rename search as Usage
| where Date != ""
| reverse
| append
[ search index=csvlookups
| rename "Expected Date of Addition" as edate
| eval edate=strftime(strptime(edate,"%m/%d/%Y"),"%Y-%m-%d")
| rename edate as "Expected Date of Addition"
| table Contact Customer "Expected Date of Addition" "Number of Channels" Switch
| reverse
| search Customer != "Regular Usage" AND Switch = "$Office$"
| rename "Expected Date of Addition" as Date
| eval _time=strptime(Date, "%Y-%m-%d")
| rename "Number of Channels" as Forecast
| stats sum(Forecast) as Forecast by Date]
| sort Date
| rename Switch as Office
| eval Forecast1 = if(isnull(Forecast),Usage,Forecast)
| fields - Usage Forecast
| streamstats sum(Forecast1) as Forecast
| fields - Forecast1
| eval Date=strptime(Date, "%Y-%m-%d")
| eval Date=if(Date < now(), now(), Date) ]
| filldown Usage
| filldown Office
| eval Forecast = Forecast + Usage
| eval Usage = if(Forecast >= 0,NULL,Usage)
| eval _time=if(isnull(_time), Date, _time)
| timechart limit=0 span=1w max(Usage) as Usage, max(NB_RTU) as NB_RTU, max(Forecast) as Forecast by Office
| rename "NB_RTU: $Office$" as "RTU's Purchased", "Usage: $Office$" as "Usage", "Forecast: $Office$" as "Forecast"
| filldown "RTU's Purchased"
If inputlookup was working well you should stick with that as you won't get much faster.
It's hard to give specific advice about your query without knowing more about the data and your end goals. In general:
Filter early. Make your base query (before the first '|') as specific as possible. Run your where and search clauses as soon as you can (see the sketch after this list).
Use fields instead of table. It's more efficient.
Sort only when necessary. Usually, it's not necessary.
Fewer appends is better.
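For example, the first part of your search could push the abc term into the base search and keep only the fields it needs. This is just a rough sketch of the "filter early" and "fields" points, untested against your data:
index=csvlookups source="F:\\SplunkMonitor\\lookup_table_sip_pbx_usage.csv" OR source="F:\\SplunkMonitor\\lookup_table_dpt_capacity.csv" OR source="F:\\SplunkMonitor\\csvlookups\\lookup_table_sip_pbx_forecasts.csv" abc
| fields Date CLLI Usage
| eval Date=strftime(strptime(Date,"%m/%d/%Y"),"%Y-%m-%d")
| rename CLLI as Office
| stats sum(Usage) as Usage by Office, Date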
Thank you for the quick response.
I have noted the points you mentioned for optimization.
The reason for not using inputlookup is that the requirement is to use the source paths instead of a lookup.
About the query and its goal:
1. It searches all three CSVs mentioned via the source paths, which gives Usage, Date and Office.
2. Then one append filters on Office and Type to get the NB_RTU field value:
search Office=abc AND Type="Test-new"
| fields Date NB_RTU
3. The next append, which is used for forecasting, has 2 sub-appends in which $val and the forecast (sum) are returned with respect to Date.
At the end, we display 4 fields in the dashboard: _time, Forecast, NB_RTU and Usage.
Please let me know if I can give more clarification on any part of the query.
Here's a quick cut at trimming some fat to speed things up.
index=csvlookups source="F:\\SplunkMonitor\\lookup_table_sip_pbx_usage.csv" OR source="F:\\SplunkMonitor\\lookup_table_dpt_capacity.csv" OR source="F:\\SplunkMonitor\\csvlookups\\lookup_table_sip_pbx_forecasts.csv" abc
| eval Date=strftime(strptime(Date,"%m/%d/%Y"),"%Y-%m-%d")
| rename CLLI as Office
| stats sum(Usage) as Usage by Office, Date
| append
[ search index=csvlookups Office=abc AND Type="Test-new"
| eval Date=strftime(strptime(Date,"%m/%d/%Y"),"%Y-%m-%d")
| fields Date NB_RTU
| fields - _raw _time ]
| sort Date
| fillnull value=abc Office
| filldown Usage
| filldown NB_RTU
| fillnull value=0 Usage
| eval _time = strptime(Date, "%Y-%m-%d")
| eval latest_time = now()
| where ((_time >= relative_time(now(), "-3y@h")) AND (_time <= latest_time))
| fields - latest_time Date
| append
[ gentimes start=-1
| eval Date=strftime(mvrange(now(),now()+60*60*24*365*3,"1mon"),"%F")
| mvexpand Date
| fields Date
| append
[ search index=csvlookups Customer = "Regular Usage" AND Switch = "$Office$"
| rename "Number of Channels" as val
| return $val ]
| where Date != ""
| reverse
| filldown search
| rename search as Usage
| reverse
| append
[ search index=csvlookups Customer != "Regular Usage" AND Switch = "$Office$"
| rename "Expected Date of Addition" as edate
| eval Date=strftime(strptime(edate,"%m/%d/%Y"),"%Y-%m-%d")
| rename "Number of Channels" as Forecast
| stats sum(Forecast) as Forecast by Date]
| sort Date
| rename Switch as Office
| eval Forecast1 = if(isnull(Forecast),Usage,Forecast)
| fields - Usage Forecast
| streamstats sum(Forecast1) as Forecast
| fields - Forecast1
| eval Date=strptime(Date, "%Y-%m-%d")
| eval Date=if(Date < now(), now(), Date) ]
| filldown Usage
| filldown Office
| eval Forecast = Forecast + Usage
| eval Usage = if(Forecast >= 0,NULL,Usage)
| eval _time=if(isnull(_time), Date, _time)
| timechart limit=0 span=1w max(Usage) as Usage, max(NB_RTU) as NB_RTU, max(Forecast) as Forecast by Office
| rename "NB_RTU: $Office$" as "RTU's Purchased", "Usage: $Office$" as "Usage", "Forecast: $Office$" as "Forecast"
| filldown "RTU's Purchased"
Thank you for that attempt, but with the above query I am not able to get the output. I was planning to use summary indexing, and I expect that could be fast enough.
Please suggest whether this would be an ideal approach.
Summary indexing should help with performance.
Apologies for the lack of output. Without data, I was not able to test the query.
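For the summary indexing approach, a minimal sketch would be a scheduled search that pre-aggregates the usage data with collect, plus a dashboard search that reads the summary index. The index name summary_sip_usage and the schedule are assumptions; adjust them to your environment.
Scheduled search (run, say, daily):
index=csvlookups source="F:\\SplunkMonitor\\lookup_table_sip_pbx_usage.csv" OR source="F:\\SplunkMonitor\\lookup_table_dpt_capacity.csv" OR source="F:\\SplunkMonitor\\csvlookups\\lookup_table_sip_pbx_forecasts.csv"
| eval Date=strftime(strptime(Date,"%m/%d/%Y"),"%Y-%m-%d")
| rename CLLI as Office
| stats sum(Usage) as Usage by Office, Date
| collect index=summary_sip_usage
Dashboard search (reads the pre-aggregated rows instead of the raw CSV events):
index=summary_sip_usage Office=abc
| fields Office Date Usage
The appends and the timechart would then run on top of that much smaller result set.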