Solved: How do we optimize the search result time with a q...

vikashperiwal · ‎12-03-2018

I have a query , where i have multiple append commands used to get the output result.

The query has 1 index and 3 source paths. Initially, I was using inputlookup to get the data, but now we changed the source path. The problem now is search time is taking too long, around 8 sec, to return results. I need to optimize that.

Below is the query...i was thinking if we eliminate the append the query, we could get faster results. Please suggest or if there are any alternate ways.

index=csvlookups source="F:\\SplunkMonitor\\lookup_table_sip_pbx_usage.csv" OR source="F:\\SplunkMonitor\\lookup_table_dpt_capacity.csv" OR source="F:\\SplunkMonitor\\csvlookups\\lookup_table_sip_pbx_forecasts.csv"
| eval Date=strftime(strptime(Date,"%m/%d/%Y"),"%Y-%m-%d")
| sort Date, CLLI
| rename CLLI as Office
  |search abc
| stats sum(Usage) as Usage by Office, Date
| append
    [ search index=csvlookups 
    | eval Date=strftime(strptime(Date,"%m/%d/%Y"),"%Y-%m-%d")
    | reverse
    | search Office=abc AND Type="Test-new"
    | fields Date NB_RTU
    | fields - _raw _time ]
| sort Date
| fillnull value=abc Office
| filldown Usage
| filldown NB_RTU
| fillnull value=0 Usage
| eval _time = strptime(Date, "%Y-%m-%d")
| eval latest_time = if("now" == "now", now(), relative_time(now(), "now"))
| where ((_time >= relative_time(now(), "-3y@h")) AND (_time <= latest_time))
| fields - latest_time Date
| append
    [ gentimes start=-1
    | eval Date=strftime(mvrange(now(),now()+60*60*24*365*3,"1mon"),"%F")
    | mvexpand Date
    | fields Date
    | append
        [ search index=csvlookups 
        | rename "Expected Date of Addition" as edate
        | eval edate=strftime(strptime(edate,"%m/%d/%Y"),"%Y-%m-%d")
        | rename edate as "Expected Date of Addition"
        | table Contact Customer "Expected Date of Addition" "Number of Channels" Switch
        | reverse
        | search Customer = "Regular Usage" AND Switch = "$Office$"
        | rename "Number of Channels" as val
        | return $val ]
    | reverse
    | filldown search
    | rename search as Usage
    | where Date != ""
    | reverse
    | append
        [ search index=csvlookups 
        | rename "Expected Date of Addition" as edate
        | eval edate=strftime(strptime(edate,"%m/%d/%Y"),"%Y-%m-%d")
        | rename edate as "Expected Date of Addition"
        | table Contact Customer "Expected Date of Addition" "Number of Channels" Switch
        | reverse
        | search Customer != "Regular Usage" AND Switch = "$Office$"
        | rename "Expected Date of Addition" as Date
        | eval _time=strptime(Date, "%Y-%m-%d")
        | rename "Number of Channels" as Forecast
        | stats sum(Forecast) as Forecast by Date]
    | sort Date
    | rename Switch as Office
    | eval Forecast1 = if(isnull(Forecast),Usage,Forecast)
    | fields - Usage Forecast
    | streamstats sum(Forecast1) as Forecast
    | fields - Forecast1
    | eval Date=strptime(Date, "%Y-%m-%d")
    | eval Date=if(Date < now(), now(), Date) ]
| filldown Usage
| filldown Office
| eval Forecast = Forecast + Usage
| eval Usage = if(Forecast >= 0,NULL,Usage)
| eval _time=if(isnull(_time), Date, _time)
| timechart limit=0 span=1w max(Usage) as Usage, max(NB_RTU) as NB_RTU, max(Forecast) as Forecast by Office
| rename "NB_RTU: $Office$" as "RTU's Purchased", "Usage: $Office$" as "Usage", "Forecast: $Office$" as "Forecast"
| filldown "RTU's Purchased"

richgalloway · ‎12-04-2018

If inputlookup was working well you should stick with that as you won't get much faster.

It's hard to give specific advice about your query without knowing more about the data and your end goals. In general:

Filter early. Make your base query (before the first '|') as specific as possible. Run your where and search clauses as soon as you can.

Use fields instead of table. It's more efficient.

Sort only when necessary. Usually, it's not necessary.

Fewer appends is better.

---
If this reply helps you, Karma would be appreciated.

View solution in original post

richgalloway · ‎12-04-2018

If inputlookup was working well you should stick with that as you won't get much faster.

It's hard to give specific advice about your query without knowing more about the data and your end goals. In general:

Filter early. Make your base query (before the first '|') as specific as possible. Run your where and search clauses as soon as you can.

Use fields instead of table. It's more efficient.

Sort only when necessary. Usually, it's not necessary.

Fewer appends is better.

---
If this reply helps you, Karma would be appreciated.

vikashperiwal · ‎12-04-2018

Thank You for quick response.

I have noted the points u mentioned for optimistation.

The reason for not using inputlookup is its the requirement to have source path instead of lookup.

About te query and its Goal:
1 . it will search all the three csv mentioned via source path to which would give Usage , Date and Office,
2. then 1 append would filter on the "Office and Type" to get NB_RTU field value
search Office=abc AND Type="Test-new"
| fields Date NB_RTU
3. next Append which is used for forecasting had 2 sub append in which $val and forecast (sum )is returned with respect to date.
And at end we would be displaying 4 fields in dashboard--_time , Forecast, NB_RTU and Usage

please let me know if i can give more clarification on any part of query

richgalloway · ‎12-04-2018

Here's a quick cut at trimming some fat to speed things up.

index=csvlookups source="F:\\SplunkMonitor\\lookup_table_sip_pbx_usage.csv" OR source="F:\\SplunkMonitor\\lookup_table_dpt_capacity.csv" OR source="F:\\SplunkMonitor\\csvlookups\\lookup_table_sip_pbx_forecasts.csv" abc
 | eval Date=strftime(strptime(Date,"%m/%d/%Y"),"%Y-%m-%d")
 | rename CLLI as Office
 | stats sum(Usage) as Usage by Office, Date
 | append
     [ search index=csvlookups Office=abc AND Type="Test-new"
     | eval Date=strftime(strptime(Date,"%m/%d/%Y"),"%Y-%m-%d")
     | fields Date NB_RTU
     | fields - _raw _time ]
 | sort Date
 | fillnull value=abc Office
 | filldown Usage
 | filldown NB_RTU
 | fillnull value=0 Usage
 | eval _time = strptime(Date, "%Y-%m-%d")
 | eval latest_time = now()
 | where ((_time >= relative_time(now(), "-3y@h")) AND (_time <= latest_time))
 | fields - latest_time Date
 | append
     [ gentimes start=-1
     | eval Date=strftime(mvrange(now(),now()+60*60*24*365*3,"1mon"),"%F")
     | mvexpand Date
     | fields Date
     | append
         [ search index=csvlookups Customer = "Regular Usage" AND Switch = "$Office$"
         | rename "Number of Channels" as val
         | return $val ]
     | where Date != ""
     | reverse
     | filldown search
     | rename search as Usage
     | reverse
     | append
         [ search index=csvlookups Customer != "Regular Usage" AND Switch = "$Office$"
         | rename "Expected Date of Addition" as edate
         | eval Date=strftime(strptime(edate,"%m/%d/%Y"),"%Y-%m-%d")
         | rename "Number of Channels" as Forecast
         | stats sum(Forecast) as Forecast by Date]
     | sort Date
     | rename Switch as Office
     | eval Forecast1 = if(isnull(Forecast),Usage,Forecast)
     | fields - Usage Forecast
     | streamstats sum(Forecast1) as Forecast
     | fields - Forecast1
     | eval Date=strptime(Date, "%Y-%m-%d")
     | eval Date=if(Date < now(), now(), Date) ]
 | filldown Usage
 | filldown Office
 | eval Forecast = Forecast + Usage
 | eval Usage = if(Forecast >= 0,NULL,Usage)
 | eval _time=if(isnull(_time), Date, _time)
 | timechart limit=0 span=1w max(Usage) as Usage, max(NB_RTU) as NB_RTU, max(Forecast) as Forecast by Office
 | rename "NB_RTU: $Office$" as "RTU's Purchased", "Usage: $Office$" as "Usage", "Forecast: $Office$" as "Forecast"
 | filldown "RTU's Purchased"

---
If this reply helps you, Karma would be appreciated.

vikashperiwal · ‎12-05-2018

thank You for that attempt , but with the above query i am not able to get the output... i was planning to use "summary indexing" and i expect this can be fast enough

Please suggest if this cold be ideal approach

richgalloway · ‎12-05-2018

Summary indexing should help with performance.

Apologies for the lack of output. Without data, I was not able to test the query.

---
If this reply helps you, Karma would be appreciated.

How do we optimize the search result time with a query using the append command?

Get Inspired! We’ve Got Validation that Your Hard Work is Paying Off

What's New in Splunk Enterprise 9.4: Features to Power Your Digital Resilience

Take Your Breath Away with Splunk Risk-Based Alerting (RBA)