Why is base search more expensive?

DG
Explorer

Dear Community,

I would like to get some assistance and/or clarification regarding Splunk’s base-search/post-processing functionality. I have read/heard that using one base search with post-processing instead of several similar queries is cost-effective: we can save SVCs (Splunk Virtual Compute units) with it. In practice, unfortunately, I have experienced quite the opposite:

Let’s say I have a dashboard (call it “A”) with these queries:

index="myIndex" "[OPS] [INFO] event=\"asd\"" | where user_id != "0" AND is_aaaaa_login="true" AND environment="prod" AND result="Successful" | stats dc(user_id) as "Unique users, who has logged ..."
index="myIndex" "[OPS] [INFO] event=\"asd\"" | where user_id != "0" AND is_aaaaa_login="true" AND environment="prod" AND result="Successful" | timechart count by result
index="myIndex" "[OPS] [INFO] event=\"asd\"" | where user_id != "0" AND is_aaaaa_login="true" AND environment="prod" AND result="Successful" | dedup user_id | timechart span=1h count as "per hour"| streamstats sum("per hour") as "total"
index="myIndex" "[OPS] [INFO] event=\"asd\"" | where user_id != "0" AND is_aaaaa_login="true" AND environment="prod" AND result="Successful" | timechart dc(user_id) as "Unique users"
index="myIndex" "[OPS] [INFO] event=\"asd\"" | where user_id != "0" AND is_aaaaa_login="true" AND environment="prod" AND result="Failed" AND reason != "bbb" | timechart count by reason

I cloned this “A” dashboard (let’s call the clone “B”).

I ran into some issues, like getting no data, or the numbers being different on “B” than on “A”, but after some googling and reading the Splunk Community, I managed to get the same results on “B” with:

A base search:

index="myIndex" "[OPS] [INFO] event=\"asd\"" | stats count by user_id is_aaaaa_login environment result reason _time

Post-processes:

search | where user_id != "0" AND is_aaaaa_login="true" AND environment="prod" AND result="Successful" | stats dc(user_id) as "Unique users, who has logged ..."
search | where user_id != "0" AND is_aaaaa_login="true" AND environment="prod" AND result="Successful" | timechart count by result
search | where user_id != "0" AND is_aaaaa_login="true" AND environment="prod" AND result="Successful" | dedup user_id | timechart span=1h count as "per hour"| streamstats sum("per hour") as "total"
search | where user_id != "0" AND is_aaaaa_login="true" AND environment="prod" AND result="Successful" | timechart dc(user_id) as "Unique users"
search | where user_id != "0" AND is_aaaaa_login="true" AND environment="prod" AND result="Failed" AND reason != "bbb" | timechart count by reason

I added ‘refresh=”180”’ to the top of these two dashboards and left them open in my browser for about an hour (with the common date picker set to “Last 24 hours”). After this, I was surprised to see in the “Splunk App for Chargeback” that dashboard “A” consumed around 5 SVCs while dashboard “B” used around 15 SVCs. So the dashboard with the base search was far more expensive than the “normal” one. I had thought it would be much cheaper.

Why is that? Did I construct my base/post-process queries badly? If so, what should I change?

I searched a lot and found only one relevant comment, here on the Splunk Community:

https://community.splunk.com/t5/Dashboards-Visualizations/Base-Search-for-dashboard-optimization/m-p...

“However, I do not recommend it when dealing with large data because base search is slow.” This implies that maybe a base search is not always the cheaper solution?! So I executed only my base search in Splunk over a 24-hour interval, and it returned a table with around 3,000,000 rows. Does that count as a large data set? Should I forget about using base searches?

Thank you very much for your help!

1 Solution

bowesmana
SplunkTrust

Your base search is not really reducing the data set: it aggregates by 6 fields, including _time, so the base search result set is likely to be the entire data set. It's quite possible that this is not a good use case.

Of your 5 post-process searches, you have 1 stats, 3 'unspanned' timecharts and 1 spanned (1h) timechart.

2 of those timecharts are simple ones:

  • count by result (nb result is redundant, as it's always Successful)
  • dc(user_id)

As ITWhisperer says, the additional filters (user_id, is_aaaaa_login, environment) should also be part of the base search. Is there a reason why not, and how many of the 3 million events are included unnecessarily?

You may be better off having more than one base search, e.g. one for the timecharts where you're counting events and unique users. Note that you can also include the user_id values (values(user_id)) so that you can then do the subsequent stats dc(user_id). That means a single base search can handle the results for 3 post-process searches and will not require a big data set:

<search id="base_tc">
  <query>
index="myIndex" "[OPS] [INFO] event=\"asd\"" user_id != "0" is_aaaaa_login="true" environment="prod" result="Successful" 
| timechart count dc(user_id) as users values(user_id) as user_ids
  </query>
</search>

Note that you can include the filters from the 'where' clause as part of the original search.

The two timecharts would then look like this:

<search base="base_tc">
  <query>
| fields _time count
  </query>
</search>

<search base="base_tc">
  <query>
| fields _time users
  </query>
</search>

and the stats one would be:

<search base="base_tc">
  <query>
| stats dc(user_ids) as user_ids
  </query>
</search>

You are then left with the 1h timechart and the failed-results timechart, which could each be their own search.
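For example, sticking with the original field names and pushing the 'where' filters into the initial search as above, those two could look something like this (an untested sketch):

index="myIndex" "[OPS] [INFO] event=\"asd\"" user_id != "0" is_aaaaa_login="true" environment="prod" result="Successful"
| dedup user_id
| timechart span=1h count as "per hour"
| streamstats sum("per hour") as "total"

index="myIndex" "[OPS] [INFO] event=\"asd\"" user_id != "0" is_aaaaa_login="true" environment="prod" result="Failed" reason != "bbb"
| timechart count by reason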


DG
Explorer

Thank you very much! Both solutions (loadjob from ITWhisperer and the base search from bowesmana) worked and saved SVCs for us. We still have to measure a few more times to get an accurate picture of exactly how much, but once it was a 75% saving, another time around 40%. I'm quite new here; can I accept both as the solution?


bowesmana
SplunkTrust

Great that you got some good savings - as for accepting two solutions, you can only accept one, so choose wisely 😀😀


DG
Explorer

Your answer is more detailed and I got more explanations, so I accepted yours. 🙂 

However, it seems that on average we gain more with the loadjob solution. I don't know why the SVC consumption differs so much; I'm running the default dashboard and the two solutions with the same "refresh=180" attribute for one hour.


bowesmana
SplunkTrust

Interesting - I've not used the idea of a pseudo base search and then using loadjob to post-process.

I guess that using a single base search split up the way you have it, and then using loadjob, still amounts to just a single search execution, hence the improved savings.


ITWhisperer
SplunkTrust

Without having done too much investigation, the way I have found base searches sometimes working is more like a shorthand for the first part of the search. By that I mean that when the post-processing search needs to execute, it executes the base search and then the post-processing.

If your dashboard uses the same start to the search in a number of places, rather than writing multiple copies of the search, you can write it once and use it multiple times; it is a bit like extending a class in an object-oriented paradigm.

Now, this may be down to the searches I have been using and whether they can be executed on the indexers or have to come back to the search head(?) - as I said, I haven't investigated the details of this, and don't have a deep understanding of how it all works.

Once I had found out how to use loadjob to gain performance, I didn't bother investigating further.

There is a caveat to this approach. The results from the base query do not hang around forever, so the sid may become stale (and not return any results), in which case the base search is executed again (generating a new sid).

In some circumstances, the way to get around this is to use saved reports. This assumes that you have a saved report that covers the time period you need for your dashboard. For example, I have a number of dashboards which are based on the past week or month. These results don't change during the day, so I can run a report in the early hours and then load these results throughout the day without having to search the whole month every time the dashboard loads. This is a tremendous boost to dashboard performance, although not applicable in every case. (Perhaps I should consider finding or doing a BSides presentation on this, as I don't think this short piece has done the topic justice 😀)
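As a minimal sketch of that pattern (the report name and the owner/app in the loadjob string are purely illustrative): schedule a report containing the expensive search, then either reference it from the dashboard, which reuses the results of its last scheduled run:

<search ref="monthly_login_report"></search>

or pull the cached results into any search with loadjob:

| loadjob savedsearch="admin:search:monthly_login_report" | timechart count by result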

DG
Explorer

I got it! Thank you very much!! 🙂


DG
Explorer

"including _time, so it's likely the base search result set will be the entire data set" -> yes, now I think so, too.

I included _time because my post-processing searches displayed errors or invalid data, so I googled and read, for example here:

https://community.splunk.com/t5/Splunk-Search/Is-it-possible-to-create-Time-chart-with-search-with-b...

that I should try using "| fields *", or "stats count by _time", as suggested here:

https://community.splunk.com/t5/Splunk-Search/help-on-base-search-event-limit/m-p/574058#M200053

"As ITWhisperer says, the additional filters (user_id, is_aaaaa_login, environment) should also be part of the base search. Is there a reason why not and how many of the 3million events are included unnecessarily?"

-> Totally true, my mistake; I did not realize that they could be part of the base search.


ITWhisperer
SplunkTrust

The common part of your searches appears to be this

index="myIndex" "[OPS] [INFO] event=\"asd\"" | where user_id != "0" AND is_aaaaa_login="true" AND environment="prod"

You could try creating a "base" search with this and then, in the done handler, saving the job sid in a token.

Then in subsequent searches, you use loadjob to load the result set of the base search and apply further filtering (result = x or y) and your stats calculations.
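A minimal sketch of that wiring in Simple XML (the token name base_sid is illustrative):

<search id="base">
  <query>
index="myIndex" "[OPS] [INFO] event=\"asd\"" | where user_id != "0" AND is_aaaaa_login="true" AND environment="prod"
  </query>
  <done>
    <set token="base_sid">$job.sid$</set>
  </done>
</search>

<!-- panel search: reuse the finished job's results, then filter and aggregate -->
<search>
  <query>
| loadjob $base_sid$ | where result="Successful" | stats dc(user_id) as "Unique users"
  </query>
</search>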
