Hi All,
I am using the base search and post-process searches outlined below, along with additional post-process searches in my Splunk dashboard. The index name and fields are consistent across all the panels. I have explicitly included a fields command to specify the list of fields required for the post-process searches.
However, I am observing a discrepancy: the result count in the Splunk search is higher than the result count displayed on the Splunk dashboard. Could you help me understand why this is happening?
base search:-
index=myindex TERM(keyword) fieldname1="EXIT" | bin _time span=1d
| fields _time, httpStatusCde, statusCde, respTime, EId
Post process search1:-
| search EId="5eb2aee9"
| stats count as Total, count(eval(httpStatusCde!="200" OR statusCde!="0000")) as failures, exactperc95(respTime) as p95RespTime by _time
| eval "FailureRate"= round((failures/Total)*100,2)
| table _time, Total, FailureRate, p95RespTime
| sort -_time
Post process search2:-
| search EId="5eb2aee8"
| stats count as Total, count(eval(httpStatusCde!="200" OR statusCde!="0000")) as failures, exactperc95(respTime) as p95RespTime by _time
| eval "FailureRate"= round((failures/Total)*100,2)
| table _time, Total, FailureRate, p95RespTime
| sort -_time
Hi
How many events does the base search return, and how long does it take to finish? There are limits on both, and quite probably you have hit them.
Looking at your base and post-process searches, you could modify the base search to include the stats there, which is the recommended way to use it.
index=myindex TERM(keyword) fieldname1="EXIT"
| bin _time span=1d
| stats count as Total, count(eval(httpStatusCde!="200" OR statusCde!="0000")) as failures, exactperc95(respTime) as p95RespTime by _time EId
Then both post-process searches would be something like this:
| search EId="5eb2aee9"
| stats count as Total, count(failures) as failures, first(p95RespTime) as p95RespTime by _time
| eval "FailureRate"= round((failures/Total)*100,2)
| table _time, Total, FailureRate, p95RespTime
| sort -_time
r. Ismo
@isoutamo : The base search returns 66,449,351 events for the last 1 day (earliest=-1d@d and latest=now) and completes in 37.51 seconds. We are using Splunk Cloud in our environment. What are the limits on the number of events a base search can process? Could you please share them?
I will try modifying my search as per your suggestion and update.
Those seem to be the same as on-prem: 500,000 events and 30s (I think this was 60s earlier, but it seems to be the same on-prem too). See https://docs.splunk.com/Documentation/SplunkCloud/latest/Viz/Savedsearches#Use_a_transforming_base_s...
Based on those, you have exceeded both limits. I suspect the event limit is the more important one, and it could be the reason why it didn't work as expected.
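As a rough way to compare the event count against the 500,000 limit, a search like the sketch below could be run in the Search app. It reuses only the index, term, field filter and time range already mentioned in this thread, so treat it as an illustration rather than a verified query for your environment.

```spl
index=myindex TERM(keyword) fieldname1="EXIT" earliest=-1d@d latest=now
| stats count
```

If the count is well above 500,000, a non-transforming base search over that time range will most likely be truncated before the post-process searches see it.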
Try changing your base search so that it ends with a table command rather than a fields command. Also, your EId is different in your two post-processing searches.
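A minimal sketch of that change, assuming the same index, filters and field list as the original base search, with only the last command swapped:

```spl
index=myindex TERM(keyword) fieldname1="EXIT"
| bin _time span=1d
| table _time, httpStatusCde, statusCde, respTime, EId
```

Note that this on its own does not reduce the number of events returned, so the base search limits discussed above may still apply.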
@ITWhisperer : Thanks for your reply.
The primary purpose of using a base search with post-processing searches is to minimize search runtime and ensure the dashboard panels load quickly. While the fields command retains the necessary fields for post-processing, it is not producing accurate results in this case. Although replacing fields with the table command yields accurate results, it significantly increases resource usage and search completion time, negatively impacting dashboard performance.
Is there any specific reason why the fields command is not giving accurate results?
Regards
VK
Aside from the limits on base search results, using a base search to hold a large number of results will often NOT improve performance. You are taking lots of results, perhaps from multiple indexers where you benefit from parallelism, and putting them on the search head, where only that single search head's CPU is available to process them, while also competing for CPU with other users of that search head.
Note that the comments about doing this in the base search
...
| stats count as Total, count(eval(httpStatusCde!="200" OR statusCde!="0000")) as failures, exactperc95(respTime) as p95RespTime by _time EId
followed by a post process search doing
| search EId="5eb2aee9"
| stats count as Total, count(failures) as failures, first(p95RespTime) as p95RespTime by _time
...
is not quite right: you don't need another stats, because you are just taking the values already calculated by the base stats and filtering down to the EId you want.
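In other words, the post-process search could look something like the sketch below. It assumes the base search ends with the stats ... by _time EId shown above, so each EId already has one row per day and the failure rate can be computed directly from those columns.

```spl
| search EId="5eb2aee9"
| eval FailureRate=round((failures/Total)*100,2)
| table _time, Total, FailureRate, p95RespTime
| sort -_time
```

The second panel would be the same apart from the EId value in the search filter.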
However, a point to note about stats + stats is that the second stats would not do stats COUNT, but stats sum(Total), i.e. if you wanted to get the total for EId without regard to _time, you could do something like this...
| search EId="5eb2aee9"
| stats sum(Total) as Total, sum(failures) as failures, min(p95RespTime) as min_p95RespTime max(p95RespTime) as max_p95RespTime avg(p95RespTime) as avg_p95RespTime
...
This is from the documentation:
Use these best practices to make sure that chain searches work as expected.
A base search should be a transforming search that returns results formatted as a statistics table. For example, searches using the following commands are transforming searches: stats, chart, timechart, and geostats, among others. For more information on transforming commands, see About transforming commands in the Search Manual.