Mismatch in Splunk dashboard count VS search count

kumva01

Hi All,

I am using the base search and post-process searches outlined below, along with additional post-process searches in my Splunk dashboard. The index name and fields are consistent across all the panels. I have explicitly included a fields command to specify the list of fields required for the post-process searches.

However, I am observing a discrepancy: the result count in the Splunk search is higher than the result count displayed on the Splunk dashboard. Could you help me understand why this is happening?

base search:-
index=myindex TERM(keyword) fieldname1="EXIT" | bin _time span=1d
| fields _time, httpStatusCde, statusCde, respTime, EId

Post process search1:-
| search EId="5eb2aee9"
| stats count as Total, count(eval(httpStatusCde!="200" OR statusCde!="0000")) as failures, exactperc95(respTime) as p95RespTime by _time
| eval "FailureRate"= round((failures/Total)*100,2)
| table _time, Total, FailureRate, p95RespTime
| sort -_time

Post process search2:-
| search EId="5eb2aee8"
| stats count as Total, count(eval(httpStatusCde!="200" OR statusCde!="0000")) as failures, exactperc95(respTime) as p95RespTime by _time
| eval "FailureRate"= round((failures/Total)*100,2)
| table _time, Total, FailureRate, p95RespTime
| sort -_time

isoutamo

Hi

How many events is the base search returning, and how long does it take to finish? There are limits on both, and you have quite probably hit them.

Looking at your base and post-process searches, you could move the stats into the base search, which is the recommended way to use it.

index=myindex TERM(keyword) fieldname1="EXIT" 
| bin _time span=1d
| stats count as Total, count(eval(httpStatusCde!="200" OR statusCde!="0000")) as failures, exactperc95(respTime) as p95RespTime by _time EId

Then both post-process searches would look something like this:

| search EId="5eb2aee9"
| stats count as Total, count(failures) as failures, first(p95RespTime) as p95RespTime by _time
| eval "FailureRate"= round((failures/Total)*100,2)
| table _time, Total, FailureRate, p95RespTime
| sort -_time

r. Ismo

kumva01

@isoutamo: The base search returns 66,449,351 events for the last 1 day (earliest=-1d@d and latest=now) and completes in 37.51 seconds. We are using Splunk Cloud in our environment. What are the limits on the number of events a base search can process? Could you please share them?

I will try modifying my search as per your suggestion and update.

isoutamo

Those seem to be the same as on-prem: 500,000 events and 30s (I think this was 60s earlier, but it seems to be the same on-prem too). See https://docs.splunk.com/Documentation/SplunkCloud/latest/Viz/Savedsearches#Use_a_transforming_base_s...

Based on those, you have exceeded both limits. I suppose the event limit is much more important, and it could be the reason why the dashboard didn't work as expected.
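
A rough way to check whether the transforming base search suggested earlier in this thread stays under that event limit is to count the rows it would return for the dashboard's time range. This is only a sketch built from the index and field names already posted in this thread; base_rows is just an illustrative field name:

index=myindex TERM(keyword) fieldname1="EXIT"
| bin _time span=1d
| stats count by _time EId
| stats count as base_rows

If base_rows is well below 500,000, the post-process searches should see every row the base search produces.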

ITWhisperer

Try changing your base search so that it ends with a table command rather than a fields command. Also, your EId is different in your two post-processing searches.

kumva01

@ITWhisperer : Thanks for your reply.

The primary purpose of using a base search with post-processing searches is to minimize search runtime and ensure the dashboard panels load quickly. While the fields command retains the necessary fields for post-processing, it is not producing accurate results in this case. Although replacing fields with the table command yields accurate results, it significantly increases resource usage and search completion time, negatively impacting dashboard performance.

Is there any specific reason why the fields command is not giving accurate results?

Regards

VK

bowesmana

Aside from the limits on base search results, using a base search to hold a large number of results will often NOT improve performance. You are taking lots of results from perhaps multiple indexers, where you benefit from parallelism, and putting them on the search head, where only the CPU of that single search head is available to process all those results, while also competing for CPU with other users of that search head.

Note that the comments about doing this in the base search

...
| stats count as Total, count(eval(httpStatusCde!="200" OR statusCde!="0000")) as failures, exactperc95(respTime) as p95RespTime by _time EId

followed by a post process search doing

| search EId="5eb2aee9"
| stats count as Total, count(failures) as failures, first(p95RespTime) as p95RespTime by _time
...

is not quite right, as you don't need another stats, because you are just getting the information calculated in the base stats, but filtering out only the EId you want.
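
As a minimal sketch of that point, assuming the base search ends with the stats ... by _time EId shown above, the post-process search could be reduced to a filter followed by the original eval, table and sort:

| search EId="5eb2aee9"
| eval FailureRate=round((failures/Total)*100,2)
| table _time, Total, FailureRate, p95RespTime
| sort -_time

Here Total, failures and p95RespTime are taken directly from the base search rows, so nothing is re-aggregated.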

However, a point to note about stats + stats is that the second stats would not do stats COUNT, but stats sum(Total), i.e. if you wanted to get the total for EId without regard to _time, you could do something like this...

| search EId="5eb2aee9"
| stats sum(Total) as Total, sum(failures) as failures, min(p95RespTime) as min_p95RespTime, max(p95RespTime) as max_p95RespTime, avg(p95RespTime) as avg_p95RespTime
...

ITWhisperer

This is from the documentation:

Best practices for creating chain searches

Use these best practices to make sure that chain searches work as expected.

Use a transforming base search

A base search should be a transforming search that returns results formatted as a statistics table. For example, searches using the following commands are transforming searches: stats, chart, timechart, and geostats, among others. For more information on transforming commands, see About transforming commands in the Search Manual.

https://docs.splunk.com/Documentation/SplunkCloud/latest/DashStudio/dsChain#Best_practices_for_creat...