Splunk Search

Discrepany in total count

chinkeeparco
Explorer

Hello guys,

I have the following syntax and data:

alt text

However, there is a discrepancy with the total count per category.

For example,
Password Reset should be 3106 (when I manually count it) but in the screenshot provided as you can see, the total count only has 3007 😞 am i missing something?

alt text

in the second screenshot, as you can see, it displays the correct count; however, it has duplicated value which I don't know how to resolve.

Sample data:

short_description, category, cluster_count

need help password reset, Password reset , 50
Internet access, Connection issue, 10
XXXX Installation, Installation request, 60
Reset my password, Password reset , 55

Thank you in advance.

0 Karma
1 Solution

niketn
Legend

@chinkeeparco as per your query you have multiple rows in your csv file for the same combination of category and cluster_count. Which means for the first query duplicates are not being considered because you have performed stats count by category, cluster_count in the first query and used the same for subsequent query.

In your second query you are removing duplicates by stats count by category, cluster_count but then you get the duplicate count back using cluster_count*count as total.

In essence yes there will be discrepancy in your query but which one is correct depends on your use case. If you need duplicate count remove stats count by category, cluster_count. If you dont need duplicates then your query is showing the correct results.

____________________________________________
| makeresults | eval message= "Happy Splunking!!!"

View solution in original post

woodcock
Esteemed Legend

I suspect that when the cluster_count value is 1 then there is no value for cluster_count (so the 1 is implied). If so, then the correct answer should be given by this search:

index=<You should always specify an index> AND source="sap_cluster.csv"
| eval cluster_count = coalesce(cluster_count, 1)
| rename COMMENT AS "At this point, both of your solutions should give the same answer"
| stats sum(cluster_count) AS Total BY category
| eventstats sum(Count) AS Total
| eval perc = round((Count / Total) * 100)
| sort 0 - Count
| table category Count perc

chinkeeparco
Explorer

@woodcock hello!! thank you! this work as well!!

0 Karma

niketn
Legend

@chinkeeparco as per your query you have multiple rows in your csv file for the same combination of category and cluster_count. Which means for the first query duplicates are not being considered because you have performed stats count by category, cluster_count in the first query and used the same for subsequent query.

In your second query you are removing duplicates by stats count by category, cluster_count but then you get the duplicate count back using cluster_count*count as total.

In essence yes there will be discrepancy in your query but which one is correct depends on your use case. If you need duplicate count remove stats count by category, cluster_count. If you dont need duplicates then your query is showing the correct results.

____________________________________________
| makeresults | eval message= "Happy Splunking!!!"

chinkeeparco
Explorer

@niketnilay thank you so much for that!! It resolved my issue! I don't know how I can thank you enough.

0 Karma

niketn
Legend

@chinkeeparco I am glad the explanation worked for you to get the issue resolved. I have converted my comment to answer. Please accept the answers to mark this question as answered.

Well do think about Splunk Answers community when you run into issues. As far as thanking is concerned do actively participate on Splunk Answers and help others facing issues you have already resolved.

Happy Splunking!

____________________________________________
| makeresults | eval message= "Happy Splunking!!!"
0 Karma

chinkeeparco
Explorer

I will! thanks 🙂

0 Karma

diogofgm
SplunkTrust
SplunkTrust

can you provide an example of the original csv?

------------
Hope I was able to help you. If so, some karma would be appreciated.
0 Karma

chinkeeparco
Explorer

@diogofgm hello, I edit the orginal post and added the sample data 🙂

0 Karma

renjith_nair
Legend

@chinkeeparco,
What's the significance of cluster_count here? If cluster_count is not required in your final result , just use stats count by category|eventstats sum(count) as Total

---
What goes around comes around. If it helps, hit it with Karma 🙂
0 Karma

chinkeeparco
Explorer

@renjith.nair hello! thank you for your answer. However, i need the cluster_count. I edited the original post for the sample data 🙂

0 Karma
Get Updates on the Splunk Community!

Take Your Breath Away with Splunk Risk-Based Alerting (RBA)

WATCH NOW!The Splunk Guide to Risk-Based Alerting is here to empower your SOC like never before. Join Haylee ...

SignalFlow: What? Why? How?

What is SignalFlow? Splunk Observability Cloud’s analytics engine, SignalFlow, opens up a world of in-depth ...

Federated Search for Amazon S3 | Key Use Cases to Streamline Compliance Workflows

Modern business operations are supported by data compliance. As regulations evolve, organizations must ...