Solved: Re: Discrepancy in total count

chinkeeparco · ‎08-03-2019

Hello guys,

I have the following syntax and data:

However, there is a discrepancy with the total count per category.

For example,
Password Reset should be 3106 (when I count it manually), but as you can see in the screenshot provided, the total count is only 3007 😞 Am I missing something?

In the second screenshot, as you can see, the count is correct; however, it has duplicated values, which I don't know how to resolve.

Sample data:

short_description, category, cluster_count

need help password reset, Password reset , 50
Internet access, Connection issue, 10
XXXX Installation, Installation request, 60
Reset my password, Password reset , 55

Thank you in advance.

niketn · ‎08-03-2019

@chinkeeparco as per your query, your CSV file has multiple rows with the same combination of category and cluster_count. In the first query those duplicates are not counted, because you ran stats count by category, cluster_count first and built the subsequent stats on its output.

In your second query you also remove duplicates with stats count by category, cluster_count, but then you get the duplicate count back by using cluster_count*count as total.

In essence, yes, the two queries will disagree, but which one is correct depends on your use case. If you need duplicates counted, remove stats count by category, cluster_count. If you don't need duplicates, your first query is already showing the correct results.
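
To make the difference concrete, here is a minimal sketch with made-up rows. The actual searches were only posted as screenshots, so the scratch field raw, the output names deduplicated_total and total_with_duplicates, and the three rows themselves are assumptions for illustration, not your data:

```spl
| makeresults
| rename COMMENT AS "Hypothetical rows: Password reset appears twice with cluster_count 50 and once with 55"
| eval raw="Password reset,50;Password reset,50;Password reset,55"
| makemv delim=";" raw
| mvexpand raw
| rex field=raw "^(?<category>[^,]+),(?<cluster_count>\d+)$"
| rename COMMENT AS "This stats collapses the two identical rows into one row with count=2"
| stats count BY category, cluster_count
| eval total = cluster_count * count
| stats sum(cluster_count) AS deduplicated_total, sum(total) AS total_with_duplicates BY category
```

With these rows, deduplicated_total should come out as 105 (50 + 55, each distinct value once) and total_with_duplicates as 155 (50*2 + 55*1), which is the same kind of gap as the one between your two results.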

____________________________________________
| makeresults | eval message= "Happy Splunking!!!"


woodcock · ‎08-04-2019

I suspect that when the cluster_count value is 1 then there is no value for cluster_count (so the 1 is implied). If so, then the correct answer should be given by this search:

index=<You should always specify an index> AND source="sap_cluster.csv"
| eval cluster_count = coalesce(cluster_count, 1)
| rename COMMENT AS "At this point, both of your solutions should give the same answer"
| stats sum(cluster_count) AS Count BY category
| eventstats sum(Count) AS Total
| eval perc = round((Count / Total) * 100)
| sort 0 - Count
| table category Count perc
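
If you want to try this kind of search without the CSV, the following sketch fakes the four sample rows from the question using makeresults. The scratch field raw and the rex extraction are scaffolding assumed for the demo; everything from the coalesce line onward is the actual logic:

```spl
| makeresults
| rename COMMENT AS "Demo only: the four sample rows from the question, packed into one scratch field"
| eval raw="need help password reset,Password reset,50;Internet access,Connection issue,10;XXXX Installation,Installation request,60;Reset my password,Password reset,55"
| makemv delim=";" raw
| mvexpand raw
| rex field=raw "^(?<short_description>[^,]+),(?<category>[^,]+),(?<cluster_count>\d+)$"
| eval cluster_count = coalesce(cluster_count, 1)
| stats sum(cluster_count) AS Count BY category
| eventstats sum(Count) AS Total
| eval perc = round((Count / Total) * 100)
| sort 0 - Count
| table category Count perc
```

For these four rows it should list Password reset with a Count of 105, Installation request with 60, and Connection issue with 10, out of a Total of 175.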

chinkeeparco · ‎08-04-2019

@woodcock hello!! Thank you! This works as well!!


chinkeeparco · ‎08-03-2019

@niketnilay thank you so much for that!! It resolved my issue! I don't know how I can thank you enough.

niketn · ‎08-03-2019

@chinkeeparco I am glad the explanation helped you resolve the issue. I have converted my comment to an answer. Please accept the answer to mark this question as answered.

Do think of the Splunk Answers community whenever you run into issues. As for thanking me, the best way is to participate actively on Splunk Answers and help others facing issues you have already resolved.

Happy Splunking!


chinkeeparco · ‎08-04-2019

I will! thanks 🙂

diogofgm · ‎08-03-2019

Can you provide an example of the original CSV?

------------
Hope I was able to help you. If so, some karma would be appreciated.

chinkeeparco · ‎08-03-2019

@diogofgm hello, I edited the original post and added the sample data 🙂

renjith_nair · ‎08-03-2019

@chinkeeparco,
What's the significance of cluster_count here? If cluster_count is not required in your final result, just use stats count by category | eventstats sum(count) as Total

---
What goes around comes around. If it helps, hit it with Karma 🙂

chinkeeparco · ‎08-03-2019

@renjith.nair hello! Thank you for your answer. However, I need the cluster_count. I edited the original post to add the sample data 🙂
