Hello guys,
I have the following syntax and data:
However, there is a discrepancy with the total count per category.
For example,
Password Reset should be 3106 (when I manually count it) but in the screenshot provided as you can see, the total count only has 3007 😞 am i missing something?
in the second screenshot, as you can see, it displays the correct count; however, it has duplicated value which I don't know how to resolve.
Sample data:
short_description, category, cluster_count
need help password reset, Password reset , 50
Internet access, Connection issue, 10
XXXX Installation, Installation request, 60
Reset my password, Password reset , 55
Thank you in advance.
@chinkeeparco as per your query you have multiple rows in your csv file for the same combination of category and cluster_count. Which means for the first query duplicates are not being considered because you have performed stats count by category, cluster_count
in the first query and used the same for subsequent query.
In your second query you are removing duplicates by stats count by category, cluster_count
but then you get the duplicate count back using cluster_count*count
as total.
In essence yes there will be discrepancy in your query but which one is correct depends on your use case. If you need duplicate count remove stats count by category, cluster_count
. If you dont need duplicates then your query is showing the correct results.
I suspect that when the cluster_count
value is 1
then there is no value for cluster_count
(so the 1
is implied). If so, then the correct answer should be given by this search:
index=<You should always specify an index> AND source="sap_cluster.csv"
| eval cluster_count = coalesce(cluster_count, 1)
| rename COMMENT AS "At this point, both of your solutions should give the same answer"
| stats sum(cluster_count) AS Total BY category
| eventstats sum(Count) AS Total
| eval perc = round((Count / Total) * 100)
| sort 0 - Count
| table category Count perc
@woodcock hello!! thank you! this work as well!!
@chinkeeparco as per your query you have multiple rows in your csv file for the same combination of category and cluster_count. Which means for the first query duplicates are not being considered because you have performed stats count by category, cluster_count
in the first query and used the same for subsequent query.
In your second query you are removing duplicates by stats count by category, cluster_count
but then you get the duplicate count back using cluster_count*count
as total.
In essence yes there will be discrepancy in your query but which one is correct depends on your use case. If you need duplicate count remove stats count by category, cluster_count
. If you dont need duplicates then your query is showing the correct results.
@niketnilay thank you so much for that!! It resolved my issue! I don't know how I can thank you enough.
@chinkeeparco I am glad the explanation worked for you to get the issue resolved. I have converted my comment to answer. Please accept the answers to mark this question as answered.
Well do think about Splunk Answers community when you run into issues. As far as thanking is concerned do actively participate on Splunk Answers and help others facing issues you have already resolved.
Happy Splunking!
I will! thanks 🙂
can you provide an example of the original csv?
@diogofgm hello, I edit the orginal post and added the sample data 🙂
@chinkeeparco,
What's the significance of cluster_count here? If cluster_count is not required in your final result , just use stats count by category|eventstats sum(count) as Total
@renjith.nair hello! thank you for your answer. However, i need the cluster_count. I edited the original post for the sample data 🙂