Hello
Kindly assist me in this query/solution.
I have a long list of IPs that logged in. Out of this list, I want to know the percentage of only 5 IPs. When I use this query
---My base query----
| search NOT IPs IN ("IP.A", "IP.B", "IP.C", "IP.D", "IP.E")
| stats count by IP
| eventstats sum(count) as perc
| eval percentage= round(count*100/perc,2)
| fields - perc
It gives me a table like this
IP | Count | Percentage |
IP.A | 52 | 37 |
IP.B | 35 | 26 |
IP.C | 22 | 18 |
IP.D | 44 | 17 |
IP.E | 11 | 2 |
The Total percentage =100%.
But when I use this query
---My base query---- ip=*
| stats count by IP
| eventstats sum(count) as perc
| eval percentage= round(count*100/perc,2)
| fields - perc
I get about 5 pages of all the list of the IPs and their respective percentages including the IP.A to IP.E in a table which all together totals 100% but the percentages of IP.A to IP.E changes completely. The 5 IPs shouldn't give me a 100%. It should a percentage fraction of the whole.
Please help.
The simplest way to get total is addtotals.
| addtotals col=t row=f fieldlabel=ip count
This will add a row labeled as "Total" (under ip column).
Perhaps you can explain the first search first. I trust that IPs is a typo for "IP". But is NOT also a typo? Suppose I correct the first typo, the first line becomes
| search NOT IP IN ("IP.A", "IP.B", "IP.C", "IP.D", "IP.E")
How is it possible that the output contain five rows of those those five exact values the line is expected to exclude?
If, on the other hand, that IPs is not a typo, that explains why the second search gives you exactly those five IP values, because your data perhaps do not contain a field named IPs therefore NOT IPs IN ("IP.A", "IP.B", "IP.C", "IP.D", "IP.E") is exactly the same as IP=*.
Side note: If you want to troubleshoot code, your code illustration should not flip-flop between "IP" and "ip".
@yuanliu ,
Yes it was an error on my part and I sincerely apologize for that. Thanks for letting me know. Let me rewrite it
| Mybase query
| search ip IN ("ip.a", "ip.b", "ip.c", ip.d, "ip.e")
| stats count by ip
| eventstats sum(sum) as perc
| eval percentage= round(count*100/perc,2)
| fields - perc
It gives me a table like this (these are rough values but this is the way the table looks.)
ip | Count | Percentage |
ip.a | 52 | 37.11 |
ip.b | 35 | 26.12 |
ip.c | 22 | 18.13 |
ip.d | 44 | 15.16 |
ip.e | 11 | 3.48 |
The Total percentage =100%.
But when I use this query
| My base query ip=*
| stats count by ip
| eventstats sum(sum) as perc
| eval percentage= round(count*100/perc,2)
| fields - perc
I get about 5 pages of all the list of the ip addresses and their respective percentages including the ip.a to ip.e in a table which all together totals 100% but the percentages of ip.a to ip.e changes completely. The 5 ip addresses shouldn't give me a 100%. Right? It should a percentage fraction of the whole.
As @PickleRick said, your first search performs stats AFTER limiting events to only those containing the five ip values.
Based on your wording, I am speculating that you intention is to calculate percentage of those five values in total events, and only calculate/display those five percentages. If this is the case, you can follow PickleRick's suggestion to change the search to
| Mybase query
| stats count by ip
| eventstats sum(count) as perc
| search ip IN ("ip.a", "ip.b", "ip.c", ip.d, "ip.e")
| eval percentage= round(count*100/perc,2)
| fields - perc
@yuanliu , Yes your speculation is 100% spot on. Exactly what I needed. It worked. Thank you so much.
The other challenge I had was that I had to use a calculator to calculate the total percentage of the ips as I wasn't sure the extra query to use. Example if ip.a-ip.e were 10.01, 2.05, 16.82, 11.97. 11.11
I manually calculated it to get a total sum of 59.96. I don't know if there could be another line of query to add to give that 59.96 instead of calculating it manually.
However, I am super excited it worked and gave me the result I anticipated.
Thank you so much. The feedback from @yuanliu regarding getting the total was very helpful, and so was the link.
No. Splunk is doing exactly what you told it to.
Remeber that after each pipe sign, splunk only knows the results of the directly preceeding command. It doesn't know anything about the search as a whole or anything like that.
So after you do
| search ip IN (IPA, IPB, IPC... whatever)
Even if your base search returned several thousand events, at this point you're limiting your result set to only those containing the specified ip. So if only 200 of those thousands contain those ips, you're left with only those 200 events. Therefore next step doing stats count by ip will operate only on those 200 ips, not an the whole initial result set.
So you might want to refactor your search to calculate the sum earlier when you still have all your events.
@PickleRick , Thank you so much for your detailed explanation. I understood it differently before now but your explanation made a huge difference.
Either of these query should work:
<base_search>
| top 0 ip
| search ip IN ("ip.a", "ip.b", "ip.c", ip.d, "ip.e")
<base_search>
| stats count as event_count by ip
| eventstats sum(event_count) AS total
| eval percent=ROUND(event_count/total*100, 2)
| search ip IN ("ip.a", "ip.b", "ip.c", ip.d, "ip.e")
@johnhuang , Thank you so much for your response.
I tried the top query. It did return events and tables but it gave me values more smaller in number. With the other query, I got no results in statistics but some values in the Events.
The "top" solution can return inconsistent results with many different ips (it returns only 50k results unless you reconfigure limits.conf).