Splunk Enterprise Security

Why am I not getting the correct percentage difference?

Lye
Path Finder

Hello

Kindly assist me in this query/solution.

I have a long list of IPs that logged in. Out of this list, I want to know the percentage of only 5 IPs. When I use this query

---My base query----

 

 

 

 

| search NOT IPs IN ("IP.A", "IP.B", "IP.C", "IP.D", "IP.E")
| stats count by IP
| eventstats sum(count) as perc
| eval percentage= round(count*100/perc,2)
| fields - perc

 

 

 

 



It gives me a table like this

IP Count Percentage
IP.A 52 37
IP.B 35 26
IP.C 22 18
IP.D 44 17
IP.E 11 2


The Total percentage =100%.

But when I use this query

---My base query---- ip=*

 

 

 

 

| stats count by IP
| eventstats sum(count) as perc
| eval percentage= round(count*100/perc,2)
| fields - perc

 

 

 

 




I get about 5 pages of all the list of the IPs and their respective percentages including the IP.A to IP.E in a table which all together totals 100% but the percentages of IP.A to IP.E changes completely. The 5 IPs shouldn't give me a 100%. It should a percentage fraction of the whole.

Please help.

Labels (1)
0 Karma
1 Solution

yuanliu
SplunkTrust
SplunkTrust

The simplest way to get total is addtotals.

| addtotals col=t row=f fieldlabel=ip count

This will add a row labeled as "Total" (under ip column). 

View solution in original post

Tags (1)

yuanliu
SplunkTrust
SplunkTrust

Perhaps you can explain the first search first.  I trust that IPs is a typo for "IP".  But is NOT also a typo?  Suppose I correct the first typo, the first line becomes 

| search NOT IP IN ("IP.A", "IP.B", "IP.C", "IP.D", "IP.E")

How is it possible that the output contain five rows of those those five exact values the line is expected to exclude?

If, on the other hand, that IPs is not a typo, that explains why the second search gives you exactly those five IP values, because your data perhaps do not contain a field named IPs therefore NOT IPs IN ("IP.A", "IP.B", "IP.C", "IP.D", "IP.E") is exactly the same as IP=*.

Side note: If you want to troubleshoot code, your code illustration should not flip-flop between "IP" and "ip".

0 Karma

Lye
Path Finder

@yuanliu ,

Yes it was an error on my part and I sincerely apologize for that. Thanks for letting me know. Let me rewrite it

| Mybase query
| search ip IN ("ip.a", "ip.b", "ip.c", ip.d, "ip.e")
| stats count by ip
| eventstats sum(sum) as perc
| eval percentage= round(count*100/perc,2)
| fields - perc

It gives me a table like this (these are rough values but this is the way the table looks.)

ipCountPercentage
ip.a5237.11
ip.b3526.12
ip.c2218.13
ip.d4415.16
ip.e113.48


The Total percentage =100%.

But when I use this query

| My base query ip=*
| stats count by ip
| eventstats sum(sum) as perc
| eval percentage= round(count*100/perc,2)
| fields - perc

I get about 5 pages of all the list of the ip addresses and their respective percentages including the ip.a to ip.e in a table which all together totals 100% but the percentages of ip.a to ip.e changes completely. The 5 ip addresses shouldn't give me a 100%. Right? It should a percentage fraction of the whole.

 

0 Karma

yuanliu
SplunkTrust
SplunkTrust

As @PickleRick said, your first search performs stats AFTER limiting events to only those containing the five ip values.

Based on your wording, I am speculating that you intention is to calculate percentage of those five values in total events, and only calculate/display those five percentages.  If this is the case, you can follow PickleRick's suggestion to change the search to

| Mybase query
| stats count by ip
| eventstats sum(count) as perc
| search ip IN ("ip.a", "ip.b", "ip.c", ip.d, "ip.e")
| eval percentage= round(count*100/perc,2)
| fields - perc

 

Lye
Path Finder

@yuanliu , Yes your speculation is 100% spot on. Exactly what I needed.  It worked. Thank you so much.

The other challenge I had was that I had to use a calculator to calculate the total percentage of the ips as I wasn't sure the extra query to use. Example if ip.a-ip.e were 10.01, 2.05, 16.82, 11.97. 11.11

I manually calculated it to get a total sum of 59.96. I don't know if there could be another line of query to add to give that 59.96 instead of calculating it manually.

However, I am super excited it worked and gave me the result I anticipated.

0 Karma

yuanliu
SplunkTrust
SplunkTrust

The simplest way to get total is addtotals.

| addtotals col=t row=f fieldlabel=ip count

This will add a row labeled as "Total" (under ip column). 

Tags (1)

Lye
Path Finder

Thank you so much. The feedback from @yuanliu  regarding getting the total was very helpful, and so was the link.

0 Karma

PickleRick
SplunkTrust
SplunkTrust

No. Splunk is doing exactly what you told it to.

Remeber that after each pipe sign, splunk only knows the results of the directly preceeding command. It doesn't know anything about the search as a whole or anything like that.

So after you do

| search ip IN (IPA, IPB, IPC... whatever)

Even if your base search returned several thousand events, at this point you're limiting your result set to only those containing the specified ip. So if only 200 of those thousands contain those ips, you're left with only those 200 events. Therefore next step doing stats count by ip will operate only on those 200 ips, not an the whole initial result set.

So you might want to refactor your search to calculate the sum earlier when you still have all your events.

Lye
Path Finder

@PickleRick , Thank you so much for your detailed explanation. I understood it differently before now but your explanation made a huge difference. 

0 Karma

johnhuang
Motivator

Either of these query should work:

 

<base_search>
| top 0 ip
| search ip IN ("ip.a", "ip.b", "ip.c", ip.d, "ip.e")

 

 

<base_search>
| stats count as event_count by ip
| eventstats sum(event_count) AS total
| eval percent=ROUND(event_count/total*100, 2)
| search ip IN ("ip.a", "ip.b", "ip.c", ip.d, "ip.e")

 

0 Karma

Lye
Path Finder

@johnhuang , Thank you so much for your response.

I tried the top query. It did return events and tables but it gave me values more smaller in number. With the other query, I got no results in statistics but some values in the Events.

0 Karma

PickleRick
SplunkTrust
SplunkTrust

The "top" solution can return inconsistent results with many different ips (it returns only 50k results unless you reconfigure limits.conf).

0 Karma
Get Updates on the Splunk Community!

Enterprise Security Content Update (ESCU) | New Releases

In September, the Splunk Threat Research Team had two releases of new security content via the Enterprise ...

New in Observability - Improvements to Custom Metrics SLOs, Log Observer Connect & ...

The latest enhancements to the Splunk observability portfolio deliver improved SLO management accuracy, better ...

Improve Data Pipelines Using Splunk Data Management

  Register Now   This Tech Talk will explore the pipeline management offerings Edge Processor and Ingest ...