Splunk Search

How to sum field values with a dependancy on another field?

Explorer

My data has a IP field and a number of bytes used by that field. I send data every 5 mins and most of the IPs remain same for top 10 consuming ones. For query - source="tcp:6565"| top IP by Bytes | sort -Bytes | fields IP Bytes | head 10 , my data is
(IP Bytes)
192.168.abc.abc 18670106

192.168.abc.abc 17764375

193.235.51.112 17241178

192.168.abc.abc 7814809

193.235.51.112 4436527

...
...
...

Now this was for lsast 60 mins, and I have repeating IPs
How can I sum the values so that it shows me the cumulative usage of the IP according to the time range?
Thanks!

0 Karma
1 Solution

Explorer

I got it. This was easier than I thought it would be.

source="tcp:6565"| stats sum(Bytes) AS Bytes by IP | sort -Bytes | head 10

View solution in original post

0 Karma

SplunkTrust
SplunkTrust

Okay, first, that query is not doing what you think it is doing.

| top IP by Bytes is going to give you the most common IPs for each individual value of Bytes. Generally, since each value of Bytes is going to be unique, that command is really not doing anything but sorting the records into ascending Bytes order, and possibly killing some tiny records where you happen to have more than 10 IPs that sent exactly the same small number of Bytes.

Then, you sort it by size and take the first 10, which means you are getting not the IPs with the most Bytes of traffic, but the IPs that have the 10 largest records of traffic in the time period. So, the consistency may be because those IPs have consistently big packets rather than a lot of packets. You also will get dups if the same IP sent two packets that were in the top 10.

So, first we need to get the actual 10 highest traffic IPs:

 source="tcp:6565"
| stats sum(Bytes) as Bytes by IP
| sort 10 - Bytes

Now, if you want to compare them over time, we're going to have to remember what time it is. Here's one way...

 source="tcp:6565"
| bin _time span=5m
| stats sum(Bytes) as Bytes by IP _time

| rename COMMENT as "Collapse the non-top-10 into Other."
| sort 0 _time - Bytes 
| streamstats count as recno by _time 
| eval IP=if(recno>10,"OTHER",IP)
| stats sum(Bytes) as Bytes by IP _time

| rename COMMENT as "Calculate percentage for each _time increment."
| eventstats sum(Bytes) as TimeBytes by _time 
| eval percent = round(100*Bytes/TimeBytes,2)

| rename COMMENT as "Calculate and add the totals by IP."
| appendpipe 
    [| eval temp=if(IP="OTHER",0,1) 
     | stats sum(Bytes) as Bytes, max(_time) as _time by temp IP 
     | sort 0 - Bytes 
     | streamstats count as recno by temp 
     | eval IP=if(recno>10,"OTHER",IP) 
     | stats sum(Bytes) as Bytes, max(_time) as _time by IP 
     | eventstats sum(Bytes) as TimeBytes 
     | eval percent = round(100*Bytes/TimeBytes,2)
     | eval _time=_time + 300
     ]

That's not perfect - really, I'd probably want to calculate the top IPs across the whole time period first, then show them during each time increment only if they are also in the top 10 for that time increment. Otherwise, you r final percentages will always be low for any IPs that ocasionally drop out of the top 10.

Explorer

Thanks! Thats really informative.

Explorer

I got it. This was easier than I thought it would be.

source="tcp:6565"| stats sum(Bytes) AS Bytes by IP | sort -Bytes | head 10

View solution in original post

0 Karma

SplunkTrust
SplunkTrust

You got that part of it correct. Notice that you can combine the last two commands into one:

| sort 10 - Bytes
0 Karma