Splunk Search

How do I make my search command summarize network throughput data?

Mr_Perkins
Explorer

Apologies, I'm not a Splunk administrator; I'm a capacity tool person who needs to extract some metrics from Splunk.
Mostly I'm doing fine, but this one has me stumped. I'm trying to collect network throughput data from F5 firewalls.
This is my search query:
    | tstats
          first(all.clientside_bytes_in)
          from datamodel="bigip-tmstats-virtual_server_stat"
          by host, all.name, _time span=5m
    | rename first(all.*) as *, all.* as *
    | `abs_to_rate("host name", "clientside_bytes_in")`
    | sort host, name, _time
    | fields host, name, _time, clientside_bytes_in, clientside_bytes_in_rate

This gives me network throughput data at 5-minute granularity at the host,name level, and the data looks correct.
But I need to roll that up to just the 'host' level, as host,name is too granular. I can't get it to work: when I take 'name' out of the query, the results don't make any sense. How do I return data at the host level, summing all of the name-level data into one result per 5-minute span?

0 Karma
1 Solution

Mr_Perkins
Explorer

Here's my solution:
    | tstats
          earliest(all.clientside_bytes_in) as start_b_in
          latest(all.clientside_bytes_in) as end_b_in
          earliest(all.clientside_bytes_out) as start_b_out
          latest(all.clientside_bytes_out) as end_b_out
          from datamodel="bigip-tmstats-virtual_server_stat"
          by host, all.name, _time span=300
    | eval delta_b_in=if(end_b_in>start_b_in, end_b_in-start_b_in, 0), b_in_sec=delta_b_in/300
    | eval delta_b_out=if(end_b_out>start_b_out, end_b_out-start_b_out, 0), b_out_sec=delta_b_out/300
    | stats sum(b_in_sec) as bytes_in_sec, sum(b_out_sec) as bytes_out_sec by _time, host

There are about 1,000 'all.name' values per host, each with a cumulative clientside_bytes_in and clientside_bytes_out.
So for each 5-minute span, I'm taking the earliest and latest values and producing a delta between the two, then converting that to a per-second rate (dividing by 300).
On examining the data I frequently see the 'end' number lower than the 'begin' number, leading to a negative delta. I'm replacing these negative results with 0 (I'm assuming the cumulative counters are being reset for some reason, or are wrapping around).

The capacity tool that the data is intended for wants data at the 'host' level, so I'm summing the individual per-second rates to give an overall rate for the host.
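
As a quick sanity check of that delta-and-rate step in isolation, here's a minimal runnable sketch with made-up values (not real F5 data); the second row simulates a counter that went backwards:

    | makeresults count=2
    | streamstats count as row
    | rename COMMENT as "row 1: normal increase; row 2: counter reset/wrap (end < start)"
    | eval start_b_in=if(row=1, 1000, 900), end_b_in=if(row=1, 1250, 850)
    | eval delta_b_in=if(end_b_in>start_b_in, end_b_in-start_b_in, 0)
    | eval b_in_sec=delta_b_in/300
    | table row, start_b_in, end_b_in, delta_b_in, b_in_sec

Row 1 gives 250 bytes over 300 seconds (about 0.83 bytes/sec); row 2's backwards counter is zeroed by the if() guard, exactly as in the full query.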

I'm sure that I'm overcomplicating this. Any comments / suggestions?

0 Karma

DalJeanis
Legend

@Mr_Perkins - if you mark your code with the code button (101 010) or indent it four spaces, it will stay formatted for you, and the forum won't delete HTML-like parts of the code.

I'd tend to do it this way...

  | tstats 
        earliest(all.clientside_bytes_in) as cum_bytes_in
        earliest(all.clientside_bytes_out) as cum_bytes_out
        from datamodel="bigip-tmstats-virtual_server_stat" 
        by  host, all.name, _time span=5m

  | rename COMMENT as "redundant sort in case I decide to do something else, then copy prior record cum to current record as prior"
  | sort 0 host all.name  _time
  | streamstats current=f last(cum_bytes_in) as prior_bytes_in last(cum_bytes_out) as prior_bytes_out by host all.name 

  | rename COMMENT as "calculate 5m delta, leave null if wrap, then sum to host level and calculate per-second rates"
  | eval bytes_in=case(prior_bytes_in<=cum_bytes_in,cum_bytes_in - prior_bytes_in)
  | eval bytes_out=case(prior_bytes_out<=cum_bytes_out,cum_bytes_out - prior_bytes_out)
  | stats sum(bytes_in) as bytes_in, sum(bytes_out) as bytes_out by host _time
  | eval bytes_in_rate = bytes_in/300
  | eval bytes_out_rate = bytes_out/300

  | rename COMMENT as "present data"
  | sort host,_time
  | table host, _time, bytes_in, bytes_in_rate, bytes_out, bytes_out_rate
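
Worth noting about the case() steps above: with no default clause, case() leaves the delta null when the counter goes backwards, and stats sum() ignores nulls. The summed result is the same as zeroing the delta, but a null keeps wrapped intervals distinguishable if you later count or average them.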
0 Karma

Mr_Perkins
Explorer

I've had this running for a few days now and the results look great. Thanks.

0 Karma

DalJeanis
Legend

Here's how I interpret what your current query is doing:

  • From a certain datamodel, for each host and all.name, collect the first clientside_bytes_in value that is found in each 5 minute increment.
  • Simplify the names of a couple of fields.
  • Use a macro to convert the number of bytes to a rate.
  • Sort the results and present them.

I don't think that's what you want to do. You probably don't want just the first record in each time bucket, unless you are aiming for an average rate via a sampling technique. For the real average rate, you need to sum all the bytes and then divide by the length of the bucket (300 seconds). Thus, in the tstats, replace first() with sum().

Since you don't care about all.name, just drop it from the initial tstats. In addition, you can do the rename step at the same time as the sum().

 | tstats 
       sum(all.clientside_bytes_in) as bytes_in
       from datamodel="bigip-tmstats-virtual_server_stat" 
       by  host, _time span=5m
 | eval bytes_in_rate = bytes_in/300
 | sort host,_time
 | fields host, _time, bytes_in, bytes_in_rate

The output rate here will be in average bytes per second across the 5m period, assuming that clientside_bytes_in is in bytes. You probably want to scale that to megabytes by dividing by 1048576, or gigabytes by dividing by 1073741824. You can also scale to a longer time range if you prefer.
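
As a quick sketch of that scaling step (made-up rate value; 1048576 = 1024^2 and 1073741824 = 1024^3):

    | makeresults
    | eval bytes_in_rate=52428800
    | eval mbytes_in_rate=round(bytes_in_rate/1048576, 2)
    | eval gbytes_in_rate=round(bytes_in_rate/1073741824, 4)
    | table bytes_in_rate, mbytes_in_rate, gbytes_in_rate

That sample rate works out to 50 MB/sec, or about 0.0488 GB/sec.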

0 Karma

Mr_Perkins
Explorer

Thanks for your answer. I really should've included some data to make this clearer. The reason for my use of first() is that 'clientside_bytes_in' is a cumulative counter rather than the number of bytes received during that interval. So it looks a bit like this (I'm not at work now, so I can't post real data):
    _time  clientside_bytes_in
    10:00  1000
    10:01  1050
    10:02  1100
    10:03  1150
    10:04  1200
    10:05  1250
    10:06  1300

So I'm using first() to return the sample at 10:00 (1000 bytes) and then the next at 10:05 (1250 bytes), then using abs_to_rate to convert that into the difference (250 bytes in that 5-minute span). If I used sum() instead I would get (1000+1050+1100+1150+1200)/300, which is 18.333, not 250. I'll add some real data tomorrow if it's not clear what the issue is.
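
To make the difference concrete, here's a minimal runnable sketch (synthetic one-minute samples mirroring the table above, not real F5 data) comparing an earliest()-per-bucket delta (the time-aware equivalent of the first() used above) against sum() on a cumulative counter:

    | makeresults count=7
    | streamstats count as n
    | rename COMMENT as "synthetic cumulative counter: one sample per minute, 1000, 1050, ..., 1300"
    | eval _time=relative_time(now(), "@d") + (n-1)*60
    | eval clientside_bytes_in=1000 + (n-1)*50
    | bin _time span=5m
    | stats earliest(clientside_bytes_in) as first_bytes, sum(clientside_bytes_in) as summed_bytes by _time
    | rename COMMENT as "delta of the bucket firsts recovers the 250-byte increase; sum() just adds the snapshots"
    | streamstats current=f last(first_bytes) as prior_bytes
    | eval delta_bytes=first_bytes - prior_bytes
    | table _time, first_bytes, prior_bytes, delta_bytes, summed_bytes

The second bucket shows delta_bytes=250 (matching the hand calculation above), while summed_bytes for the first bucket is 5500, which is why sum() can't be applied directly to a cumulative counter.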

0 Karma