Splunk Search

Aggregate rate for entire cluster from individual hosts data

abhisawa
Explorer

I have a cluster of more than 100 hosts that receive data over the network from multiple sources. I can calculate the incoming data rate by collecting the 'RX Bytes' field from 'ifconfig' output every minute. My Splunk query to create a timechart for a single host looks like this:

index=os source=interfaces eth0 host=hostname1 | sort  -_time | streamstats current=false last(RXbytes) as lastRX  | eval RX_Thruput_bytes = ((lastRX-RXbytes)/(1024*60)) | timechart span=10m avg(RX_Thruput_bytes)

How can I add up avg(RX_Thruput_bytes) across all 100 hosts to determine the incoming data rate for the entire cluster?

1 Solution

abhisawa
Explorer

After multiple iterations and cross-verifying the results against actual ifconfig data, the following query works correctly. I updated streamstats with by host so that each delta is calculated from the same host's previous sample.

index=os source=interfaces eth0  | sort 0 - _time
| streamstats current=f window=1 global=f last(RXbytes) as lastRX last(_time) as lastTime by host 
| eval thruput_kb = case(lastRX > RXbytes, (lastRX-RXbytes)/(1024*(lastTime-_time)))
| bucket span=4h _time  |stats avg(thruput_kb) as average_kb_per_host by host _time
| timechart span=4h sum(average_kb_per_host) as cluster_thruput_kb

Thank you, Martin, for providing the initial approach.
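
For readers who want to sanity-check the arithmetic outside Splunk, here is a minimal Python sketch of what the query above computes: a per-host delta between consecutive samples, a per-host average within each time bucket, and a sum of those averages across hosts. The function name, the tuple layout, and the sample data are assumptions made for illustration; this is not how Splunk implements streamstats or timechart.

```python
from collections import defaultdict

def cluster_throughput_kb(events, span):
    """events: iterable of (time_sec, host, rx_bytes).
    Returns {bucket_start: KB/s summed over hosts}."""
    prev = {}                            # last sample seen per host (like `by host`)
    per_host_bucket = defaultdict(list)  # (bucket, host) -> list of KB/s rates
    for t, host, rx in sorted(events):   # walk samples in ascending time
        if host in prev:
            pt, prx = prev[host]
            # Skip counter resets and duplicate timestamps, like the case() guard.
            if rx > prx and t > pt:
                rate = (rx - prx) / (1024 * (t - pt))
                # The query attaches the rate to the earlier of the two samples.
                per_host_bucket[(pt - pt % span, host)].append(rate)
        prev[host] = (t, rx)
    totals = defaultdict(float)
    for (bucket, _host), rates in per_host_bucket.items():
        totals[bucket] += sum(rates) / len(rates)  # per-host average, then sum
    return dict(totals)

# Hypothetical data: host "a" receives 1 KB/s, host "b" receives 2 KB/s.
events = [
    (0, "a", 0), (60, "a", 61440), (120, "a", 122880),
    (0, "b", 0), (60, "b", 122880), (120, "b", 245760),
]
print(cluster_throughput_kb(events, 600))  # {0: 3.0}
```

Keeping the previous sample per host is the important part: without it, a delta can be taken between samples from two different hosts, which is what by host in streamstats prevents.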

martin_mueller
SplunkTrust

Something like this?

  index=os source=interfaces eth0 | sort - _time
| streamstats current=f window=1 global=f last(RXbytes) as lastRX last(_time) as lastTime
| eval thruput_kb = case(lastRX > RXbytes, (lastRX-RXbytes)/(1024*(lastTime-_time)))
| timechart span=10m avg(thruput_kb) as average_kb_per_host dc(host) as hosts
| eval average_kb_per_cluster = average_kb_per_host * hosts | fields - average_kb_per_host hosts

Assuming every host reports in every interval, dc(host) for each bucket will equal the number of hosts in your cluster. Note that this total is slightly dirty from a statistics point of view: if a single host has more or fewer reports in the ten-minute bucket, its throughput is weighted slightly more or less than that of the other hosts. This might be more correct from a statistics point of view:

  index=os source=interfaces eth0 | sort - _time
| streamstats current=f window=1 global=f last(RXbytes) as lastRX last(_time) as lastTime
| eval thruput_kb = case(lastRX > RXbytes, (lastRX-RXbytes)/(1024*(lastTime-_time)))
| bucket span=10m _time | stats avg(thruput_kb) as average_kb_per_host by _time host
| timechart span=10m sum(thruput_kb) as cluster_thruput_kb

My brain isn't quite sure which is more correct right now, so do try both and think about what works best.
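
The weighting difference between the two approaches can be seen with a small Python sketch. The host names and sample values are made up for illustration; the first function mirrors "average over all events, times the distinct host count" and the second mirrors "average per host, then sum".

```python
from statistics import mean

# Hypothetical per-event throughput samples (KB/s) within one 10-minute bucket.
samples = {
    "host_a": [10.0, 10.0, 10.0, 10.0],  # reported four times
    "host_b": [40.0],                    # reported only once
}

def pooled_times_hosts(samples):
    """Average over all events, multiplied by the distinct host count."""
    all_rates = [r for rates in samples.values() for r in rates]
    return mean(all_rates) * len(samples)

def sum_of_host_means(samples):
    """Average each host first, then sum across hosts."""
    return sum(mean(rates) for rates in samples.values())

print(pooled_times_hosts(samples))  # 32.0
print(sum_of_host_means(samples))   # 50.0
```

When every host reports the same number of times per bucket, the two results coincide; they drift apart only when report counts differ, as in this example.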

martin_mueller
SplunkTrust

Those differences are expected - every time you run the search the underlying data changes a little because the time range has progressed a little.
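
One plausible way to picture this, sketched in Python under the assumption that the search window is simply [now - 24h, now): the first and last buckets are only partially covered, and how much of them is covered changes with each run. The function name and the numbers are illustrative only.

```python
def bucket_coverage(now, window, span):
    """Return {bucket_start: seconds of that bucket inside [now - window, now)}."""
    start = now - window
    coverage = {}
    b = start - start % span          # bucket containing the window start
    while b < now:
        lo, hi = max(b, start), min(b + span, now)
        coverage[b] = hi - lo
        b += span
    return coverage

DAY, HOUR = 86400, 3600
first_run = bucket_coverage(DAY + 900, DAY, HOUR)    # run at 00:15
second_run = bucket_coverage(DAY + 1200, DAY, HOUR)  # run five minutes later
print(first_run[0], second_run[0])      # 2700 2400
print(first_run[DAY], second_run[DAY])  # 900 1200
```

The interior buckets are fully covered in both runs, so only the edges of the chart would be expected to move. Running the search over a fixed range such as "Yesterday" avoids this effect.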

martin_mueller
SplunkTrust

Are you running the search over a fixed time range (e.g. "Yesterday") or a relative time range (e.g. "Last 24 hours")?

abhisawa
Explorer

I am running it over Last 24 hours, and the difference is very minor.

abhisawa
Explorer

Martin, thank you for taking a look at this query. Your second query is what I was looking for, with the following modifications:

  • For some reason the stats average came out as zero for a few hosts, so I changed stats avg(thruput_kb) as average_kb_per_host by _time host to stats avg(thruput_kb) as average_kb_per_host by host _time. It looks like the field order does matter.

  • I think in timechart span=10m sum(thruput_kb) as cluster_thruput_kb you meant sum(average_kb_per_host).

So the final query below gives me believable output in the chart, but every time I run it, the timechart for 24 hours' worth of data shows minor variations.

Is that expected?

index=os source=interfaces eth0 | sort 0 - _time
| streamstats current=f window=1 global=f last(RXbytes) as lastRX last(_time) as lastTime
| eval thruput_kb = case(lastRX > RXbytes, (lastRX-RXbytes)/(1024*(lastTime-_time)))
| bucket span=1h _time |stats avg(thruput_kb) as average_kb_per_host by host _time
| timechart span=1h sum(average_kb_per_host) as cluster_thruput_kb
