I have a cluster of more than 100 hosts that receive data over the network from multiple sources. I can calculate the rate of incoming data by collecting the 'RX bytes' field from 'ifconfig' output every minute. My Splunk query to create a timechart for a single host looks like this:
index=os source=interfaces eth0 host=hostname1 | sort -_time | streamstats current=false last(RXbytes) as lastRX | eval RX_Thruput_bytes = ((lastRX-RXbytes)/(1024*60)) | timechart span=10m avg(RX_Thruput_bytes)
How can I add up avg(RX_Thruput_bytes) across all 100 hosts to determine the rate of incoming data for the entire cluster?
After multiple iterations and cross-checking the results against actual ifconfig data, the following query works correctly. I updated streamstats with by host so that each host's counter is only compared with its own previous reading.
index=os source=interfaces eth0 | sort 0 - _time | streamstats current=f window=1 global=f last(RXbytes) as lastRX last(_time) as lastTime by host | eval thruput_kb = case(lastRX > RXbytes, (lastRX-RXbytes)/(1024*(lastTime-_time))) | bucket span=4h _time |stats avg(thruput_kb) as average_kb_per_host by host _time | timechart span=4h sum(average_kb_per_host) as cluster_thruput_kb
Thank you, Martin, for providing the initial approach.
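For anyone who wants to cross-check the numbers outside Splunk, here is a minimal Python sketch of the same calculation: per-host counter deltas, a per-host average within each time bucket, then a sum across hosts. The function names and the (time, host, rx_bytes) tuple layout are assumptions made for illustration; nothing here is a Splunk API.

```python
from collections import defaultdict

def per_host_rates_kb(samples):
    """samples: iterable of (time_sec, host, rx_bytes) in any order.
    Returns a list of (time_sec, host, kb_per_sec)."""
    by_host = defaultdict(list)
    for t, host, rx in samples:
        by_host[host].append((t, rx))  # keep each host's counter separate ("by host")
    rates = []
    for host, points in by_host.items():
        points.sort()
        for (t0, rx0), (t1, rx1) in zip(points, points[1:]):
            # Skip counter resets/wraps and duplicate timestamps,
            # like the case(lastRX > RXbytes, ...) guard in the query.
            if rx1 > rx0 and t1 > t0:
                # Rate is stamped at the start of the interval.
                rates.append((t0, host, (rx1 - rx0) / (1024 * (t1 - t0))))
    return rates

def cluster_throughput_kb(samples, span_sec):
    """Average each host's rate per bucket, then sum the averages across hosts."""
    per_bucket_host = defaultdict(list)
    for t, host, rate in per_host_rates_kb(samples):
        bucket = t - t % span_sec
        per_bucket_host[(bucket, host)].append(rate)
    cluster = defaultdict(float)
    for (bucket, _host), host_rates in per_bucket_host.items():
        cluster[bucket] += sum(host_rates) / len(host_rates)
    return dict(cluster)

# Made-up readings: host "a" receives 1 KB/s, host "b" receives 2 KB/s and then resets.
samples = [
    (0, "a", 0), (60, "a", 61440), (120, "a", 122880),
    (0, "b", 1000000), (60, "b", 1122880), (120, "b", 500),
]
print(cluster_throughput_kb(samples, 600))
```

Without the per-host grouping, the delta would be taken between readings from different hosts whenever their events interleave, which is why the by host clause matters.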
Something like this?
index=os source=interfaces eth0 | sort - _time
| streamstats current=f window=1 global=f last(RXbytes) as lastRX last(_time) as lastTime
| eval thruput_kb = case(lastRX > RXbytes, (lastRX-RXbytes)/(1024*(lastTime-_time)))
| timechart span=10m avg(thruput_kb) as average_kb_per_host dc(host) as hosts
| eval average_kb_per_cluster = average_kb_per_host * hosts | fields - average_kb_per_host hosts
Assuming every host reports in every bucket, dc(host) for each bucket will be the number of hosts in your cluster. Note that the total average is slightly dirty from a statistics point of view: if a single host has more or fewer reports in the ten-minute bucket, its throughput will be weighted slightly more or less than that of the other hosts. This might be more correct from a statistics point of view:
index=os source=interfaces eth0 | sort - _time
| streamstats current=f window=1 global=f last(RXbytes) as lastRX last(_time) as lastTime
| eval thruput_kb = case(lastRX > RXbytes, (lastRX-RXbytes)/(1024*(lastTime-_time)))
| bucket span=10m _time | stats avg(thruput_kb) as average_kb_per_host by _time host
| timechart span=10m sum(thruput_kb) as cluster_thruput_kb
My brain isn't quite sure which is more correct right now, so do try both and think about what works best.
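To make the weighting difference between the two approaches concrete, here is a small Python sketch with made-up numbers for a single ten-minute bucket. The host names and values are hypothetical; the point is only that averaging all events and multiplying by the host count is not the same as summing per-host averages when hosts report unequally often.

```python
from statistics import mean

# Hypothetical throughput samples (KB/s) inside one ten-minute bucket.
samples = {
    "host_a": [10.0, 10.0, 10.0, 10.0],  # reported four times
    "host_b": [40.0],                    # reported only once
}

all_values = [v for values in samples.values() for v in values]

# First approach: average over all events, multiplied by the distinct host count.
pooled = mean(all_values) * len(samples)

# Second approach: average per host first, then sum across hosts.
per_host = sum(mean(values) for values in samples.values())

print(pooled, per_host)  # 32.0 50.0
```

Here host_a dominates the pooled average because it reported more often, so the first approach understates the cluster total; when every host reports the same number of times, the two results match.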
Those differences are expected: every time you run the search, the underlying data changes a little because the time range has moved forward a little.
Are you running the search over a fixed time range (e.g. "Yesterday") or a relative time range (e.g. "Last 24 hours")?
I am running it over "Last 24 hours", and the difference is very minor.
Martin, thank you for taking a look at this query. Your second query is what I was looking for, with the following modifications.
For some reason the stats average was coming out as zero for a few of the hosts, so I changed stats avg(thruput_kb) as average_kb_per_host by _time host to stats avg(thruput_kb) as average_kb_per_host by host _time; it looks like the field order does matter.
I think in timechart span=10m sum(thruput_kb) as cluster_thruput_kb you meant sum(average_kb_per_host).
So the final query below gives me believable output in the chart, but every time I run it I get minor variations in the timechart for 24 hours' worth of data.
Is that expected?
index=os source=interfaces eth0 | sort 0 - _time
| streamstats current=f window=1 global=f last(RXbytes) as lastRX last(_time) as lastTime
| eval thruput_kb = case(lastRX > RXbytes, (lastRX-RXbytes)/(1024*(lastTime-_time)))
| bucket span=1h _time |stats avg(thruput_kb) as average_kb_per_host by host _time
| timechart span=1h sum(average_kb_per_host) as cluster_thruput_kb