I have a bunch of SAN usage data that I am inputting into Splunk that looks as follows, with each line representing an entry in Splunk:
Group: diskdg1 Disks: 21 Disk in use: data04 Capacity: 1%
Group: diskdg2 Disks: 21 Disk in use: data05 Capacity: 1%
Group: diskdg3 Disks: 5 Disk in use: data01 Capacity: 33%
Group: diskdg4 Disks: 34 Disk in use: data08 Capacity: 1%
Group: diskdg5 Disks: 30 Disk in use: data07 Capacity: 1%
Group: diskdg6 Disks: 38 Disk in use: data09 Capacity: 25%
What I would like to do is display a table with these fields, plus a new field showing the change in capacity since 7 days ago. In other words, I would like to evaluate the difference between the Capacity value now and the Capacity value for the same Group 7 days ago.
Can anyone assist me with a search?
Thanks so much, Matt
At first glance, the difference should be pretty easy - you can use the delta search command. But delta lacks a by clause, so you could only do one Group at a time - a bit of a limitation. However, I think you can use streamstats to roughly create a per-Group delta.
Assuming that your data above has field extractions for Group and Capacity, a search like this should get you close:
sourcetype=my_san_data
| streamstats last(Capacity) as high first(Capacity) as low by Group window=7 global=f
| eval delta=high-low
| table _time,Group,Capacity,delta
You may need to swap high and low around to get the sign to work out right. There is an assumption here that you are collecting this data once per day. The way this "should" work is that streamstats maintains a sliding window of 7 events per Group and uses the first and last values of Capacity within each window to calculate a delta.
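One wrinkle worth noting: streamstats operates on events in the order the search returns them, which by default is newest first - that is likely why high and low may come out swapped. A variant worth trying (a sketch, same assumed field extractions) forces oldest-first order before the window is applied:

sourcetype=my_san_data
| sort 0 _time
| streamstats first(Capacity) as low last(Capacity) as high by Group window=7 global=f
| eval delta=high-low
| table _time,Group,Capacity,delta

With events sorted ascending by _time, first() really is the older reading, so the sign of delta should come out right without guesswork.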
Obviously a sliding window of 7 events is not necessarily strictly 7 days. It depends on you collecting exactly once per day, every day, without missing one. If you are collecting once per hour, adjust window to 168 instead.
There are more complicated ways of dealing with this, like maintaining state in lookups or time-oriented subsearches, if you need higher precision than a sliding window gives you. But unless your accuracy requirements are very high, this should be "close enough".
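To illustrate the time-oriented idea, here is a rough sketch (untested; assumes Capacity is extracted as a plain number, and the "now"/"then" labels and time modifiers are placeholders to adjust to your collection schedule) that compares the latest reading in each of two windows instead of using a subsearch:

sourcetype=my_san_data ((earliest=-7d@d latest=-6d@d) OR earliest=-1d@d)
| eval period=if(_time < relative_time(now(), "-6d@d"), "then", "now")
| chart latest(Capacity) over Group by period
| eval change='now'-'then'

Here relative_time() buckets each event into "then" (the reading from 7 days ago) or "now" (today's reading), chart pivots the two periods into columns per Group, and the final eval computes the change. This is anchored to actual timestamps rather than event counts, so it tolerates missed collections.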
Thanks for the feedback. Great answer to my question, it certainly is "close enough" haha.