Need a report that:
P.S. Not interested in volumes with high percentage of used disk space - only in those that had a spike of say more than 20%.
I am assuming I'd need to:
UsePct for a volume and then leaving only those with the delta > 20;
timechart or something similar on those volumes.
Blanking out on how to do that and would appreciate your help - thanks!
P.P.S. This is as far as I've gotten - and it seems to correctly ID volumes with usage spikes (updated May 5):
sourcetype=WinHostMon source=disk FileSystem!="SNFS"
| stats min(storage_used_percent) as min,
avg(storage_used_percent) as avg,
max(storage_used_percent) as max
by host, Name FileSystem DriveType
| eval delta = max - avg
| where delta>20
| sort - max delta avg
The above produces the full stats table for all hosts and their volumes that had a spike; adding | fields host Name to it would produce just the hosts and volume names. The question remains: what is the best way to plot storage_used_percent on those volumes over the timeframe of the search?
P.P.P.S. Bonus points for streamlining the above search and making it faster, and more generally for a streamlined mechanism for pinpointing anomalies (spikes, unusual deviations or volatility) on any available metric, such as CPU, memory, disk and network utilization. (I have yet to properly configure Splunk infrastructure apps - perhaps such mechanisms are included in those.)
UPDATE:
sourcetype=WinHostMon source=disk FileSystem!="SNFS"
| eval description=host."|".Name."|".FileSystem."|".DriveType
| bin _time span=1h
| stats min(storage_used_percent) as min,
avg(storage_used_percent) as avg,
max(storage_used_percent) as max by _time description
| eval delta = max - avg
| eval host=mvindex(split(description,"|"),0)
| eval flag = if(delta > 20,1,0)
| eventstats sum(flag) as flag by host
| where flag > 0
| sort 0 _time
| fields _time host max
| xyseries _time host max
I see, thanks for providing the detail.
It would be much easier to understand if other people wrote like this too.
Thank you - appreciate the kind words. Doesn't seem to be working though (probably something simple).
(Can't seem to post an image... Here is the link to the two screenshots. Hopefully this works.)
I see the pics.
This is because of |where delta > 20
My answer is updated.
As is, it still doesn't work. See the same link above for two more screenshots. If I replace the last line with:
| timechart max(max) by host
.... then it's working.
Good news.
Please post the corrected query and accept your own answer.
I don't understand how yours works yet... 🙂
The one I've been battling with is this:
(sourcetype=WinHostMon source=disk FileSystem!="SNFS") OR (sourcetype=df source="df" Type!="cvfs")
[ search ((sourcetype=WinHostMon source=disk FileSystem!="SNFS") OR (sourcetype=df source="df" Type!="cvfs"))
| eval Name = if (isnull (Name), mount, Name)
| eval FileSystem = if (isnull (FileSystem), Type, FileSystem)
| stats min(storage_used_percent) as min,
avg(storage_used_percent) as avg,
max(storage_used_percent) as max
by host, Name FileSystem DriveType
| eval delta = max - avg
| where delta>20
| table host Name
]
| timechart max(storage_used_percent) by host
... it works, but only for Windows hosts (sourcetype=WinHostMon source=disk). For Linux hosts (sourcetype=df source="df") - not yet...
P.S. Thank you for all your help with this.
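One possible reason the Linux hosts drop out, offered as a guess rather than a confirmed diagnosis: a subsearch ending in | table host Name is handed back to the outer search as a filter of the form (host=... AND Name=...) OR ..., and on the df events Name only exists after the eval inside the subsearch, so the raw df events in the outer search never match it. Separately, stats skips events where any by field is missing, so if df events have no DriveType they would never reach the delta test at all. A minimal sketch that sidesteps both by grouping on host and Name only and returning just host from the subsearch. It assumes, as in the search above, that df events carry mount rather than Name, and it has not been run against real df data:

```
(sourcetype=WinHostMon source=disk FileSystem!="SNFS") OR (sourcetype=df source="df" Type!="cvfs")
    [ search (sourcetype=WinHostMon source=disk FileSystem!="SNFS") OR (sourcetype=df source="df" Type!="cvfs")
      | eval Name = coalesce(Name, mount)
      | stats avg(storage_used_percent) as avg max(storage_used_percent) as max by host Name
      | where max - avg > 20
      | dedup host
      | fields host ]
| timechart max(storage_used_percent) by host
```

Since the final timechart is split by host anyway, filtering on host alone loses nothing in the plot; it does mean every volume on a flagged host contributes to that host's max.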
Hi @mitag
timechart builds its time axis from the time picker range.
xyseries, however, only swaps the vertical and horizontal axes.
As a reference.
Hi @mitag
_time. Nobody makes timechart.
stats can't compare the original values; eventstats is better.

Sorry for the delay! The sourcetype is the standard sourcetype=WinHostMon. Searching for Type=Disk or source=disk would give you disk stats. Events look like this:
Type=Disk
Name="C:"
DriveType="fixed"
TotalSpaceKB=116859900
FreeSpaceKB=62318744
FileSystem="NTFS"
(host = ws2016_016, source = disk, sourcetype = WinHostMon)
(If you'd like, I can send you a sample of raw events.)
They are sampled every 5-15 minutes. Some additional fields are calculated - e.g. for the above single event these fields are:
storage 114120.99609375
storage_free 60858.1484375
storage_free_percent 53.32774031126161
storage_used 53262.84765625
storage_used_percent 46.67225968873839
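For what it's worth, the calculated values above are consistent with a plain KB-to-MB conversion of the raw fields (116859900 / 1024 = 114120.99609375 and 62318744 / 1024 = 60858.1484375). A minimal sketch of evals that would reproduce them, assuming that is how the fields are defined in this environment; the actual definition is not shown in this thread and may live in an add-on or in props.conf:

```
sourcetype=WinHostMon source=disk
| eval storage = TotalSpaceKB / 1024
| eval storage_free = FreeSpaceKB / 1024
| eval storage_used = storage - storage_free
| eval storage_free_percent = 100 * storage_free / storage
| eval storage_used_percent = 100 * storage_used / storage
| table host Name storage storage_free storage_free_percent storage_used storage_used_percent
```

If the fields are already calculated automatically, these evals are redundant and only illustrate where the numbers come from.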
My specific case is this: on several of our hosts, the boot disk ("C:") went full (from about 45% to 100% within minutes, then after 15-45 minutes - back to normal). I need to do a report that only shows those hosts and volumes that had a spike, and plot those spikes over time.
We could of course just search for all hosts with volumes close to full (say, over 90%) - but that does not isolate the spikes correctly as some volumes have been close to full for a while.
So I am thinking:
storage_used_percent for each volume,
timechart command just on those volumes and hosts.

With the following search I am getting a list of hosts and volumes that had a spike:
sourcetype=WinHostMon source=disk FileSystem!="SNFS"
| stats min(storage_used_percent) as min
avg(storage_used_percent) as avg
max(storage_used_percent) as max
by host, Name FileSystem DriveType
| eval delta = max - avg
| where delta>20
| sort - max delta avg
| fields Name host
Now, how do I pipe the results into a timechart (or any other plotting mechanism)?
Thanks!
Does this look right? (Feels weird - as if I am doing two very similar transforms one after another - i.e. doesn't feel efficient.)
sourcetype=WinHostMon source="disk"
[ search sourcetype=WinHostMon source="disk"
| stats min(storage_used_percent) as min,
avg(storage_used_percent) as avg,
max(storage_used_percent) as max
by host, Name FileSystem DriveType
| eval delta = max - avg
| where delta>20
| sort - max delta avg
| table host Name
]
| timechart max(storage_used_percent) by host
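On the "two very similar transforms" concern, one single-pass alternative is to let eventstats attach the per-volume avg and max to every event, so the spike filter and the plot run over the same result set with no subsearch. This is a sketch only: the volume field is a name made up here for labelling each series, and whether it is actually faster than the subsearch version has not been measured and would need checking in the Job Inspector, since eventstats has to hold the events it annotates.

```
sourcetype=WinHostMon source=disk
| eventstats avg(storage_used_percent) as avg max(storage_used_percent) as max by host Name
| where max - avg > 20
| eval volume = host . " " . Name
| timechart limit=0 max(storage_used_percent) by volume
```

Splitting by volume rather than by host keeps C: and D: on the same machine as separate lines, and limit=0 stops timechart from folding the less prominent series into OTHER.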