I have 5000 Linux servers and I would like to run a Splunk search to get their disk utilization. It's not practical to run df by hand on 5000 servers, so I'm building a dashboard for servers that cross 85% utilization and scheduling a PDF delivery to my email. What would be the best Splunk search command to find the disk utilization percentages on all the servers and filter to servers above 85%?
Let's assume your data looks like this:
host=server1,dev=/dev/sda1,free=1000,used=1000,total=2000
You would run this search:
index=indexname | eval utilization=(used/total)*100 | stats first(utilization) as utilization by host | where utilization >= 85
I wouldn't call this the "best" search, but it's certainly a search that would work.
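One limitation worth noting: with several filesystems per host, keeping a single value per host can hide a full disk behind an emptier one. A possible variation, as a rough sketch that assumes the same hypothetical field names as the sample event (host, dev, used, total), keeps the most recent reading per host and device instead:

```
index=indexname
| stats latest(used) as used, latest(total) as total by host, dev
| eval utilization=round((used/total)*100, 1)
| where utilization >= 85
| sort - utilization
```

Here latest() picks the newest event for each host/dev pair, so the time range of the search only needs to be long enough to cover one reporting interval.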
By installing the Splunk app for *nix, and deploying the *nix TA to your Linux servers, you can have your servers report the output of the "df" command in Splunk. You can then do a search on your collected "df" data to find servers that have 85% or higher disk utilisation.
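As a sketch of what such a search could look like: this assumes the add-on's df input is enabled and writes to index=os with sourcetype=df, and that the events carry UsePct and MountedOn fields. Those names are assumptions about your setup, so check them against your own data before relying on this.

```
index=os sourcetype=df
| eval pct=tonumber(replace(UsePct, "%", ""))
| stats latest(pct) as utilization by host, MountedOn
| where utilization >= 85
| sort - utilization
```

The replace() call strips a trailing percent sign if the field has one, so the comparison is numeric either way. Saved as a dashboard panel or report, a search like this can then be scheduled for PDF delivery.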
For this you would need:
A Splunk server (search head / indexer)
A Splunk Deployment server to push out the *nix TA (Doesn't need to be a separate server)
A Splunk universal forwarder installed on your 5000 Linux servers that is connected to your deployment server.
Do you have any of this in place yet? Secondly, are you sure Splunk is the best answer for your problem(s)? You don't have any performance metrics collection in place already?
Just to clarify, Splunk is not some sort of "job distribution engine". You don't enter a command in Splunk which is then sent to connected servers.
Instead, you define inputs on the servers, and they send their output to Splunk. You can then search on and interpret that data (or create alerts and email delivery) as much as you like.
I was referring to a search query to get the results and create dashboards.
And I wanted to point out that you will have to make all 5000 servers run a df once in a while and capture the output of that with Splunk in order to be able to put that info on a dashboard.
If you don't want to do all that by hand, you might be interested in this app/TA.
What version of Splunk are you running?
So what is the data that you have collected in Splunk from the 5000 servers? Because you can't write a dashboard if you don't have the data...
Version 6.3. The universal forwarders send syslog data and some Oracle DB data.