How to get Capacity planning and Availability report in Splunk for servers?
I wish to get reports on Servers Availability and Capacity planning. Is there any readymade search available for those reports?
Capacity planning is very broad. Using the *nix and Windows TAs, you can monitor and trend CPU, memory, and disk utilisation over a period of time (say daily, monthly, or yearly). Based on your goals, you can then decide to procure additional hardware or disk when you are consistently reaching your threshold (e.g. disk usage above 75%).
You can also group performance by application, e.g. web server usage and database usage, and decide to procure servers only for those groups to increase your scale/availability.
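As a starting point, a daily-trend search against the Windows TA's Perfmon data might look like the sketch below (the index, sourcetype, and counter names are assumptions; adjust them to your environment, and use the equivalent *nix TA sourcetypes for Unix hosts):

index=perfmon sourcetype="Perfmon:Processor" counter="% Processor Time"
| timechart span=1d avg(Value) AS avg_cpu_pct BY host

Similar searches over the memory and free-disk counters, trended monthly, give you the utilisation baselines to compare against your thresholds.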
Can I have a sample search to achieve this using avg values of cpu,memory and disk?
I actually just built a capacity planning solution for my organization. It uses machine learning to forecast when a server or cluster will run out of disk, and it doubles as a "what if machine": the user can walk through scenarios such as "if I remove 10TB from this cluster, when will it run out of disk?". You can also enter any future date and it will give you the projected disk usage at that date.
You should first define what your future state will be and what you want to accomplish.
I'm working on a similar problem @skoelpin. Could you please elaborate on your approach to solving this?
The problem I'm facing is that a single host has multiple mounts (C:, D:, etc.). My approach works when the server list is small, but it becomes difficult as the number of servers grows.
Thanks in advance.
Sure, I had the same problem, and we had to figure out a clever way to scale this. We started with a single drive and 5 clusters with a total of 15 servers. I created 2 lookup files. The first holds host values to drive the first dropdown; when the user selects the app, it dynamically populates the second dropdown so the user can select a single host or an aggregate of the cluster. The second lookup table holds a row per host with the slope, y-intercept, and drive letter. Anytime disk is purged or added, the y-intercept value changes, but the slope remains constant.
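As an illustrative sketch of that second lookup in use (the lookup filename, field names, and tokens are hypothetical), a linear projection for a selected host and drive could look like:

| inputlookup disk_forecast_params.csv
| search host=$HOST$ drive=$DRIVE$
| eval days_ahead=90
| eval projected_used_gb=(slope*days_ahead)+y_intercept

Because purging or adding disk only shifts y_intercept while the slope stays constant, maintaining the lookup means updating a single value per host.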
When we started to scale, we had to reduce our dependency on the lookups because it was getting difficult to maintain these values across hundreds of servers. We found a way to dynamically populate the slope value and created an additional dropdown for drives so we could do multiple drives per host.
Another approach we took to match the model name to the selected host value was to use a consistent naming convention for the model names. When the user selects a hostname in the dropdown, that hostname is passed into the model name, which looks like this: | apply Forecasting_$HOST$
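A hedged sketch of how that naming convention ties training to scoring with the MLTK's fit and apply commands (the base search and field names are illustrative, not the author's exact searches):

index=xxx host=web01 sourcetype="Perfmon:FreeDiskSpace"
| fit LinearRegression FreeGB from _time into Forecasting_web01

A per-host scheduled search like the one above saves a model named after the host, so the dashboard can score with the dropdown token substituted into the model name:

... | apply Forecasting_$HOST$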
One last word of advice: create short feedback loops to judge accuracy. You have to be confident in the results you're getting from the forecast, so creating a few panels dedicated to accuracy is important.
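One way to build such an accuracy panel (a sketch; the predicted field name follows the MLTK's predicted(<field>) convention, the other names are assumptions) is to apply the model over a window where actuals already exist and compute the error:

index=xxx host=web01 sourcetype="Perfmon:FreeDiskSpace"
| apply Forecasting_web01
| eval abs_err=abs(FreeGB-'predicted(FreeGB)')
| stats avg(abs_err) AS mean_abs_error max(abs_err) AS worst_error

Trending these error metrics over time shows whether the model is drifting and needs to be refit.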
Can I have a sample search to achieve this using avg values of cpu,memory and disk?
Sure. The SPL below only covers disk, but you can easily add CPU and memory with additional counters.
index=xxx host=xxx sourcetype="Perfmon:FreeDiskSpace" (counter="% Free Space" OR counter="Free Megabytes") instance=G:
| eval FreeGB=FreeMBytes/1024
| eval Free_percent=100-storage_used_percent
| timechart span=1d min(FreeGB) AS FreeGB min(Free_percent) AS Free_percent
| eval Used_percent=100-Free_percent
| eval Total_Cap=100*(FreeGB/Free_percent)
Next I created a timeshift to create empty buckets for future values, then fed it into the MLTK to fill those empty buckets with (slope + the previous value), giving the future forecasted values. The "what if" part comes from adjusting the y-intercept value.
| makeresults count=100000
| streamstats count as count
| eval earliest_time=now()
| eval time=case(count=100000,relative_time(earliest_time,"+100000d"),count=1,earliest_time)
| makecontinuous time span=1d
| eval timeAsANumber=time
| eval _time=time
| eval time_human=strftime(time, "%Y-%m-%d %H:%M:%S")
| fields + time
| append
[| search
index=xxx host=xxx sourcetype="Perfmon:FreeDiskSpace" (counter="% Free Space" OR counter="Free Megabytes") instance=G:
| eval FreeGB=FreeMBytes/1024
| eval Free_percent=100-storage_used_percent
| timechart span=1d min(FreeGB) AS FreeGB min(Free_percent) AS Free_percent
| eval Used_percent=100-Free_percent
| eval Total_Cap=100*(FreeGB/Free_percent)]
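The "what if" adjustment can then be sketched as a linear projection over those future buckets, with the y-intercept shifted by the scenario amount (the lookup name and fields are illustrative; day_number stands for an assumed field counting days from the intercept's reference date):

| lookup disk_forecast_params.csv host OUTPUT slope y_intercept
| eval y_intercept=y_intercept-(10*1024)
| eval projected_FreeGB=(slope*day_number)+y_intercept

Here subtracting 10*1024 GB models the "remove 10TB" scenario; answering "what will usage be on date X" is just evaluating the same projection at the corresponding day_number.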
Nope. You would need to build one based on your needs.
Do you have a suggestion for capacity planning? I am using both the Unix and Windows add-ons to get memory, CPU, and disk utilization.