So I have managed to get the data into Splunk, and using the Quick Chart I can see individual pieces of data in graphical form to verify this. However, once I start searching, it's not clear how to get any useful reports out of the system.
For example, how do I search for all devices where memory is over 95% used? Is there a way to check for spikes in data?
You would use the Quick Chart to start building the view you are interested in, then open it in Search and append a | where clause to filter down to whatever it is you care about.
The Quick Chart is there to get the data close to what you want to start with; you then finish it off in the search as you see fit.
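As a rough sketch of what that looks like for the memory question: the index, sourcetype, and the mem_used_pct field below are placeholders I made up, so swap in whatever the Quick Chart search actually shows for your data.

```
index=your_index sourcetype=your_sourcetype
| stats latest(mem_used_pct) as mem_used_pct by host
| where mem_used_pct > 95
```

The stats line keeps the most recent reading per host, and the where line then drops every host at or below 95.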
I'd be glad to help further if you want to hit me up on Slack (splk.it/slack to sign up). I'm @mattymo.
Here's the general process when you don't know exactly what you are looking for.
1) Find ONE real-life instance where the event you are looking for happened. In this case, let's say you know that host myhost1 was CPU-pegged at about 2:00 yesterday.
2) Do a search in verbose mode for everything Splunk has indexed about that host at about that time. Set the time range and use this query:
index=* myhost1
or possibly this query:
index=* myhost1 ("CPU" OR "processor")
3) Find a single event of the type you want. Look at the form of the event, the fields present, and so on. Especially identify the index and sourcetype, because those will generally be constants.
4) In this case, if you were already ingesting the events in the "standard" way, then the final query for CPU usage might look something like this:
index=blah counter="% Processor Time"
sourcetype="perfmon:processor"
earliest=-5m latest=now
| stats min(Value) as minValue by host
| where minValue>95
Because it takes the minimum over the last five minutes, it only returns hosts that stayed above 95% for the whole window.
This query is borrowed from this question:
https://answers.splunk.com/answers/546361/if-cpu-is-95-for-more-than-5-minutes-how-do-you-wr.html
5) For disk space, you could get some clues from this answer:
https://answers.splunk.com/answers/454999/how-to-develop-a-search-to-find-free-disk-space-us.html
6) You can also join the Splunk Slack and ask questions there.
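On the "spikes" part of the question, one possible approach is to compare each time bucket against that host's own average over the search window. This is only a sketch: it reuses the index, sourcetype, and Value field from the CPU example in step 4, and the 5-minute bucket and the two-standard-deviation threshold are arbitrary choices of mine that you would tune for your data.

```
index=blah sourcetype="perfmon:processor" counter="% Processor Time"
| bin _time span=5m
| stats avg(Value) as avgValue by _time, host
| eventstats avg(avgValue) as meanValue, stdev(avgValue) as sdValue by host
| where avgValue > meanValue + 2 * sdValue
```

Run it over a longer range, such as the last 24 hours, so the per-host average has enough buckets to be meaningful. Each row that comes back is a 5-minute bucket where that host ran well above its own norm.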
Thanks for the reply, I have kinda done that for the ping stats and worked a graph up using:
index=cacti sourcetype="cacti:mirage" rrdn=ping | sort max(rrdv) | timechart max(rrdv) by hostname limit=0
That nicely gives me a graph of the maximum ping time during a given period for all hosts.
However, there are two types of information for a device that I was looking at: Ping (a single value) and Network stats, which has two (traffic_in / traffic_out). Whilst I can see events for both when I search by hostname, I can't see how to get the values for traffic_in/traffic_out that the Mirage add-on should have imported.
Sorry for the delay, I missed this post. Are you receiving events with rrdn=traffic*?
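A quick way to check, assuming the same index and sourcetype as your ping search, is to list every rrdn value that is actually arriving:

```
index=cacti sourcetype="cacti:mirage"
| stats count by rrdn, hostname
```

If traffic_in and traffic_out show up there, something like the following might chart them. I'm assuming the values land in rrdv the same way the ping values do, and myhost is a placeholder for one of your devices:

```
index=cacti sourcetype="cacti:mirage" rrdn=traffic* hostname=myhost
| timechart max(rrdv) by rrdn
```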