I am relatively new to a company that used Splunk Professional Services to spin up a Splunk Cloud environment before I was hired. Company IT has onboarded a lot of AWS, Azure, on-prem and network devices so far. I'm trying to verify that those devices are in fact sending logs into Splunk so that I can eventually apply use cases and alerting on the logs, as well as troubleshoot the hosts that are supposed to be sending but aren't. There isn't a Splunk resource in the company, so I am trying my best to figure it out as I go. (classic)
The IT manager gave me a spreadsheet of hostnames and private IP addresses for all the devices that are forwarding logs. At first I thought I could run a search to simply compare his list against the logs received by hostname, but I couldn't figure that out. Here's what I did instead.
Over a 30-day window I ran | metadata type=hosts index=* and exported the results to a CSV. I took the 'host' column (which is a mix of hostnames and IP addresses) from the export and diffed it against the IT manager's list of hostnames/IP addresses; anything on his list that wasn't in the export, I presumed had not sent logs during that time period. The inventory has roughly 850 line items that are supposedly onboarded, and I only saw logs from about 250, so obviously I'm second-guessing myself because of the delta.
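For reference, the export came from essentially this (the strftime/table lines are just cosmetic additions on my part, to make the epoch times readable and keep only the columns I cared about):
| metadata type=hosts index=*
| eval firstTime=strftime(firstTime, "%Y-%m-%d %H:%M:%S"), lastTime=strftime(lastTime, "%Y-%m-%d %H:%M:%S")
| table host firstTime lastTime totalCount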
When I spot-check some of the hostnames/IP addresses from IT's asset inventory spreadsheet in Splunk, some return no results, some return only DNS or firewall traffic involving that server (so the server itself still needs onboarding to get its own logs in), and others return results where the 'host' field is a cloud appliance (like Meraki) and the hostname or IP only matches other fields such as 'dvc_ip', 'deviceName' or 'dvc'. This is really confusing the heck out of me and making me question whether there is a better way. So, is there? How do you normally audit and verify that your logs are still being received into your Splunk instance?
Thanks so much for your help and looking forward to learning!
There are a number of ways of monitoring this. I would not recommend | metadata for it: the metadata command reports bucket-level summaries (firstTime, lastTime, totalCount), so its results don't line up exactly with your search time range and it isn't the right way to monitor what is or isn't coming in.
The best way to check which hosts are sending data is the tstats command:
| tstats count where index=* OR index=_* by host
which will give you the hosts that sent data over the search period. You can also split it by time if you want, e.g. by day:
| tstats count where index=* OR index=_* by host _time span=1d
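If you want to eyeball which days a particular host went quiet, you can pivot that daily split into a host-per-row, day-per-column grid (purely an optional convenience on top of the search above; the eval is only there to turn the epoch _time into readable column headers):
| tstats count where index=* OR index=_* by host _time span=1d
| eval day=strftime(_time, "%Y-%m-%d")
| xyseries host day count
| fillnull value=0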
Then, if you have a lookup file (CSV) of the expected hosts, your search would look something like this:
| tstats count where index=* OR index=_* by host
| append [
| inputlookup your_master_list.csv
| eval count=0
]
| stats max(count) as count by host
| where count=0
So you first collect the data actually received, counted by host, then you append the list of expected hosts from the lookup and set a dummy count=0 for each of them.
The stats at the end then collates the maximum count seen for each host: if there is data for the host, count will be > 0, otherwise it will be 0.
The final where then shows you the hosts that are missing data.
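One assumption baked into the above: the lookup file needs a column literally named host so that the final stats by host lines up the expected hosts with what tstats returns. If your spreadsheet column is called something else (I'm using hostname purely as a placeholder here), rename it inside the subsearch:
| tstats count where index=* OR index=_* by host
| append [
    | inputlookup your_master_list.csv
    | rename hostname AS host
    | eval count=0
]
| stats max(count) as count by host
| where count=0
Also watch out for short name vs. FQDN and upper/lower case differences between the spreadsheet and what Splunk records in host; adding | eval host=lower(host) on both sides of the append can save you a lot of false 'missing' hosts.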
Note that there are some useful tools for this; one I particularly like is TrackMe, a really powerful app that alerts you when hosts or sourcetypes stop appearing in Splunk.
https://splunkbase.splunk.com/app/4621/
It's Splunk Cloud certified and free.