How do I search for gaps where no logs came in from a given host?

Pierceyuk
Path Finder

We spot-checked a random time in Splunk for a sourcetype (fed by 2 hosts sending in data) and the data was missing. Running the report for just that date showed a window of roughly 45 minutes where no data came in.

This raises the obvious question: are there any other gaps I need to investigate and backload?

It generates approximately 5 million events per day and I need to check the last 2 months' worth of logs. Is there an easy way to do this without running each day individually?

sourcetype=mysourcetype | timechart span=30m count by host 

lukejadamec
Super Champion

Assume that if the forwarder is online, all sourcetypes will be collected. That means if there is no data for a sourcetype, no data for that sourcetype was generated. Given that, you can search for indexed volume by host in 30-minute buckets: if there is no volume, the forwarder was probably offline. You can run this type of search against the _internal index, which is much faster than searching for and counting events. The catch is that the _internal index may only keep 30 days of history. Here is the search:

index=_internal source=*metrics.log group=per_host_thruput | eval totalGB = (kb/1024)/1024 | timechart span=30m sum(totalGB) by series useother=f limit=100

I'm not sure how many servers you have, so I set the limit to 100. If you have more, increase the limit.
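
A minimal sketch extending the search above (not part of the original answer) so it lists only the 30-minute windows where a host reported zero volume. `sum()` leaves empty buckets as null, so `fillnull value=0` turns them into 0; `untable` flips the timechart back into one row per window per host; `where` keeps the empty windows. It assumes each host has at least some volume somewhere in the time range: a host that sent nothing at all never gets a column, so it won't show up here.

```spl
index=_internal source=*metrics.log group=per_host_thruput
| eval totalGB = (kb/1024)/1024
| timechart span=30m sum(totalGB) by series useother=f limit=100
| fillnull value=0
| untable _time series totalGB
| where totalGB=0
```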

lukejadamec
Super Champion

In this index, for this source and group, the host is stored in the series field. It's also worth looking at the other groups; I find some of them useful:
index=_internal source=*metrics.log | dedup group | table group

Pierceyuk
Path Finder

It's only two hosts, but yes: no data from a host means there is a problem. That query is very useful. What filter do I add to show only certain hosts? host=server1 doesn't work, so I suspect it uses some other field name?
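
Because the per_host_thruput group stores the host in the `series` field rather than `host`, a sketch of restricting the metrics search to specific hosts filters on `series` instead. `server1` and `server2` are placeholder host names; substitute your own.

```spl
index=_internal source=*metrics.log group=per_host_thruput (series=server1 OR series=server2)
| eval totalGB = (kb/1024)/1024
| timechart span=30m sum(totalGB) by series
```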

Pierceyuk
Path Finder

In the end I just ran the query over the full 2 months, sat back, and waited the 2+ hours for it to finish.

sourcetype=mysourcetype | timechart span=30m count by host
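
At about 5 million events a day, `tstats` may be a much faster option, since it counts from index-time metadata (host, source, sourcetype) instead of reading every raw event. A sketch, assuming `mysourcetype` lives in an index your role searches by default (otherwise add `index=<yourindex>` to the `where` clause). `prestats=t` hands partial counts to `timechart`, so the output has the same shape as the search above.

```spl
| tstats prestats=t count where sourcetype=mysourcetype by host _time span=30m
| timechart span=30m count by host
```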

Then a quick bit of Excel afterwards, with COUNTIF(B:B,0), told me how many time windows had no results and gave me an approximate number. There's probably a snazzy way to do this in Splunk, but it's beyond me.
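
A minimal sketch of doing the COUNTIF step in Splunk instead of Excel, assuming the same two-host sourcetype. It fills any empty cells with 0, unpivots the timechart into one row per window per host (`event_count` is just an illustrative field name), keeps the zero-count windows, and counts them per host.

```spl
sourcetype=mysourcetype
| timechart span=30m count by host
| fillnull value=0
| untable _time host event_count
| where event_count=0
| stats count AS empty_windows by host
```

Dropping the final `stats` line lists the timestamps of the empty windows instead of just counting them, which is handy for deciding what to backload.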

0 Karma