Are you looking for files that were once successfully ingested and are no longer being read, or files that were never ingested at all?
The former case is a matter of searching a long period (such as 30 days) to build a list of expected files, then searching a short period (such as today) to build a list of current files, and comparing the two.
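The comparison itself is a set difference. A minimal Python sketch, assuming both lists have already been exported from Splunk (for example as the `source` column of two `tstats` searches over different time ranges); the function name and sample paths are hypothetical:

```python
# Compare the sources seen over a long window (e.g. 30 days) with the
# sources seen over a short window (e.g. today). Inputs are assumed to be
# plain lists of source paths exported from Splunk.

def missing_sources(expected: list[str], current: list[str]) -> list[str]:
    """Return sources seen in the long window but not in the short one."""
    return sorted(set(expected) - set(current))


if __name__ == "__main__":
    # Hypothetical sample data.
    last_30_days = ["/var/log/app/a.log", "/var/log/app/b.log", "/var/log/app/c.log"]
    today = ["/var/log/app/a.log", "/var/log/app/c.log"]
    print(missing_sources(last_30_days, today))
```

Anything this returns was ingested at some point in the long window but has gone quiet, which is the "no longer being read" case.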
The latter case is more challenging. A source that was never read will not be in Splunk, but you may find an error message in _internal for files that could not be read, perhaps because of permissions. It is also possible for a file to be skipped silently, for instance if it does not match the monitor pattern.
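Because silently skipped files leave nothing in Splunk, one option is to check from the forwarder side. The Python sketch below walks a monitored directory and sorts files into "not whitelisted", "unreadable" and "candidate". It assumes the monitor stanza's whitelist is a regex matched against the full file path, and that the script runs as the same account as the forwarder so `os.access` reflects its permissions. It only covers these two causes; the function name and arguments are hypothetical, not part of Splunk:

```python
import os
import re


def classify_files(root: str, whitelist: str) -> dict[str, list[str]]:
    """Walk `root` and classify files by why Splunk might skip them.

    Assumption: `whitelist` mirrors the regex in the monitor stanza and is
    searched against the full path. Run as the forwarder's user account so
    the permission check is meaningful.
    """
    pattern = re.compile(whitelist)
    result: dict[str, list[str]] = {
        "not_whitelisted": [],
        "unreadable": [],
        "candidate": [],
    }
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            if not pattern.search(path):
                # Would be ignored by the monitor input without any error.
                result["not_whitelisted"].append(path)
            elif not os.access(path, os.R_OK):
                # Matches the pattern but this account cannot read it.
                result["unreadable"].append(path)
            else:
                result["candidate"].append(path)
    for paths in result.values():
        paths.sort()
    return result
```

For example, `classify_files("/var/log/app", r"\.log$")` (hypothetical path and pattern) lists the files outside the pattern separately from the ones blocked by permissions, which lines up with the two causes asked about in this thread.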
Please clarify your requirements and we'll try to help.
Thanks for replying on this, @richgalloway, and apologies for the delayed reply. Yes, we are trying to find the files that never reached Splunk, either:
1. because of permission issues, or
2. because they don't match the whitelist pattern, or for unknown reasons.
We have got the count of files per index that are being read/indexed by the Splunk UF:
| tstats dc(source) WHERE host=10apd- OR host=ew1a-* OR host=dub01pd-* OR host=uw2- OR host=ue1-* index=prod-online* by index
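To make the semantics of `dc(source) ... by index` explicit: each distinct source is counted once per index, no matter how many events it contributed. A small Python equivalent over hypothetical `(index, source)` pairs (the index names are made up to match `prod-online*`):

```python
from collections import defaultdict


def distinct_sources_by_index(events: list[tuple[str, str]]) -> dict[str, int]:
    """Count distinct sources per index, like tstats dc(source) BY index."""
    seen: dict[str, set[str]] = defaultdict(set)
    for index, source in events:
        seen[index].add(source)  # duplicates collapse inside the set
    return {index: len(sources) for index, sources in sorted(seen.items())}


if __name__ == "__main__":
    sample = [
        ("prod-online-web", "/var/log/web/access.log"),
        ("prod-online-web", "/var/log/web/access.log"),
        ("prod-online-web", "/var/log/web/error.log"),
        ("prod-online-api", "/var/log/api/app.log"),
    ]
    print(distinct_sources_by_index(sample))
```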
Now we want to list the number of files that errored out or were not read by the Splunk UF. The same hosts are used as the filter, but how do we break that down by index name and show a bar chart comparing it with the query above?
Failed attempt below:
index=_internal sourcetype=splunkd splunk_server=usw* host=10apd- OR host=ew1a-* OR host=dub01pd-* OR host=uw2- OR host=ue1-* log_level=ERROR
| rex field=message "((?.*))" | stats dc(message) AS error_count by host | sort - error_count
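Sources that fail are not indexed, so you can't get stats by index: the only trace of them is the error event in _internal, which does not carry the index the file was meant for. I suggest generating stats by source or host instead.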
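A per-host comparison can stand in for the per-index one. The Python sketch below lines up, for each host, the number of distinct files that were indexed and the number of distinct files named in error messages. It assumes both inputs were already exported as `(host, file_path)` pairs; how the path is pulled out of the splunkd error text is left open, because that depends on the message format. The function name is hypothetical. Each output row holds one host with two counts, which is the shape a grouped bar chart needs:

```python
from collections import defaultdict


def per_host_file_counts(
    indexed: list[tuple[str, str]], failed: list[tuple[str, str]]
) -> list[dict]:
    """Join indexed and failed (host, file_path) pairs into per-host counts."""
    indexed_by_host: dict[str, set[str]] = defaultdict(set)
    failed_by_host: dict[str, set[str]] = defaultdict(set)
    for host, path in indexed:
        indexed_by_host[host].add(path)
    for host, path in failed:
        failed_by_host[host].add(path)
    hosts = sorted(set(indexed_by_host) | set(failed_by_host))
    # Hosts missing from one side get a count of 0 for that side.
    return [
        {
            "host": host,
            "indexed_files": len(indexed_by_host.get(host, set())),
            "failed_files": len(failed_by_host.get(host, set())),
        }
        for host in hosts
    ]
```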
If you have a list of files and how many events are in them, you can run something like this and cross-reference the results:
| tstats count AS EventsInThisFile WHERE index=YourIndexNameHere BY source
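The cross-reference step can be sketched in Python. This assumes the `source` field holds the full file path (the default for monitored files) and that each line of a file becomes one event; multi-line events break that assumption, so a count mismatch is only a hint. The `count_lines` and `cross_reference` helpers are hypothetical:

```python
def count_lines(path: str) -> int:
    """Count newline-delimited lines; undecodable bytes are replaced, not fatal."""
    with open(path, encoding="utf-8", errors="replace") as f:
        return sum(1 for _ in f)


def cross_reference(
    local_counts: dict[str, int], indexed_counts: dict[str, int]
) -> tuple[list[str], list[str]]:
    """Compare expected per-file event counts with tstats results.

    local_counts:   {file_path: expected_events}, e.g. built with count_lines
    indexed_counts: {source: EventsInThisFile} from the tstats search
    Returns (files never indexed, files whose counts differ).
    """
    never_indexed = sorted(p for p in local_counts if p not in indexed_counts)
    mismatched = sorted(
        p
        for p in local_counts
        if p in indexed_counts and indexed_counts[p] != local_counts[p]
    )
    return never_indexed, mismatched
```

Files in the first list are the ones that never reached Splunk; those in the second were read at least partly and may deserve a closer look.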