I can't get my head around this issue, which is as follows:
We have a syslog-ng instance that writes data from 100+ network devices to a log directory, which Splunk is set up to monitor, and it works perfectly. The problem is that if I disable the data input in the Splunk GUI, it seems to continue indexing for up to an hour afterwards, but only for a few devices, not all of them.
Could it be that the data format on certain hosts is unknown to Splunk, so the timestamp is applied at index time? That would make it look like Splunk is still indexing, while in reality it is just working through the queue of events from before I disabled the data input.
One possible reason is that so much data is arriving in your directory that Splunk's input queue is overwhelmed, so indexing of your files' content is actually delayed.
In that scenario Splunk keeps ingesting data for a while after you disable the input, because it still has to catch up with what it already knows is in the files.
As a test, disable the input at a specific minute X, wait a few minutes, and then check in Splunk whether you have any events later than that point in time X.
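To make that check easier, a search along these lines could compare each event's timestamp (`_time`) with the time it was actually indexed (`_indextime`). This is only a sketch: `<your_index>`, `<your_log_directory>` and `<X>` are placeholders you would need to fill in, and the 2 hour window is an arbitrary assumption.

```spl
index=<your_index> source="<your_log_directory>/*" earliest=-2h
| where _indextime > strptime("<X>", "%Y-%m-%d %H:%M:%S")
| eval lag_seconds = _indextime - _time
| eval index_time = strftime(_indextime, "%Y-%m-%d %H:%M:%S")
| stats count, max(lag_seconds) AS max_lag_seconds, max(index_time) AS last_indexed BY host
```

If the hosts that show up have a large `max_lag_seconds`, Splunk was most likely still working through a backlog. If the lag is close to zero, either new data was still being read after X, or the timestamp is being assigned at index time for those hosts, in which case this search alone cannot tell the two apart.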
Please let me know if the answer was useful for you. If it was, accept it and upvote. If not, give us more input so we can help you with that.