Hi Community,
I have two separate Splunk installs: one is version 8.1.0 and the other is 8.2.5.
The older version is our production Splunk install. A dashboard I set up, which calculates the difference between the index time and the actual time, shows a lag.
Since it's a production environment, I assumed the lag might be due to the reasons below.
To clarify the issue, I set up the same thing in another environment. This is a test environment that does not carry the heavy load of production but has the same settings with reduced memory. When I set up a completely new forwarder and replicated the setup in the test environment, I still saw the same lag.
I am confused as to why this is happening.
Could someone provide me with tips or guidance on how to work through this issue?
Thanks in advance.
Regards,
Pravin
There are many possible reasons for event lag. Off the top of my head:
1) Intermittent network connectivity problems.
2) Not enough bandwidth (either restricted in limits.conf by the thruput settings, or you simply have a low-capacity network link).
3) There is no lag as such, but your source's clock may be skewed.
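To put rough numbers on reason 2, here is a minimal Python sketch of how a throughput-capped forwarder falls behind. It assumes a constant input rate and a hard cap; the 256 KB/s cap stands in for a maxKBps-style limit in the [thruput] stanza of limits.conf, and the 300 KB/s input rate is purely hypothetical.

```python
def backlog_delay_seconds(input_kbps: float, cap_kbps: float, elapsed_s: float) -> float:
    """Delay seen by the newest event after `elapsed_s` seconds of steady input.

    Data arrives at input_kbps but can only leave at cap_kbps, so the unsent
    backlog grows by (input_kbps - cap_kbps) every second. In this simplified
    FIFO model the newest event waits for the whole backlog to drain at cap_kbps.
    """
    if input_kbps <= cap_kbps:
        return 0.0  # the link keeps up, so no queueing delay builds
    backlog_kb = (input_kbps - cap_kbps) * elapsed_s
    return backlog_kb / cap_kbps


# Hypothetical numbers: 300 KB/s of logs against a 256 KB/s cap, for one hour.
delay = backlog_delay_seconds(300, 256, 3600)
print(f"{delay:.0f} s behind after 1 h")
```

In this simplified model the delay keeps growing for as long as input exceeds the cap, so a throughput limit tends to show up as steadily increasing lag for everything from that forwarder rather than as isolated late events.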
The lag caused by monitoring many directories is typically present only shortly after the forwarder starts, because it has to check all the directories and files to verify whether their state is consistent with the fishbucket database. After that, it only checks new files. You might, however, need to raise your open files limit to help your forwarder keep track of all those files.
Upgrading the forwarder is, of course, highly advised, since 6.x has been out of support for some time. It should still work, but Splunk no longer supports it.
Hi @PickleRick ,
I have checked all of these conditions. We have a proper network setup with no connectivity issues, and limits.conf has the default settings. I suspected the issue might be a skewed clock, since we work in a different time zone from my original location. But all the servers share a common time zone, so that is not the cause either.
Since the forwarder reads data from different log files across folders, the lag we report is the maximum lag for a particular sourcetype. For example:
In the above image, the event was generated at 9:00 in the morning but was indexed only at 13:00, almost 4 hours after it was generated.
When I run the SPL again a few minutes later for the same source, I see a delay of only a few minutes.
I can't tell whether this is because of the load on the forwarder or some other issue.
For the same source, two different events have very different indexing delays. This is what really confuses me. Could you please shed some light on this?
Regards,
Pravin
OK. I don't know about the lag, but there's something fishy about your timestamp parsing.
Your raw events have two timestamps each, and in the first screenshot there is almost a four-hour difference between them (9:14 vs 13:05). I don't know what those timestamps represent, but they might be some kind of "start timestamp" and "end timestamp". For some reason, the index time is relatively close to the second one (which makes sense: something ends, the event is emitted, received by the forwarder, passed on to the indexer and indexed, so there can be a slight delay), but _time is being parsed from the first timestamp.
So it doesn't seem to be an issue of "lag" but more of a time parsing problem.
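The effect described above can be reproduced with a minimal Python sketch. The raw event line is hypothetical (the real events are only visible in the screenshots): it simply carries two timestamps, 09:14 and 13:05, as in the first screenshot, and the index time is an assumed value a few seconds after the second one. Taking the event time from the first timestamp gives a lag of almost four hours; taking it from the second gives under a minute.

```python
import re
from datetime import datetime

# Hypothetical raw event with two timestamps (e.g. a start and an end time).
# The date and the key=value fields are made up for illustration.
RAW = "2022-05-10 09:14:02 job=nightly_export status=done 2022-05-10 13:05:11"
# Assumed index time: shortly after the second timestamp.
INDEX_TIME = datetime(2022, 5, 10, 13, 5, 40)

TS_PATTERN = re.compile(r"\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}")


def lag_seconds(raw: str, index_time: datetime, which: int) -> int:
    """Lag (index time minus event time) when the event time is taken from
    the timestamp at position `which` (0 = first, 1 = second) in `raw`."""
    stamps = TS_PATTERN.findall(raw)
    event_time = datetime.strptime(stamps[which], "%Y-%m-%d %H:%M:%S")
    return int((index_time - event_time).total_seconds())


print("first timestamp: ", lag_seconds(RAW, INDEX_TIME, 0), "s")
print("second timestamp:", lag_seconds(RAW, INDEX_TIME, 1), "s")
```

In Splunk itself, which timestamp becomes _time is governed by the sourcetype's timestamp extraction settings in props.conf (for example TIME_PREFIX, TIME_FORMAT and MAX_TIMESTAMP_LOOKAHEAD), so that is where a fix would most likely go.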
Hi
I totally agree with @PickleRick that this is more probably an issue with your events' timestamps than lag in the connections. Of course, if there are a lot of events coming from one UF, the throughput limit can be hit, but based on your event timestamps (only a couple within the same second), I don't expect that.
Another issue with those timestamps is that they don't contain any TZ information! If you have operations in different time zones, that could also explain differences between _time and _indextime that are a whole or half number of hours.
If you have set up the Monitoring Console (MC) on your node, you can use it to look into these issues. It also shows whether there are other issues in your input phase (Settings -> MC -> indexing -> inputs).
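A minimal Python sketch of the time zone effect: the same wall-clock string, with no offset in it, maps to different epoch times depending on which zone it is assumed to be in, and the gap is a whole or half number of hours. The timestamp is hypothetical, and UTC versus UTC+05:30 are just example offsets.

```python
from datetime import datetime, timedelta, timezone

NAIVE = "2022-05-10 09:14:02"  # hypothetical timestamp with no TZ information


def epoch_assuming(naive: str, offset: timedelta) -> float:
    """Epoch seconds for `naive` if it is interpreted in a zone `offset` from UTC."""
    local = datetime.strptime(naive, "%Y-%m-%d %H:%M:%S")
    return local.replace(tzinfo=timezone(offset)).timestamp()


as_utc = epoch_assuming(NAIVE, timedelta(0))
as_ist = epoch_assuming(NAIVE, timedelta(hours=5, minutes=30))
# Same string, two interpretations: the difference in hours.
print((as_utc - as_ist) / 3600)
```

Here the two readings differ by exactly 5.5 hours. In Splunk, the TZ setting in props.conf tells the indexer which zone to assume for timestamps that carry no offset.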
r. Ismo
Thank you @PickleRick.
I will check the different reasons listed by you and update this thread.
Hi @ITWhisperer ,
One is the index time and the other is the event time.
| eval lag_sec=_indextime-_time
The difference is calculated as mentioned above.
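For anyone without Splunk at hand, here is a minimal Python sketch of what the eval above computes per event, followed by a per-source summary roughly like `| stats max(lag_sec) avg(lag_sec) by source`. The source paths and epoch values are hypothetical. It illustrates how a single event with a misparsed or late timestamp dominates the maximum for its source even when the other events from that source arrive within seconds, which is one way the same source can show a four-hour maximum and, on a later run, only a few minutes.

```python
from collections import defaultdict

# Hypothetical events: (source, _time, _indextime) as epoch seconds.
EVENTS = [
    ("/var/log/app/a.log", 1_652_174_042, 1_652_187_940),  # _time ~3h51m before index time
    ("/var/log/app/a.log", 1_652_187_911, 1_652_187_940),
    ("/var/log/app/b.log", 1_652_187_900, 1_652_187_960),
]


def lag_by_source(events):
    """Return {source: (max_lag, avg_lag)} where lag = _indextime - _time,
    mirroring `| eval lag_sec=_indextime-_time` plus a per-source summary."""
    lags = defaultdict(list)
    for source, event_time, index_time in events:
        lags[source].append(index_time - event_time)
    return {src: (max(v), sum(v) / len(v)) for src, v in lags.items()}


for src, (max_lag, avg_lag) in lag_by_source(EVENTS).items():
    print(f"{src}: max={max_lag}s avg={avg_lag:.0f}s")
```

Looking at the average (or the full distribution) next to the maximum helps separate one badly timestamped event from a forwarder that is genuinely behind.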
Regards,
Pravin
Where do the two times (index and actual) come from?
Both times are available in the data.