I found out what the problem was. There is a Cribl server between the UF and the indexer, which I mistakenly ruled out as the source of the problem during troubleshooting. I bypassed Cribl for a while and the problem disappeared.
The rest went pretty fast. I found that a persistent queue was enabled for the Linux input/source in "Always On" mode. The persistent queue was not turned on for the Windows input/source, and Windows logs were OK the whole time. After turning it off for Linux data, the problem disappeared.
I don't understand why the persistent queue behaves this way, but I don't have time to investigate further. Maybe it's a Cribl bug or a misunderstanding of the functionality. The input queue is not required in the project, so I can leave it off.
For me, it's currently resolved.
Thank you all for your help and your time.

Yeah, you're right. It was the other-way sawtooth. It looks strange. Are you sure you don't have any network-level issues? And don't you see any other interesting stuff in _internal (outside of the Metrics component) for this forwarder?
I have two weeks off, so I'll continue troubleshooting after that.
In my opinion there is nothing interesting in the _internal log. You can see it in the screenshot. I used the cluster command to reduce the number of log entries. The SPL includes component!=Metrics.

Right. That was !=, not =.
You're mostly interested in
index=_internal component=AutoLoadBalancedConnectionStrategy host=<your_forwarder>
I looked at the events for the component you mentioned and found that there is only one type of log entry.
I also tried it for the "last 7 days" time range.

Which kinds of logs are you collecting? Is it possible that some log or input stalls the UF after it has been read, so that the UF just waits for free resources before reading the next one?
Do you have only one pipeline in your UF, or several?
Do you have any performance data from the OS level, and which OS and version do you have?
I am collecting logs from some files from /var/log and sysmon from journald.
last 90 minutes
/opt/splunkforwarder/var/log/splunk/audit.log | 41 |
/opt/splunkforwarder/var/log/splunk/health.log | 39 |
/opt/splunkforwarder/var/log/splunk/metrics.log | 8911 |
/opt/splunkforwarder/var/log/splunk/splunkd.log | 598 |
/var/log/audit/audit.log | 7 |
/var/log/messages | 936 |
/var/log/secure | 10 |
journald://sysmon | 919 |
inputs.conf
[monitor:///var/log/syslog]
I will find out the OS version later. I do not have direct access to the OS. I think it is CentOS/Red Hat 8 or 9, but I may be wrong.

Based on the number of entries, audit.log is quite low. Can you check whether there really are so few entries at the source?
If those are the entries from one Linux node over a 90-minute period, it's really unused.
I got direct access to the server again and checked the OS version. It is Red Hat Enterprise Linux release 9.4 (Plow).
I will try adding a pipeline and check whether it helps. I am also going to check whether this is connected with sysmon.
That was correct. There were only a few log entries in audit.log during that period; I checked on the filesystem. Since my SSH connection there are more log entries.
Last 90 minutes
/opt/splunkforwarder/var/log/splunk/audit.log | 2 |
/opt/splunkforwarder/var/log/splunk/conf.log | 1 |
/opt/splunkforwarder/var/log/splunk/configuration_change.log | 3 |
/opt/splunkforwarder/var/log/splunk/health.log | 26 |
/opt/splunkforwarder/var/log/splunk/metrics.log | 8975 |
/opt/splunkforwarder/var/log/splunk/splunkd-utility.log | 10 |
/opt/splunkforwarder/var/log/splunk/splunkd.log | 1055 |
/opt/splunkforwarder/var/log/watchdog/watchdog.log | 3 |
/var/log/audit/audit.log | 1337 |
/var/log/messages | 9418 |
/var/log/secure | 543 |
journald://sysmon | 6482 |
I found an interesting correlation. You can see a "gap" or change in behavior in the graph. It starts after the UF is restarted. Before the UF restart there are messages "Found currently active indexer. Connected to idx=X.X.X.X:9992:0, reuse=1." About 20 minutes after the restart they are back.
I tried setting parallelIngestionPipelines = 2 in server.conf and the behavior did not change.
I also tried stopping the sysmon daemon and disabling the sysmon journald input. It had no effect on the above behavior.

Based on the number of your log events, it would have been a surprise if that had helped.
Have you looked at the network interface stats to see if there is anything weird?
Is it the case that the same issue occurs on all your Linux UF nodes? If yes, then it points heavily to a configuration issue!
Can you show your outputs.conf settings exported by btool with the --debug option?
I did not find anything weird in the interface stats.
A similar problem occurs on all Linux nodes, but the period/delay differs.
Here is the btool output for the configuration:

Do you have UFs on other OSes, like Windows or some Unix, and if so, do they have the same issue?
Can you post your indexer's relevant inputs.conf output from btool too?

Any errors on either side of the connection?
UF host, last 60 minutes, errors and warnings
IDX side
The problem is still here. This morning we had to reboot the Splunk servers due to an operating system security patch. You can see it at the beginning of the graph. This meant that the connection between UF and IDX had to be re-established. When IDX or UF restarts, there is no delay or batch processing for a while: about 20 minutes yesterday and about 10 minutes today.

These errors are completely unrelated. You'd need to dig deeper to find something relevant regarding inputs on the receiving side or outputs on the sending side.
And the shape of your graph does look awfully close to a situation with a periodic batch input which then unloads over a limited-thruput connection.
I know that these errors are unrelated. I tried to show that the internal logs are not full of "error" messages.
The situation is:
- Thruput is not limited (thruput is set to 10240)
- The number of logs is low
- Logs in the files are generated continuously; I checked with "tail -f"
- For approximately 20 minutes after a UF restart there is no problem
- After this time the problem appears
- The problem is:
- Data are buffered somewhere in front of the indexer server, for approximately 9 minutes. After I restarted the UF or dropped the TCP session, the data were suddenly sent to the indexer.
- I believe it must be buffered on the UF side. I saw a period with no data and then a data burst on the indexer side.
- The shape of the graph says the same thing. Data sit somewhere for some period of time and are then flushed to the indexer. Older data have a bigger diff and newer data have a smaller diff.
Screenshots (not reproduced here): Index time, SendQ, TCPout, Queues, internal messages (clustered)
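To make the sawtooth reasoning above concrete, here is a minimal Python sketch of what the index-time lag would look like if events were held in some buffer and released in periodic batches. This is only a toy model under stated assumptions: the function name `simulate_lag`, the fixed flush interval, and the evenly spaced events are all hypothetical, and it does not describe how the UF, Cribl, or the indexer actually behave internally. The 540-second interval is taken from the roughly 9-minute delay reported above.

```python
def simulate_lag(duration_s, flush_interval_s, event_interval_s=60):
    """Toy model: return (event_time, index_time, lag) tuples.

    Assumption (hypothetical): every event is held in a buffer and
    released at the next flush boundary at or after its event time.
    """
    rows = []
    for event_time in range(0, duration_s, event_interval_s):
        # Round event_time up to the next multiple of flush_interval_s.
        index_time = -(-event_time // flush_interval_s) * flush_interval_s
        rows.append((event_time, index_time, index_time - event_time))
    return rows


if __name__ == "__main__":
    # One event per minute, buffer flushed every 9 minutes (540 s).
    for event_time, index_time, lag in simulate_lag(1080, 540):
        print(f"event={event_time:5d}s  indexed={index_time:5d}s  lag={lag:4d}s")
```

Within each batch the oldest event has the largest lag and the newest has the smallest, which is the falling sawtooth described in the list above. A steady, unbuffered stream would instead show a roughly flat lag.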
