Universal forwarder with 2 hours delay on new log files

mihelic · ‎04-12-2013

Log messages received by our central loghost take up to 2 hours before being visible on the indexer.

Our network hardware and servers send their messages via syslog to a central loghost running syslog-ng, which filters the messages into their respective files. That adds up to about 1600 log files with current data. The log files rotate daily at midnight (their names take the form service-20130410.log). A universal forwarder monitors the folders and files for changes and forwards the data to a Splunk indexer. Altogether we index about 12GB of data per day.

A typical monitor stanza looks like this:

[monitor:///logs/splunk/servicetype]
host_segment = 4
index = main
ignoreOlderThan = 3d
sourcetype = servicetype

The indexer has a scheduled search that runs every hour and requires data from the past hour. The results of the search at 1AM and 2AM come up empty every day. There were also instances of the 3AM search coming up empty. After that point all subsequent searches return data as they should, and there is no more delay. I have checked the indexer and it does not appear to be the bottleneck.

Disk I/O, CPU (24 cores), and RAM (32GB) should not be a problem on the loghost server, although the UF is constantly using one core at 100%.

There is a delay between the time files are created and when the universal forwarder notices and forwards them.
How can I tune this to speed this up?

Are 1600 monitored files considered a high or a low number for the universal forwarder?

Kind regards, Mitja

esix_splunk · ‎03-31-2015

How about the timezone settings on the forwarder vs. the indexer vs. the search head (SH)?

rharrisssi · ‎03-31-2015

Did you ever figure this out?

kristian_kolb · ‎04-12-2013

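Perhaps you are pushing the envelope a little bit during peak hours. By default, the UF is limited to 256 KBps (configurable via maxKBps in the [thruput] stanza of limits.conf). 12GB/day averages to about 138 KBps, so any burst well above the average can run into the limit.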

http://splunk-base.splunk.com/answers/53138/maximum-traffic-of-a-universal-forwarder
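The arithmetic behind that average, and what it implies for catching up after a burst, can be sketched in a few lines of Python. This is only a rough model: it assumes decimal units (1 GB = 10^9 bytes, 1 KB = 1000 bytes), a perfectly even incoming rate, and a purely hypothetical 500 MB backlog that is not a figure from this thread.

```python
# Rough throughput model for a universal forwarder.
# Assumptions (not from the thread): decimal units, steady incoming rate.

DAILY_VOLUME_BYTES = 12 * 10**9   # "about 12GB of data per day" from the question
SECONDS_PER_DAY = 24 * 60 * 60
DEFAULT_LIMIT_KBPS = 256          # default UF throughput cap, in KB per second


def average_rate_kbps(daily_bytes: int) -> float:
    """Average forwarding rate in KB/s needed to move daily_bytes in one day."""
    return daily_bytes / SECONDS_PER_DAY / 1000


def catch_up_seconds(backlog_bytes: int, incoming_kbps: float, limit_kbps: float) -> float:
    """Time to drain a backlog while new data keeps arriving at incoming_kbps.

    Only the spare capacity (limit minus incoming rate) works on the backlog.
    Returns infinity if there is no spare capacity at all.
    """
    spare_kbps = limit_kbps - incoming_kbps
    if spare_kbps <= 0:
        return float("inf")
    return backlog_bytes / 1000 / spare_kbps


avg = average_rate_kbps(DAILY_VOLUME_BYTES)
print(f"average rate: {avg:.1f} KB/s")
print(f"share of default limit used: {avg / DEFAULT_LIMIT_KBPS:.0%}")

# Hypothetical backlog, chosen only to illustrate the calculation.
hypothetical_backlog = 500 * 10**6
minutes = catch_up_seconds(hypothetical_backlog, avg, DEFAULT_LIMIT_KBPS) / 60
print(f"time to drain a 500 MB backlog: {minutes:.0f} minutes")
```

The point of the model is that the average already uses more than half of the default cap, so even a moderate backlog takes a long time to clear with the spare capacity that is left.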

Also, you could have a case of blocked queues on the indexer side:

http://splunk-base.splunk.com/answers/31151/index-performance-issue-high-latency
http://wiki.splunk.com/Community:TroubleshootingBlockedQueues

To find out how the UF is performing when reading the files, you could also check the REST API on the UF itself:

https://your-splunk-forwarder:8089/services/admin/inputstatus/TailingProcessor:FileStatus
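If you would rather pull that status from a script than a browser, something along these lines might work. It is a minimal sketch using only the Python standard library and HTTP basic authentication against the management port; the host name, user, and password are placeholders, and the helper names (build_url, basic_auth_header, fetch_file_status) are made up for this example.

```python
import base64
import ssl
import urllib.request

# Endpoint path taken from the URL above.
ENDPOINT = "/services/admin/inputstatus/TailingProcessor:FileStatus"


def build_url(host: str, port: int = 8089) -> str:
    """Build the management-port URL for the file status endpoint."""
    return f"https://{host}:{port}{ENDPOINT}"


def basic_auth_header(user: str, password: str) -> dict:
    """Return an HTTP basic authentication header for the given credentials."""
    token = base64.b64encode(f"{user}:{password}".encode("utf-8")).decode("ascii")
    return {"Authorization": f"Basic {token}"}


def fetch_file_status(host: str, user: str, password: str,
                      verify_tls: bool = True, timeout: float = 30.0) -> str:
    """Fetch the raw response body describing each monitored file.

    Set verify_tls=False only if the forwarder uses a self-signed
    certificate and you accept the risk of skipping verification.
    """
    context = ssl.create_default_context()
    if not verify_tls:
        context.check_hostname = False
        context.verify_mode = ssl.CERT_NONE
    request = urllib.request.Request(build_url(host),
                                     headers=basic_auth_header(user, password))
    with urllib.request.urlopen(request, context=context, timeout=timeout) as response:
        return response.read().decode("utf-8", errors="replace")


# Example call (placeholder host and credentials, not run here):
# print(fetch_file_status("your-splunk-forwarder", "admin", "your-password", verify_tls=False))
```

The response body is returned as plain text without parsing, so you can inspect it or search it for the files you expect to be read.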

Also, you should install the S.O.S app, which is great for diagnosing problems...

/k

Runals · ‎04-01-2015

To expand on what Kristian posted, try running this search:

index=_internal sourcetype=splunkd "current data throughput" | rex "Current data throughput \((?<kb>\S+)" | eval rate=case(kb < 500, "256", kb < 520, "512", kb < 770, "768", kb < 1210, "1024", 1=1, ">1024") | stats count as Count sparkline as Trend by host, rate | where Count > 4 | rename rate as "Throughput rate(kb)" | sort -"Throughput rate(kb)",-Count

It is one I baked into the app Forwarder Health.
