Getting Data In

Host with Splunk Universal Forwarder not forwarding to Splunk Cloud

jalbarracinklar
Observer

Hi!

 

We have been installing the Splunk Universal Forwarder on different servers in the on-prem environment of the company where I work, to bring their logs into an index in our Splunk Cloud.
We managed to do it on almost all servers running Ubuntu, CentOS and Windows.
However, we are having problems with one Ubuntu server.
For the installation, we did the following, the same as on every other Ubuntu server:

  1. dpkg -i splunkforwarder-9.1.2-b6b9c8185839-linux-2.6-amd64.deb
  2. cd /opt/splunkforwarder/bin
  3. ./splunk start
  4. Insert user and password
  5. Download splunkclouduf.spl
  6. /opt/splunkforwarder/bin/splunk install app splunkclouduf.spl
  7. ./splunk add forward-server http-inputs-klar.splunkcloud.com:443
  8. cd /opt/splunkforwarder/etc/system/local
  9. define inputs.conf as:
      # Monitor system logs for authentication and authorization events

      [monitor:///var/log/auth.log]
      disabled = false
      index = spei_servers
      sourcetype = linux_secure

      #fix bug in ubuntu related to: "Events from tracker.log have not been seen for the last 90 seconds, which is more than the yellow threshold (45 seconds). This typically occurs when indexing or forwarding are falling behind or are blocked."
      [health_reporter]
      aggregate_ingestion_latency_health = 0

      [feature:ingestion_latency]
      alert.disabled = 1
      disabled = 1

      # Monitor system logs for general security events
      [monitor:///var/log/syslog]
      disabled = false
      index = spei_servers
      sourcetype = linux_syslog

      # Monitor Apache access and error logs
      [monitor:///var/log/apache2/access.log]
      disabled = false
      index = spei_servers
      sourcetype = apache_access

      [monitor:///var/log/apache2/error.log]
      disabled = false
      index = spei_servers
      sourcetype = apache_error

      # Monitor SSH logs for login attempts
      [monitor:///var/log/auth.log]
      disabled = false
      index = spei_servers
      sourcetype = sshd

      # Monitor sudo commands executed by users
      [monitor:///var/log/auth.log]
      disabled = false
      index = spei_servers
      sourcetype = sudo

      # Monitor UFW firewall logs (assuming UFW is used)
      [monitor:///var/log/ufw.log]
      disabled = false
      index = spei_servers
      sourcetype = ufw

      # Monitor audit logs (if available)
      [monitor:///var/log/audit/audit.log]
      disabled = false
      index = spei_servers
      sourcetype = linux_audit

      # Monitor file integrity using auditd (if available)
      [monitor:///var/log/audit/auditd.log]
      disabled = false
      index = spei_servers
      sourcetype = auditd

      # Monitor for changes to critical system files
      [monitor:///etc/passwd]
      disabled = false
      index = spei_servers
      sourcetype = linux_config

      # Monitor for changes to critical system binaries
      [monitor:///bin]
      disabled = false
      index = spei_servers
      sourcetype = linux_config

      # Monitor for changes to critical system configuration files
      [monitor:///etc]
      disabled = false
      index = spei_servers
      sourcetype = linux_config

  10. echo "[httpout]
      httpEventCollectorToken = <our index token>
      uri = https://<our subdomain>.splunkcloud.com:443" > outputs.conf
  11. cd /opt/splunkforwarder/bin
  12. export SPLUNK_HOME=/opt/splunkforwarder
  13. ./splunk restart
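
After the restart, a quick sanity check (a minimal sketch; btool and the CLI commands below are standard on a Universal Forwarder) is to confirm which outputs configuration actually won and which forward-servers are active:

    cd /opt/splunkforwarder/bin
    # show every outputs.conf stanza and the file it came from
    ./splunk btool outputs list --debug
    # show active and configured-but-inactive forward-servers
    ./splunk list forward-server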

When we go to Splunk Cloud, we don't see any logs coming from this specific server.

So we checked the forwarder's logs and saw this in health.log:

root@coas:/opt/splunkforwarder/var/log/splunk# tail health.log
01-09-2024 08:21:30.197 -0600 INFO PeriodicHealthReporter - feature="Forwarder Ingestion Latency" color=green due_to_stanza="feature:ingestion_latency_reported" node_type=feature node_path=splunkd.file_monitor_input.forwarder_ingestion_latency
01-09-2024 08:21:30.197 -0600 INFO PeriodicHealthReporter - feature="Ingestion Latency" color=red due_to_stanza="feature:ingestion_latency" due_to_indicator="ingestion_latency_gap_multiplier" node_type=feature node_path=splunkd.file_monitor_input.ingestion_latency
01-09-2024 08:21:30.197 -0600 INFO PeriodicHealthReporter - feature="Ingestion Latency" color=red indicator="ingestion_latency_gap_multiplier" due_to_threshold_value=1 measured_value=1755 reason="Events from tracker.log have not been seen for the last 1755 seconds, which is more than the red threshold (210 seconds). This typically occurs when indexing or forwarding are falling behind or are blocked." node_type=indicator node_path=splunkd.file_monitor_input.ingestion_latency.ingestion_latency_gap_multiplier
01-09-2024 08:21:30.197 -0600 INFO PeriodicHealthReporter - feature="Large and Archive File Reader-0" color=green due_to_stanza="feature:batchreader" node_type=feature node_path=splunkd.file_monitor_input.large_and_archive_file_reader-0
01-09-2024 08:21:30.197 -0600 INFO PeriodicHealthReporter - feature="Real-time Reader-0" color=red due_to_stanza="feature:tailreader" due_to_indicator="data_out_rate" node_type=feature node_path=splunkd.file_monitor_input.real-time_reader-0
01-09-2024 08:21:30.197 -0600 INFO PeriodicHealthReporter - feature="Real-time Reader-0" color=red indicator="data_out_rate" due_to_threshold_value=2 measured_value=352 reason="The monitor input cannot produce data because splunkd's processing queues are full. This will be caused by inadequate indexing or forwarding rate, or a sudden burst of incoming data." node_type=indicator node_path=splunkd.file_monitor_input.real-time_reader-0.data_out_rate
01-09-2024 08:21:30.197 -0600 INFO PeriodicHealthReporter - feature="Workload Management" color=green node_type=category node_path=splunkd.workload_management
01-09-2024 08:21:30.197 -0600 INFO PeriodicHealthReporter - feature="Admission Rules Check" color=green due_to_stanza="feature:admission_rules_check" node_type=feature node_path=splunkd.workload_management.admission_rules_check
01-09-2024 08:21:30.198 -0600 INFO PeriodicHealthReporter - feature="Configuration Check" color=green due_to_stanza="feature:wlm_configuration_check" node_type=feature node_path=splunkd.workload_management.configuration_check
01-09-2024 08:21:30.198 -0600 INFO PeriodicHealthReporter - feature="System Check" color=green due_to_stanza="feature:wlm_system_check" node_type=feature node_path=splunkd.workload_management.system_check

 

and this in splunkd.log:

root@coas:/opt/splunkforwarder/var/log/splunk# tail splunkd.log
01-09-2024 08:33:01.227 -0600 WARN AutoLoadBalancedConnectionStrategy [3273664 TcpOutEloop] - Cooked connection to ip=54.87.146.250:9997 timed out
01-09-2024 08:33:21.135 -0600 WARN AutoLoadBalancedConnectionStrategy [3273664 TcpOutEloop] - Cooked connection to ip=54.160.213.9:9997 timed out
01-09-2024 08:33:41.034 -0600 WARN AutoLoadBalancedConnectionStrategy [3273664 TcpOutEloop] - Cooked connection to ip=54.160.213.9:9997 timed out
01-09-2024 08:34:00.942 -0600 WARN AutoLoadBalancedConnectionStrategy [3273664 TcpOutEloop] - Cooked connection to ip=54.87.146.250:9997 timed out
01-09-2024 08:34:20.841 -0600 WARN AutoLoadBalancedConnectionStrategy [3273664 TcpOutEloop] - Cooked connection to ip=18.214.192.43:9997 timed out
01-09-2024 08:34:40.750 -0600 WARN AutoLoadBalancedConnectionStrategy [3273664 TcpOutEloop] - Cooked connection to ip=18.214.192.43:9997 timed out
01-09-2024 08:35:00.637 -0600 WARN AutoLoadBalancedConnectionStrategy [3273664 TcpOutEloop] - Cooked connection to ip=54.87.146.250:9997 timed out
01-09-2024 08:35:20.544 -0600 WARN AutoLoadBalancedConnectionStrategy [3273664 TcpOutEloop] - Cooked connection to ip=54.160.213.9:9997 timed out
01-09-2024 08:35:40.443 -0600 WARN AutoLoadBalancedConnectionStrategy [3273664 TcpOutEloop] - Cooked connection to ip=18.214.192.43:9997 timed out
01-09-2024 08:36:00.352 -0600 WARN AutoLoadBalancedConnectionStrategy [3273664 TcpOutEloop] - Cooked connection to ip=54.87.146.250:9997 timed out
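
For what it's worth, timeouts like these can be reproduced from the host with a plain TCP test against the indexer addresses from the log (nc is just one option; any port tester works):

    # raw TCP reachability to the Splunk Cloud indexers on the S2S port
    nc -zv -w 5 54.87.146.250 9997
    nc -zv -w 5 54.160.213.9 9997
    nc -zv -w 5 18.214.192.43 9997

If these time out too, the problem is on the network path (firewall/proxy), not in the Splunk configuration.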

 

Do you have any thoughts, or have you faced this issue in the past?


gcusello
SplunkTrust

Hi @jalbarracinklar ,

If you install the app downloaded from Splunk Cloud, you don't need the add forward-server command, because the app already has all the information it needs to connect to Splunk Cloud.
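
For example, you can drop the manually added entry and keep only what the app configured (a sketch; the host and port are the ones from your step 7):

    cd /opt/splunkforwarder/bin
    ./splunk remove forward-server http-inputs-klar.splunkcloud.com:443
    ./splunk restart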

One hint: with this kind of requirement, I prefer to have two Heavy Forwarders in my on-premise infrastructure acting as concentrators for the logs from all the on-premise systems; that way I only have to open the connection to Splunk Cloud for those two systems.

Then, if you have many Universal Forwarders, use a Deployment Server to deploy apps to them; don't manage them manually.

About Ubuntu, I have read reports of many issues in the Community; make sure the Splunk user has permission to run Splunk and to read the monitored files.

In addition, check whether you are receiving any logs from the Ubuntu server: if yes, the issue is in the monitor stanzas; if not, it is in the connection.
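
A quick way to check: a forwarder always sends its own internal logs, so if the connection works you will see them in Splunk Cloud even when no monitor stanza matches anything. For example (the host name coas is taken from your log excerpts):

    index=_internal host=coas | stats count by source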

Ciao.

Giuseppe


jalbarracinklar
Observer

Ciao Giuseppe!

Thanks a lot for your answer! 🙂

We finally found it was a configuration issue on our firewall: we couldn't even see our IP reaching Splunk through the firewall, even though the services were up and running on the server with the Splunk Universal Forwarder installed.

Regarding the Deployment Server: we have ~20 servers with the Splunk Universal Forwarder installed on them. Should we have a deployment server in the same environment to manage all of those Splunk UFs? Do you have any recommendations on this?

 

Thanks again!

Juanma

 


isoutamo
SplunkTrust

Hi

Based on your error messages, this is related to the network connection. Check both host-based and network-based firewalls to see that everything is OK. If I understand correctly, you already fixed this on your firewall side?

Whether you should use an HF as a hub/concentrator depends entirely on your security policy. If you have a strict security-zone-based architecture (no direct connections to the outside allowed), then you definitely need intermediate forwarders. If not, they just add complexity to your environment and won't give you the best performance.

If you have a lot of UFs and no other configuration management software/service/system, then you should use a DS; if you already have something in place, use that instead of bringing in a totally new way of doing it.
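
For reference, pointing a UF at a DS only takes one small file (a sketch; the DS host name is a placeholder):

    # /opt/splunkforwarder/etc/system/local/deploymentclient.conf
    [deployment-client]

    [target-broker:deploymentServer]
    targetUri = ds.example.local:8089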

r. Ismo

gcusello
SplunkTrust

Hi @jalbarracinklar ,

About the use of two HFs as concentrators: I always use them in architectures like yours.

Remember to use two HFs if you need HA, otherwise one is sufficient.

I always prefer to use a Deployment Server to manage forwarder configurations.

For 20 clients you don't need a dedicated server; you could use one of the two Heavy Forwarders acting as concentrators.

That said, a dedicated server is always better if server availability isn't a problem for you.
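
On the DS side, a minimal serverclass.conf for those ~20 forwarders could look like this (class and app names are placeholders, not from this thread):

    # serverclass.conf on the deployment server
    [serverClass:linux_uf]
    whitelist.0 = *

    [serverClass:linux_uf:app:linux_inputs]
    restartSplunkd = true
    stateOnClient = enabled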

Ciao.

Giuseppe
