Docker - Universal Forwarder service stops - Test basic https endpoint failed

pmcl77
Loves-to-Learn Lots

Hi,

I am new to Splunk and running both Splunk Enterprise and Universal Forwarder in a Docker container (on the same host for now).

My forwarder keeps shutting down, and I am not quite sure why. Probably a configuration issue.

This is the last error I find in the logs before the shutdown:

TASK [splunk_common : Test basic https endpoint] *******************************
fatal: [localhost]: FAILED! => {
    "attempts": 60,
    "changed": false,
    "elapsed": 10,
    "failed_when_result": true,
    "redirected": false,
    "status": -1,
    "url": "https://127.0.0.1:8089"
}

MSG:

Status code was -1 and not [200, 404]: Request failed: <urlopen error _ssl.c:1074: The handshake operation timed out>
...ignoring

After that there is a bit more in the logs before it finally stops working:

Monday 20 December 2021  09:57:02 +0000 (0:17:30.719)       0:18:04.687 *******

TASK [splunk_common : Set url prefix for future REST calls] ********************
ok: [localhost]
Monday 20 December 2021  09:57:02 +0000 (0:00:00.088)       0:18:04.775 *******
included: /opt/ansible/roles/splunk_common/tasks/clean_user_seed.yml for localhost
Monday 20 December 2021  09:57:02 +0000 (0:00:00.214)       0:18:04.989 *******
[WARNING]: Using world-readable permissions for temporary files Ansible needs
to create when becoming an unprivileged user. This may be insecure. For
information on securing this, see
https://docs.ansible.com/ansible/user_guide/become.html#risks-of-becoming-an-
unprivileged-user

TASK [splunk_common : Remove user-seed.conf] ***********************************
ok: [localhost]
Monday 20 December 2021  09:57:03 +0000 (0:00:00.653)       0:18:05.643 *******
included: /opt/ansible/roles/splunk_common/tasks/add_splunk_license.yml for localhost
Monday 20 December 2021  09:57:03 +0000 (0:00:00.228)       0:18:05.871 *******

TASK [splunk_common : Initialize licenses array] *******************************
ok: [localhost]
Monday 20 December 2021  09:57:03 +0000 (0:00:00.082)       0:18:05.954 *******

TASK [splunk_common : Determine available licenses] ****************************
ok: [localhost] => (item=splunk.lic)
Monday 20 December 2021  09:57:03 +0000 (0:00:00.117)       0:18:06.072 *******
included: /opt/ansible/roles/splunk_common/tasks/apply_licenses.yml for localhost => (item=splunk.lic)
Monday 20 December 2021  09:57:03 +0000 (0:00:00.162)       0:18:06.235 *******
Monday 20 December 2021  09:57:03 +0000 (0:00:00.074)       0:18:06.309 *******
Monday 20 December 2021  09:57:03 +0000 (0:00:00.078)       0:18:06.387 *******
Monday 20 December 2021  09:57:03 +0000 (0:00:00.077)       0:18:06.465 *******
included: /opt/ansible/roles/splunk_common/tasks/licenses/add_license.yml for localhost
Monday 20 December 2021  09:57:04 +0000 (0:00:00.141)       0:18:06.606 *******
Monday 20 December 2021  09:57:04 +0000 (0:00:00.079)       0:18:06.686 *******
[WARNING]: Using world-readable permissions for temporary files Ansible needs
to create when becoming an unprivileged user. This may be insecure. For
information on securing this, see
https://docs.ansible.com/ansible/user_guide/become.html#risks-of-becoming-an-
unprivileged-user

TASK [splunk_common : Ensure license path] *************************************
ok: [localhost]
Monday 20 December 2021  09:57:04 +0000 (0:00:00.622)       0:18:07.308 *******
Monday 20 December 2021  09:57:04 +0000 (0:00:00.074)       0:18:07.382 *******
Monday 20 December 2021  09:57:04 +0000 (0:00:00.108)       0:18:07.491 *******
Monday 20 December 2021  09:57:05 +0000 (0:00:00.076)       0:18:07.568 *******
included: /opt/ansible/roles/splunk_universal_forwarder/tasks/../../../roles/splunk_common/tasks/set_as_hec_receiver.yml for localhost
Monday 20 December 2021  09:57:05 +0000 (0:00:00.118)       0:18:07.686 *******
[WARNING]: Using world-readable permissions for temporary files Ansible needs
to create when becoming an unprivileged user. This may be insecure. For
information on securing this, see
https://docs.ansible.com/ansible/user_guide/become.html#risks-of-becoming-an-
unprivileged-user

outputs.conf:

[indexAndForward]
index = false

[tcpout]
defaultGroup = default-autolb-group

[tcpout:default-autolb-group]
server = splunk:9997
disabled = false

inputs.conf:

[monitor:///docker/py-sandbox/log/*.json]
disabled = 0

[monitor:///docker/fa/log/*.json]
disabled = 0

server.conf (I removed the keys and the SSL password; I never created those myself, though):

[general]
serverName = synoUniFW
pass4SymmKey = <somekey>

[sslConfig]
sslPassword = <some pw>

[lmpool:auto_generated_pool_forwarder]
description = auto_generated_pool_forwarder
quota = MAX
slaves = *
stack_id = forwarder

[lmpool:auto_generated_pool_free]
description = auto_generated_pool_free
quota = MAX
slaves = *
stack_id = free

On the Splunk Enterprise side I have opened port 9997, and it actually receives logs, until it doesn't...

What am I doing wrong?

Thanks!


PickleRick
SplunkTrust

Unfortunately, it's hard to tell because your logs are not Splunk logs, just logs from Ansible, which is used to deploy the Splunk software within the container (which in itself is a bit of a dumb idea if you ask me, but that's a story for another rant). You can check splunkd.log inside the container (by default Splunk keeps its logs in /opt/splunk/var/log/splunk/; I have no idea if dockerized Splunk uses the same layout).
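
A minimal sketch of pulling only the warnings and errors out of splunkd.log, in case it helps narrow things down. It assumes the usual splunkd.log line shape (timestamp, level, component, message). The helper name problem_lines, the sample lines, and the component name ExampleComponent are made up for illustration and are not taken from the poster's logs.

```python
import re

# Assumed line shape: "MM-DD-YYYY HH:MM:SS.mmm +ZZZZ LEVEL Component - message".
# An optional "[...]" group after the component is tolerated.
LINE_RE = re.compile(
    r"^(?P<ts>\d{2}-\d{2}-\d{4} \d{2}:\d{2}:\d{2}\.\d{3} [+-]\d{4}) "
    r"(?P<level>[A-Z]+)\s+(?P<component>\S+)(?: \[[^\]]*\])? - (?P<message>.*)$"
)


def problem_lines(lines, levels=("WARN", "ERROR", "FATAL")):
    """Return (timestamp, level, component, message) for lines at the given levels."""
    hits = []
    for line in lines:
        match = LINE_RE.match(line.rstrip("\n"))
        if match and match.group("level") in levels:
            hits.append(
                (
                    match.group("ts"),
                    match.group("level"),
                    match.group("component"),
                    match.group("message"),
                )
            )
    return hits


# Hypothetical sample lines, only to show the expected input shape.
sample = [
    "12-20-2021 09:57:02.123 +0000 INFO  ExampleComponent - hypothetical informational message",
    "12-20-2021 09:57:03.456 +0000 ERROR ExampleComponent - hypothetical error message",
]
```

To run it against a real file, open splunkd.log from whatever log directory the container uses and pass the file object to problem_lines; lines that do not match the assumed shape are skipped.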


pmcl77
Loves-to-Learn Lots

Thanks,

I found the logs in /opt/splunkforwarder/var/log/splunk.

After enabling more verbose logging, I got this from the Ansible/Docker logs:

TASK [splunk_universal_forwarder : Setup global HEC] ***************************
task path: /opt/ansible/roles/splunk_common/tasks/set_as_hec_receiver.yml:4
fatal: [localhost]: FAILED! => {
    "changed": false,
    "elapsed": 60,
    "redirected": false,
    "status": -1,
    "url": "http://127.0.0.1:8089/services/data/inputs/http/http"
}

MSG:

Status code was -1 and not [200]: Connection failure: timed out

So it's fatal; maybe this is the reason. In the end it might be a problem on the Splunk Enterprise container. It's strange, though, because I do receive logs, and why should it shut down after a connection timeout?

Will browse more logs...
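
One way to tell a closed port apart from a listener that accepts the TCP connection but never completes the TLS handshake (which is what the "handshake operation timed out" message suggests) is to probe the two stages separately. The sketch below is only an illustration; the function name probe and its return labels are hypothetical, and certificate verification is switched off purely so that a self-signed certificate does not mask the result.

```python
import socket
import ssl


def probe(host, port, timeout=3.0):
    """Return 'no_tcp', 'tcp_only', or 'tls_ok' for host:port."""
    # Stage 1: plain TCP connect. Refused or timed-out connects end here.
    try:
        sock = socket.create_connection((host, port), timeout=timeout)
    except OSError:
        return "no_tcp"

    # Stage 2: TLS handshake on the connected socket, without verification.
    ctx = ssl.create_default_context()
    ctx.check_hostname = False
    ctx.verify_mode = ssl.CERT_NONE
    try:
        with ctx.wrap_socket(sock, server_hostname=host):
            return "tls_ok"
    except OSError:
        # Covers handshake timeouts and SSL errors alike.
        sock.close()
        return "tcp_only"
```

Called from inside the forwarder container as probe("127.0.0.1", 8089), a result of "tcp_only" would mean something is listening on the port but did not finish a TLS handshake within the timeout, while "no_tcp" would mean nothing accepted the connection at all.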


aasabatini
Motivator

Hi @pmcl77,

Check the management port, 8089.

I suggest using 8089 for Splunk Enterprise and changing the UF to 8090.

Regards

“The answer is out there, Neo, and it’s looking for you, and it will find you if you want it to.”

pmcl77
Loves-to-Learn Lots

Hi @aasabatini,

Thanks. It seems port 8089 is already open by default on the Splunk Enterprise Docker image. I also tried to set up a HEC receiver on that port, but it would not let me because the port was already in use. I suppose those images are set up like this by default.

I am enabling more verbose debugging to see if I can find something, or see if it can find me ;)

Best

aasabatini
Motivator

Hi @pmcl77,

Port 8089 is the management port; it is not used for HEC.

The HEC port is 8088.

Please check the documentation:

docs.splunk.com/Documentation/Splunk/8.2.3/Data/UsetheHTTPEventCollector
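
To make the 8088 versus 8089 distinction concrete, here is a rough sketch of what a HEC request looks like. It only builds the request object and does not send anything. The helper name build_hec_request, the all-zero token, and the sample event are placeholders; the host name splunk is taken from the outputs.conf earlier in the thread, and HTTPS is assumed because HEC normally has SSL enabled unless it was turned off.

```python
import json
import urllib.request


def build_hec_request(host, token, event, port=8088, scheme="https"):
    """Build (but do not send) a POST request for the HEC event endpoint."""
    url = f"{scheme}://{host}:{port}/services/collector/event"
    body = json.dumps({"event": event}).encode("utf-8")
    return urllib.request.Request(
        url,
        data=body,
        headers={
            # HEC expects the token in an "Authorization: Splunk <token>" header.
            "Authorization": f"Splunk {token}",
            "Content-Type": "application/json",
        },
        method="POST",
    )


# Placeholder token and event, for illustration only.
req = build_hec_request(
    "splunk", "00000000-0000-0000-0000-000000000000", {"message": "hello"}
)
```

Sending it would be a matter of passing req to urllib.request.urlopen with a real token, and that only works from somewhere that can actually reach port 8088 on the Enterprise container.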

 


pmcl77
Loves-to-Learn Lots

Hey @aasabatini 

Yes, I read that in the docs already... I don't think it's a port issue; all the necessary ports seem to be open.

>docker ps

99c07b0261e4 splunk/universalforwarder:8.2 "/sbin/entrypoint.sh…" 10 hours ago Up About a minute (healthy) 8088-8089/tcp, 9997/tcp unifw


94a7d4bb28b7 splunk/splunk:8.2 "/sbin/entrypoint.sh…" 13 hours ago Up About a minute (healthy) 8065/tcp, 0.0.0.0:8000->8000/tcp, 8088/tcp, 8191/tcp, 0.0.0.0:8089->8089/tcp, 0.0.0.0:9997->9997/tcp, 9887/tcp splunk
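
The PORTS column in that output mixes two kinds of entries: ports published to the host (the ones with "->") and ports that are only exposed inside the container. A small sketch that separates them, using the two PORTS strings copied from the output above; the function name classify_ports is made up for this example.

```python
def classify_ports(ports_column):
    """Split a `docker ps` PORTS string into (published, exposed) lists."""
    published, exposed = [], []
    for entry in ports_column.split(","):
        entry = entry.strip()
        if not entry:
            continue
        # "0.0.0.0:8000->8000/tcp" is published to the host; "8088/tcp" is not.
        if "->" in entry:
            published.append(entry)
        else:
            exposed.append(entry)
    return published, exposed


# PORTS columns copied from the `docker ps` output in this thread.
uf_ports = "8088-8089/tcp, 9997/tcp"
splunk_ports = (
    "8065/tcp, 0.0.0.0:8000->8000/tcp, 8088/tcp, 8191/tcp, "
    "0.0.0.0:8089->8089/tcp, 0.0.0.0:9997->9997/tcp, 9887/tcp"
)
```

Run on these strings, the forwarder shows no published ports at all, and the Enterprise container publishes 8000, 8089 and 9997 but not 8088. Published ports only matter for access from the host; traffic between containers on the same Docker network does not depend on them.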

 


aasabatini
Motivator

Hi @pmcl77,

Can you show the logs in /opt/splunk/var/log/splunk/splunkd.log?

Also, can you show me the output of the command "ls -lrt /opt/"

and the output of /opt/splunk/bin/splunk status?

pmcl77
Loves-to-Learn Lots

Hi @aasabatini,

I suppose you mean from the Universal Forwarder, not Splunk Enterprise?

Since it doesn't seem possible to upload files here and the logs are pretty long, I have made them available here.

Also, here is the output:

[ansible@synoUniFw splunkforwarder]$ ls -lrt /opt/
total 16
drwxr-xr-x 13 splunk  splunk  4096 Dec 17 22:42 splunkforwarder-etc
drwxrwxr-x 16 root    ansible 4096 Dec 18 00:58 ansible
drwxrwxr-x  2 ansible ansible 4096 Dec 21 22:03 container_artifact
drwxr-xr-x 17 splunk  splunk  4096 Dec 21 22:26 splunkforwarder

[ansible@synoUniFw splunkforwarder]$ sudo /opt/splunkforwarder/bin/splunk status
splunkd is running (PID: 1156).
splunk helpers are running (PIDs: 1159).

From the verbose Ansible output I can find the following fatal errors. They might both have to do with SSL? However, I have not enabled SSL, so I would assume it should work anyway.

unifw     | TASK [splunk_common : Test basic https endpoint] *******************************
unifw     | task path: /opt/ansible/roles/splunk_common/tasks/set_certificate_prefix.yml:2
unifw     | fatal: [localhost]: FAILED! => {
unifw     |     "attempts": 60,
unifw     |     "changed": false,
unifw     |     "elapsed": 10,
unifw     |     "failed_when_result": true,
unifw     |     "redirected": false,
unifw     |     "status": -1,
unifw     |     "url": "https://127.0.0.1:8089"
unifw     | }
unifw     |
unifw     | MSG:
unifw     |
unifw     | Status code was -1 and not [200, 404]: Request failed: <urlopen error _ssl.c:1074: The handshake operation timed out>
unifw     | ...ignoring

...


unifw     | TASK [splunk_universal_forwarder : Setup global HEC] ***************************
unifw     | task path: /opt/ansible/roles/splunk_common/tasks/set_as_hec_receiver.yml:4
unifw     | fatal: [localhost]: FAILED! => {
unifw     |     "changed": false,
unifw     |     "elapsed": 60,
unifw     |     "redirected": false,
unifw     |     "status": -1,
unifw     |     "url": "http://127.0.0.1:8089/services/data/inputs/http/http"
unifw     | }
unifw     |
unifw     | MSG:
unifw     |
unifw     | Status code was -1 and not [200]: Connection failure: timed out

Once it's done with the "playbook" the service/container shuts down (exited with code 2).

 
