
Hello guys,
We have a dedicated heavy forwarder instance that is used as a database connector with the app splunk_app_db_connect. We have about 100 enabled inputs, and most of them run every 60 seconds on a follow-tail (rising column) basis. On most inputs max_rows is configured to 10000000, but usually the returned number of events is nowhere near that limit.
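For illustration, one of our inputs looks roughly like this in splunk_app_db_connect/local/db_inputs.conf (the stanza name, connection and query are placeholders, and the parameter names are written down from memory, so take them as approximate rather than exact):
[example_rising_input]
connection = example_connection
mode = rising
query = SELECT * FROM example_table WHERE id > ? ORDER BY id ASC
tail_rising_column_name = id
interval = 60
max_rows = 10000000
index = main
sourcetype = example:db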
We are getting these irregularly recurring error messages in /opt/splunk/var/log/splunk/splunk_app_db_connect_server.log
from multiple (3-10) inputs at the same time:
2018-01-05 10:15:17.509 +0100 [QuartzScheduler_Worker-30] ERROR org.easybatch.core.job.BatchJob - Unable to write records
java.io.IOException: HTTP Error 503: Service Unavailable
at com.splunk.dbx.server.dbinput.recordwriter.HttpEventCollector.uploadEventBatch(HttpEventCollector.java:112)
at com.splunk.dbx.server.dbinput.recordwriter.HttpEventCollector.uploadEvents(HttpEventCollector.java:89)
at com.splunk.dbx.server.dbinput.task.processors.HecEventWriter.writeRecords(HecEventWriter.java:48)
at org.easybatch.core.job.BatchJob.writeBatch(BatchJob.java:203)
at org.easybatch.core.job.BatchJob.call(BatchJob.java:79)
at org.easybatch.extensions.quartz.Job.execute(Job.java:59)
at org.quartz.core.JobRunShell.run(JobRunShell.java:202)
at org.quartz.simpl.SimpleThreadPool$WorkerThread.run(SimpleThreadPool.java:573)
The stack trace indicates that the issue may lie within HEC (HttpEventCollector). Is it possible that HEC is overloaded? HEC is currently configured with the default settings.
In splunk_httpinput/default/inputs.conf we saw the following options:
[http]
dedicatedIoThreads=2
maxThreads = 0
maxSockets = 0
But in the documentation at http://dev.splunk.com/view/event-collector/SP-CAAAE6Q#httpstanza I found the following for dedicatedIoThreads:
The number of dispatcher threads on the HTTP Event Collector server. The default value is 2. This setting should not be altered unless you have been requested to do so by Splunk Support. The value of this parameter should never be more than the number of physical CPU cores on your Splunk Enterprise server.
Can somebody tell me whether setting the option dedicatedIoThreads higher might indeed resolve my problem? Sadly, we only get these errors on our production platform, and we'd rather not tinker with options that shouldn't be changed without instructions.
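For reference, if we were to try it, my understanding is that the override would belong in a local file rather than in the app's shipped defaults, roughly like this (the value 4 is only a guess and, per the quoted documentation, should never exceed the number of physical CPU cores):
In /opt/splunk/etc/apps/splunk_httpinput/local/inputs.conf:
[http]
dedicatedIoThreads = 4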
Version notes:
Splunk version: 6.6.3 (build e21ee54bc796)
splunk_app_db_connect version: 3.1.0

Upgrading to version 3.1.3 seems to have resolved this issue in our case.

Not sure if your issue has been resolved, but this is what worked for me. I was having this exact issue with my implementation, and the problem turned out to be file ownership. I noticed all my applications were running as the "splunk" user, while ./splunk_app_db_connect had root as its owner. Change the ownership of the entire app directory to match all of your other apps.
For example, I ran 'chown -R splunk:splunk ./splunk_app_db_connect' on the directory to assign splunk as the owner of all the files. Obviously, use whichever user you run Splunk as in your implementation.
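To see whether this applies in your case before changing anything, a quick check along these lines should be enough (paths assume a standard /opt/splunk installation and the splunk user):
ls -ld /opt/splunk/etc/apps/splunk_app_db_connect
find /opt/splunk/etc/apps/splunk_app_db_connect ! -user splunk -print
The second command lists any file in the app directory that is not owned by the splunk user.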

Hi,
A 503 means that the HEC queue feeding the indexers is full. The problem is to identify where the bottleneck is:
1) is it the heavy forwarder?
2) is it the network between the heavy forwarder and the indexers?
3) is it the indexers?
To help you diagnose, you can check the following things:
- CPU usage on the heavy forwarder and the indexers
- queue sizes on the indexers (see the search sketch at the end of this post)
If the CPU is high (more than 90%) on one of these components, that's where you should focus your troubleshooting. If everything seems normal, it might be a network issue.
Finally, another test you can run is to index locally on the heavy forwarder. If the 503s disappear after doing so, the bottleneck is clearly downstream (the network or the indexers).
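For the queue check, a search against the internal metrics along these lines should show the fill ratio per queue and host (a sketch only; adjust the time range to when you see the 503s):
index=_internal source=*metrics.log group=queue
| eval fill_pct = round(current_size_kb / max_size_kb * 100, 2)
| eval queue = host . ":" . name
| timechart max(fill_pct) by queue
Queues that sit near 100% point to the component that is blocked; a full index queue on the indexers, for example, would explain HEC on the heavy forwarder returning 503.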

I'd suggest opening a support case with Splunk, as this error is related to the HEC instance used by the DB Connect app, not a separate HEC.
Hi Muryoutaisuu,
Please refer to the link below:
https://answers.splunk.com/answers/525193/splunk-db-connect-3-how-to-resolve-http-error-503.html

Hi p_gurav,
That's not the issue. We haven't configured HEC to run as if on a deployment server:
$ grep useDeploymentServer /opt/splunk/etc/apps/splunk_httpinput/*/inputs.conf
/opt/splunk/etc/apps/splunk_httpinput/default/inputs.conf:useDeploymentServer=0
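For completeness, btool would also confirm the merged [http] configuration across all apps, in case something else had overridden the defaults:
$ /opt/splunk/bin/splunk btool inputs list http --debug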
