Solved: Splunk DB Connect 3 Extremely Slow

beidem · ‎11-18-2019

Running a simple command with |dbxquery takes 80+ seconds when connecting to a SQL instance, while |dbquery takes around 0.8 seconds to complete. I'm running Splunk version 7.2.6, Splunk DB Connect 3.1.3, and SQL Server 2017, all on Linux. I have the exact same configuration in a development environment, and |dbxquery takes around 1.2 seconds to run there.

This command just returns the current UTC date, and it completed in 83 seconds:
| dbxquery connection="mydatabase" query="select getutcdate()"

This search has completed and has returned 1 results by scanning 0 events in 83.165 seconds

Duration (seconds)  Component                                   Invocations  Input count  Output count
5.12                command.dbxquery                            1            -            1
0.00                dispatch.createdSearchResultInfrastructure  1            -            -
31.03               dispatch.evaluate.dbxquery                  2            -            -
0.00                dispatch.finalWriteToDisk                   1            -            -
0.00                dispatch.preview.snapshot                   2            -            -
0.02                dispatch.writeStatus                        7            -            -
0.09                startup.configuration                       2            -            -
0.01                startup.handoff                             2            -            -

The search.log shows that most of the time is being spent in optimize_toJson. It gets called a couple of times, and the 15.5-second delay matches the time difference between each query execution.

11-18-2019 18:57:56.589 INFO AstOptimizer - SrchOptMetrics optimize_toJson=15.513820102
11-18-2019 18:58:12.200 INFO AstOptimizer - SrchOptMetrics optimize_toJson=15.599384700
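
To put a number on those two entries, here is a minimal Python sketch that pulls every optimize_toJson value out of search.log text and sums them. The function name and the regular expression are illustrative, not part of Splunk; the sample text is the two lines quoted above. The total comes out close to the 31.03 seconds the job inspector reports for dispatch.evaluate.dbxquery, although the post does not confirm that the two measure the same thing.

```python
import re

# Matches the metric as it appears in the log lines quoted above.
OPTIMIZE_RE = re.compile(r"optimize_toJson=(\d+(?:\.\d+)?)")


def optimize_tojson_seconds(log_text):
    """Return every optimize_toJson duration (in seconds) found in log_text."""
    return [float(match.group(1)) for match in OPTIMIZE_RE.finditer(log_text)]


# The two search.log lines from the post.
SAMPLE = """\
11-18-2019 18:57:56.589 INFO AstOptimizer - SrchOptMetrics optimize_toJson=15.513820102
11-18-2019 18:58:12.200 INFO AstOptimizer - SrchOptMetrics optimize_toJson=15.599384700
"""

if __name__ == "__main__":
    durations = optimize_tojson_seconds(SAMPLE)
    print("calls:", len(durations))
    print("total seconds:", round(sum(durations), 2))
```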

Things I have checked so far:
1. Ensured antivirus and stat-collecting apps aren't blocking or delaying Java processes.
2. The instance is properly sized; I recently doubled its resources with no change in speed.
3. Disabled search optimization (|noop search_optimization=false).

beidem · ‎11-18-2019

We believe this was caused by an invalid nameserver entry in the /etc/resolv.conf file on the Splunk search head. When we executed |dbxquery, it was most likely attempting to resolve the SQL instance using the first nameserver, which was invalid. That lookup timed out after some period of time and fell back to the second nameserver, which was valid, and the query finally completed after 80 seconds or so.

It was a really odd case because |dbquery was running fast the entire time, and even when we tried the IP address for the SQL instance we still encountered the long delay.
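
For anyone wanting to check for the same condition, here is a rough Python sketch, not something taken from the original troubleshooting. It lists the nameserver entries in the order they appear in /etc/resolv.conf and times a single forward lookup through the system resolver. The function names are made up for illustration, HOST is a placeholder to replace with your own SQL Server hostname, and port 1433 is assumed because it is the SQL Server default. It only times a forward lookup, so it may not reproduce the delay we saw when using the IP address directly.

```python
import socket
import time

HOST = "localhost"  # placeholder: replace with your SQL Server hostname
PORT = 1433         # assumption: SQL Server default port


def parse_nameservers(resolv_conf_text):
    """Return nameserver addresses in the order they are listed."""
    servers = []
    for line in resolv_conf_text.splitlines():
        stripped = line.strip()
        if not stripped or stripped[0] in "#;":
            continue  # skip blank lines and comments
        fields = stripped.split()
        if len(fields) >= 2 and fields[0] == "nameserver":
            servers.append(fields[1])
    return servers


def time_lookup(host, port=PORT):
    """Time one getaddrinfo call; return (elapsed_seconds, error_or_None)."""
    start = time.monotonic()
    error = None
    try:
        socket.getaddrinfo(host, port)
    except socket.gaierror as exc:
        error = exc
    return time.monotonic() - start, error


if __name__ == "__main__":
    try:
        with open("/etc/resolv.conf") as fh:
            print("nameservers, in order:", parse_nameservers(fh.read()))
    except OSError as exc:
        print("could not read /etc/resolv.conf:", exc)
    elapsed, error = time_lookup(HOST)
    print(f"lookup of {HOST} took {elapsed:.2f}s, error={error}")
```

A lookup that takes many seconds, with the first listed nameserver being one you know is unreachable, would be consistent with what we saw.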
