Splunk Cloud Platform

Integrating Splunk with Databricks

sugata
New Member

I have my Databricks setup in AWS, which runs multiple ETL pipelines. I want to send logs, metrics, application flow tracking, etc. to Splunk, but I am not sure how this can be achieved. I have my organisation's Splunk setup, where I can generate my auth token and see the endpoint details. Is this enough to push data from Databricks to Splunk, or do I need an OpenTelemetry-like collector that reads the data stored in Databricks at /some/location and pushes it to Splunk?


livehybrid
SplunkTrust

Hi @sugata 

I don't think Databricks has a specific Splunk connector as such, but I have worked with Databricks and sent its logs to Splunk in a previous life...

How are you running Databricks? You might find the easiest way is to run a Splunk Universal Forwarder to send the specific log files from the Databricks worker nodes to your Splunk environment.
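
For illustration, here is a minimal sketch of what that forwarder setup might look like on a worker node; the monitored path, index name, and indexer address are assumptions to adapt to your environment, not Databricks defaults. In inputs.conf on the forwarder:

    [monitor:///databricks/driver/logs/*.log]
    index = databricks
    sourcetype = databricks:driver

And in outputs.conf, pointing at your indexers:

    [tcpout]
    defaultGroup = primary_indexers

    [tcpout:primary_indexers]
    server = splunk-idx1.example.com:9997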

There is also the Databricks Add-on for Splunk on Splunkbase, but that is designed more to run queries against Databricks and/or trigger jobs, although it could also be used to gather telemetry.


sugata
New Member

Thanks for your reply @livehybrid 
The add-on you mentioned is good for querying Databricks (sending a command TO Databricks), but I am looking for a solution that can send logs FROM Databricks.
Example: I am building a 10-step ETL pipeline in Databricks, hosted in AWS. At the end of each step, I need to write a log to Splunk about its success/failure. I have a schema defined for the log. So my question is how to send that event/log to Splunk, which is hosted somewhere else, not in AWS.

I feel Splunk might have some kind of API exposed for that; I just don't know which API, how to call it, how to configure it, and what the best practices are.


isoutamo
SplunkTrust

I have a client that is sending events via Databricks to Splunk.
As already said, you should use HEC for sending those events.

But it needs some configuration inside Databricks to manage those streams, or to read the events back from data storage. You must also define how many events you send at a time; otherwise there is a big risk of trying to send too much, which can crash Splunk via the OOM killer.
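
To illustrate the batching point, here is a minimal sketch (assuming the requests library and made-up endpoint, token, index, and batch size) of a sender that posts to HEC in fixed-size batches rather than all at once:

    import json
    import requests

    # Assumed values -- substitute your own HEC endpoint, token, and index.
    HEC_URL = "https://splunk.example.com:8088/services/collector/event"
    HEC_TOKEN = "00000000-0000-0000-0000-000000000000"
    BATCH_SIZE = 100  # cap events per request so a backlog cannot overwhelm Splunk

    def send_to_hec(events):
        """Send a list of event dicts to HEC in fixed-size batches.

        HEC accepts multiple JSON event objects concatenated in one
        request body, so each batch goes out as a single POST.
        """
        headers = {"Authorization": f"Splunk {HEC_TOKEN}"}
        for i in range(0, len(events), BATCH_SIZE):
            batch = events[i:i + BATCH_SIZE]
            payload = "".join(
                json.dumps({"event": e, "sourcetype": "_json", "index": "databricks"})
                for e in batch
            )
            resp = requests.post(HEC_URL, headers=headers, data=payload, timeout=30)
            resp.raise_for_status()  # fail loudly rather than silently dropping events

Capping the batch size bounds how much Splunk has to buffer per request, which is exactly the OOM risk described above.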


livehybrid
SplunkTrust

Ah @sugata, okay, in that case you probably want to look at the Splunk HTTP Event Collector (HEC), which allows you to send events to a specific index within Splunk from an external service. Check out https://help.splunk.com/en/splunk-enterprise/get-started/get-data-in/9.4/get-data-with-http-event-co... and https://dev.splunk.com/enterprise/docs/devtools/httpeventcollector/ for more information on how to use it, set it up, and format events, with examples.
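
To make that concrete for the per-step use case described above, here is a minimal sketch of posting one event per pipeline step to HEC from a notebook or job; the URL, token, index, and field names are assumptions that just mirror the schema you described, not anything Splunk mandates:

    import requests

    # Assumed values -- replace with your organisation's HEC endpoint and token.
    HEC_URL = "https://splunk.example.com:8088/services/collector/event"
    HEC_TOKEN = "00000000-0000-0000-0000-000000000000"

    def log_step(pipeline, step, status, detail=""):
        """Post a single ETL-step event to Splunk HEC."""
        payload = {
            "event": {"pipeline": pipeline, "step": step, "status": status, "detail": detail},
            "sourcetype": "_json",
            "index": "databricks",  # assumed index name
        }
        resp = requests.post(
            HEC_URL,
            headers={"Authorization": f"Splunk {HEC_TOKEN}"},
            json=payload,
            timeout=30,
        )
        resp.raise_for_status()

    # Example: call at the end of each of the 10 pipeline steps.
    log_step("daily_etl", 3, "success")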
