I have my Databricks setup in AWS which runs multiple ETL pipelines. I want to send logs, metrics, application flow tracking, etc. into Splunk, but I am not sure how this can be achieved. I have my organisation's Splunk setup where I can generate my auth token and can see the endpoint details. Is this enough to push data from Databricks to Splunk, or do I need an OpenTelemetry-like collector which would read the data stored in Databricks at /some/location and push it to Splunk?
Hi @sugata
I don't think Databricks has a specific Splunk connector as such, but I did work with Databricks and sending its own logs to Splunk in a previous life...
How are you running Databricks? You might find the easiest way is to run a Splunk Universal Forwarder to send the specific log files from the Databricks worker nodes to your Splunk environment.
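If you go down that route, the forwarder is normally installed on the nodes (for ephemeral clusters, typically via a cluster init script) and pointed at the log files with a monitor stanza. A minimal sketch, assuming placeholder log paths, index and sourcetype names that you would replace with your own:

```
# inputs.conf on each Databricks driver/worker node
# (log path, index and sourcetype below are placeholders -- point this at
#  wherever your cluster actually writes the files you care about)
[monitor:///databricks/driver/logs/*.log]
index = databricks
sourcetype = databricks:driver

# outputs.conf -- where the forwarder should send the data
[tcpout]
defaultGroup = primary_indexers

[tcpout:primary_indexers]
server = splunk-indexer.example.com:9997
```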
There is also the Databricks Add-on for Splunk app on Splunkbase, but this is more designed to run queries against Databricks and/or trigger jobs, although it could also be used to gather telemetry.
Thanks for your reply @livehybrid
The add-on that you mentioned is good for querying Databricks (sending a command TO Databricks), but I am looking for a solution which can send logs FROM Databricks.
Example - I am building a 10-step ETL pipeline in Databricks, which is hosted in AWS. At the end of each step, I need to write a log to Splunk about its success/failure. I have a schema defined for the log. So my question is how to send that event/log into Splunk, which is hosted somewhere else and not in AWS.
I feel Splunk might have some kind of API exposed for that; I just don't know which API, how to call it, how to configure it, what the best practices are, etc.
I have a client which is sending events from Databricks to Splunk.
As already said, you should use HEC for sending those events.
But it needs some configuration inside Databricks to manage those streams, or to read the events back from storage. You also need to define how many events you send at a time; otherwise there is a big risk that it tries to send too much at once (which can lead to Splunk crashing via the OOM killer).
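To illustrate that point, here is a rough Python sketch of chunking events before posting them to HEC, so one pipeline step never fires thousands of individual requests or a single enormous payload. The endpoint URL, token, index and sourcetype are placeholders for your own environment:

```python
import json
import requests

# Placeholders -- substitute your own HEC endpoint, token, index and sourcetype
HEC_URL = "https://splunk.example.com:8088/services/collector/event"
HEC_TOKEN = "your-hec-token"

def send_in_batches(events, batch_size=100):
    """Post events to Splunk HEC in fixed-size batches instead of one huge request."""
    headers = {"Authorization": f"Splunk {HEC_TOKEN}"}
    for i in range(0, len(events), batch_size):
        chunk = events[i:i + batch_size]
        # HEC accepts several JSON event objects concatenated in one request body
        payload = "\n".join(
            json.dumps({
                "event": e,
                "index": "databricks",
                "sourcetype": "databricks:etl",
            })
            for e in chunk
        )
        resp = requests.post(HEC_URL, headers=headers, data=payload, timeout=30)
        resp.raise_for_status()
```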
Ah @sugata, okay, in that case you probably want to look at the Splunk HTTP Event Collector (HEC), which allows you to send events to a specific index within Splunk from an external service. Check out https://help.splunk.com/en/splunk-enterprise/get-started/get-data-in/9.4/get-data-with-http-event-co... and https://dev.splunk.com/enterprise/docs/devtools/httpeventcollector/ for more information on how to use this, how to set it up, and how to format events, with examples.
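As a minimal sketch of what the per-step logging could look like from a Databricks notebook or job (assuming Python and the requests library; the URL, token, host, index and sourcetype values are placeholders you would replace with your own):

```python
import time
import requests

# Placeholders -- replace with your organisation's HEC endpoint and token
# (the token is best kept in a secret store such as a Databricks secret scope)
HEC_URL = "https://splunk.example.com:8088/services/collector/event"
HEC_TOKEN = "xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx"

def log_step(pipeline, step, status, detail=None):
    """Send one ETL step result to Splunk HEC as a structured JSON event."""
    payload = {
        "time": time.time(),
        "host": "databricks-aws",
        "source": pipeline,
        "sourcetype": "databricks:etl",
        "index": "databricks",  # an index your HEC token is allowed to write to
        "event": {"step": step, "status": status, "detail": detail},
    }
    resp = requests.post(
        HEC_URL,
        headers={"Authorization": f"Splunk {HEC_TOKEN}"},
        json=payload,
        timeout=10,
    )
    resp.raise_for_status()

# Example: call this at the end of each of your 10 pipeline steps
log_step("orders_pipeline", step=3, status="success")
```

HEC listens on port 8088 by default, so the Databricks cluster will need outbound network access to that endpoint, and the schema you already defined for the log can simply go inside the "event" field.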