What are some best practices for dealing with complex relational database data in Splunk?

grittonc
Contributor

Splunk eats machine data for breakfast, but many of us are using data in Splunk that doesn't come from a machine and isn't easily event-ized.

What are some best practices for dealing with high-volume data from snowflake schemas? This data may change frequently, isn't broken into events, and sometimes requires complex SQL to distill it into events.

Best practices for using DB Connect are most welcome.

woodcock
Esteemed Legend

Here are my best practices for DB Connect.
Do not use v1.
Try to use v3 but expect many problems, some of them insurmountable.
Trust v2, but beware that dbxquery has a hardcoded row limit that you need to fix (https://answers.splunk.com/answers/233222/splunk-db-connect-2-dbxquery-only-returns-1001-row.html).
Use checkpoints, but try not to use a timestamp as the checkpoint column.
Do as much work as possible in SQL (on the DB side).
Don't ingest more than you need; make sure you limit the fields returned.
If things are overly complex, consider creating a custom view inside of your DB and query against that instead of the raw table.
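To make the checkpoint and "limit the fields" advice concrete, here is a minimal sketch in Python using the standard-library sqlite3 module as a stand-in database. It is not DB Connect itself; the `orders` table, its columns, and the `fetch_new_rows` helper are all hypothetical. The idea is simply to checkpoint on a strictly increasing numeric id rather than a timestamp, and to select only the columns that are needed.

```python
import sqlite3


def fetch_new_rows(conn, checkpoint, batch_size=1000):
    """Return rows whose id is greater than the checkpoint, plus the new checkpoint.

    Only the needed columns are selected; the wide 'notes' column is skipped.
    """
    rows = conn.execute(
        "SELECT id, customer, status FROM orders "
        "WHERE id > ? ORDER BY id LIMIT ?",
        (checkpoint, batch_size),
    ).fetchall()
    # Advance the checkpoint to the last id seen, or keep it if nothing is new.
    new_checkpoint = rows[-1][0] if rows else checkpoint
    return rows, new_checkpoint


def demo_checkpoint():
    """Build a hypothetical table and read it in checkpointed batches."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE orders ("
        "id INTEGER PRIMARY KEY, customer TEXT, status TEXT, notes TEXT)"
    )
    conn.executemany(
        "INSERT INTO orders (customer, status, notes) VALUES (?, ?, ?)",
        [("acme", "new", "n/a"), ("globex", "new", "n/a"), ("initech", "shipped", "n/a")],
    )
    first, checkpoint = fetch_new_rows(conn, 0, batch_size=2)
    second, checkpoint = fetch_new_rows(conn, checkpoint, batch_size=2)
    third, checkpoint = fetch_new_rows(conn, checkpoint)
    return first, second, third, checkpoint
```

Each call picks up where the previous one stopped, so no row is read twice even if several rows share the same timestamp.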

@SloshBurch, we need a validated_best_practice in this area.

ddrillic
Ultra Champion

-- Do as much work as possible in SQL (on the DB side).

This is huge and applies to other software integrations with DBs.

For example, if you need a certain type of data set, create a view that represents it and ingest from that view, instead of ingesting the raw data and performing the joins within Splunk. In Hunk, with huge data sets, these scenarios were nightmares until we created the proper views.
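As a rough illustration of the view approach, the sketch below uses Python's sqlite3 module with a small, made-up snowflake-style schema (`sale`, `customer`, `region`). All table, column, and view names are assumptions for the example. The joins live in the `sale_event` view, so the ingesting side only ever runs a flat SELECT against it.

```python
import sqlite3


def build_demo_db():
    """Create hypothetical normalized tables plus a view that flattens them."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(
        """
        CREATE TABLE region (region_id INTEGER PRIMARY KEY, region_name TEXT);
        CREATE TABLE customer (customer_id INTEGER PRIMARY KEY, name TEXT, region_id INTEGER);
        CREATE TABLE sale (sale_id INTEGER PRIMARY KEY, customer_id INTEGER, amount REAL);

        INSERT INTO region VALUES (1, 'EMEA'), (2, 'APAC');
        INSERT INTO customer VALUES (10, 'acme', 1), (11, 'globex', 2);
        INSERT INTO sale VALUES (100, 10, 25.0), (101, 11, 40.0), (102, 10, 5.0);

        -- The view does the joins once, on the database side.
        CREATE VIEW sale_event AS
            SELECT s.sale_id, c.name AS customer, r.region_name AS region, s.amount
            FROM sale s
            JOIN customer c ON c.customer_id = s.customer_id
            JOIN region r ON r.region_id = c.region_id;
        """
    )
    return conn


def read_events(conn):
    """Read event-shaped rows from the view; no joins needed on this side."""
    return conn.execute(
        "SELECT sale_id, customer, region, amount FROM sale_event ORDER BY sale_id"
    ).fetchall()
```

Each row returned by `read_events` already looks like a self-contained event, which is the shape Splunk handles well.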

sloshburch
Ultra Champion

You rang? lol

I guess I want to know more about the situation here. I'm not familiar enough with database data that has a changing schema. I need to appreciate that to get my head around the challenge.

grittonc
Contributor

It's not the schema that is changing, it's the data. Updates and deletes are not Splunk-friendly. If I've already indexed an event related to entity X and then something about X changes, I need to index a new event for entity X. The old one isn't relevant anymore for most purposes. That means that either users have to search for the latest version of that event, or I need to find a way to delete the old version that is out of date.

Sometimes I use lookup tables instead of indexes. I've also looked at using scheduled searches to do the heavy lifting of finding the latest version of each entity and then having dashboards use loadjob. But end users trying to use the traditional "index=foo" in the search box can easily come up with incorrect conclusions.
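The lookup-table idea can be pictured as a keyed store where each change overwrites or removes the entry for its entity, so only the current state remains. The following is a minimal Python sketch of that behavior, not Splunk code; the `apply_changes` helper and the `(op, entity_id, fields)` change format are hypothetical.

```python
def apply_changes(lookup, changes):
    """Apply (op, entity_id, fields) changes to a dict keyed by entity id.

    'delete' removes the entity; any other op inserts or overwrites it,
    so the dict always holds at most one (current) row per entity.
    """
    for op, entity_id, fields in changes:
        if op == "delete":
            lookup.pop(entity_id, None)
        else:
            lookup[entity_id] = dict(fields)
    return lookup


# Hypothetical change stream: X is updated once, Y is created and then deleted.
changes = [
    ("upsert", "X", {"status": "open"}),
    ("upsert", "Y", {"status": "open"}),
    ("upsert", "X", {"status": "closed"}),
    ("delete", "Y", None),
]
current_state = apply_changes({}, changes)
```

Unlike an index, this structure never exposes the out-of-date version of X, which is why it avoids the "index=foo" pitfall described above, at the cost of holding the whole current state in one table.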

sloshburch
Ultra Champion

Do you retain a timestamp as a field on each row that is inserted or deleted? If you do, then DB Connect could follow a cursor on a query with an ORDER BY on that timestamp field. The data is then loaded into Splunk as a new event, and reporting on it uses latest() in a transforming statistics command.

I'm not sure at this time how to do it without that. I think your approach of using a lookup file to cache it is sound as well, but that obviously depends on the volume of data.
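For readers who want to see what "append every change as a new event, then report on the latest one" amounts to, here is a small Python sketch of the reduction step. It is roughly analogous to a `stats latest(...) by ...` search, but it is only an illustration; the event fields `entity`, `updated_at`, and `status` are assumed names, and timestamps are ISO-8601 strings so that string comparison matches time order.

```python
def latest_per_entity(events):
    """Keep only the most recent event for each entity.

    Ties on 'updated_at' are resolved in favor of the event seen later.
    """
    latest = {}
    for event in events:
        key = event["entity"]
        if key not in latest or event["updated_at"] >= latest[key]["updated_at"]:
            latest[key] = event
    return latest


# Hypothetical append-only events: entity X changed after it was first indexed.
events = [
    {"entity": "X", "updated_at": "2017-01-01T10:00:00", "status": "open"},
    {"entity": "Y", "updated_at": "2017-01-01T11:00:00", "status": "open"},
    {"entity": "X", "updated_at": "2017-01-02T09:30:00", "status": "closed"},
]
current = latest_per_entity(events)
```

The old event for X is still stored, but any report built on this reduction only sees the newest version.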
