Getting Data In

Choosing the correct number of indexes for good performance

fvarela
Explorer

Hello Splunk community,

Let's say my input to Splunk is three CSV files that use the following schema. Each CSV populates its own index: Faults, Incidents and Status.

fvarela_1-1633791742955.png

For each Faults entry there is one (and just one) Status entry. That Status entry will have parent_id = id of that fault.

In the same way there is also a 'Status' entry for each Incident.

When I query Splunk or build dashboards, I have to retrieve information not only from the 'Faults' or 'Incidents' indexes but also from 'Status'. That makes me write a lot of queries that join indexes, like this:

 

index="faults"
|join type=outer status_id [search index="status" | rename id as status_id]

 

I liked this solution at first because the 'Faults' and 'Incidents' indexes look very clean, but I have read that these types of SPL queries are computationally expensive, and I am concerned that this will not scale well in the future.

Should I perhaps modify the schema, remove the Status index, and put all that information in the Faults and Incidents indexes, like this?

fvarela_2-1633791896527.png

Thank you all a lot in advance for your answers.

Fran

 


PickleRick
SplunkTrust

Firstly, you're thinking about your indexes in terms of a relational database.

You usually don't need multiple indexes unless you need:

1) Different retention periods or

2) Different permissions

to the data stored in those indexes.

You can fit many different types of data into a single index. They might have different sourcetypes (and be parsed and interpreted differently), and they might come from different sources and hosts.

It's true that operations involving subsearches and commands like join are "heavy" on Splunk. And again, Splunk is not a relational database, so you don't need to normalize your data here. On the contrary: the more information you have within a single event and the less you have to "reach out" to other objects, the better.

So your "denormalization" is a sound idea.
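
For illustration only, here is a minimal sketch of how that denormalization could be done on the CSV files before they are indexed. It is written in Python rather than SPL, and it rests on assumptions: only `id` and `parent_id` come from the question, while the other column names (`description`, `state`, `updated`), the sample rows, and the `status_` prefix are hypothetical.

```python
import csv
import io


def denormalize(parent_rows, status_rows, prefix="status_"):
    """Copy each parent's single Status row into the parent row.

    Assumes Status.parent_id equals the parent's id, as described in the question.
    """
    status_by_parent = {row["parent_id"]: row for row in status_rows}
    merged = []
    for parent in parent_rows:
        flat = dict(parent)
        status = status_by_parent.get(parent["id"], {})
        for key, value in status.items():
            if key in ("id", "parent_id"):
                continue  # join keys carry no information once merged
            flat[prefix + key] = value
        merged.append(flat)
    return merged


# Hypothetical sample data standing in for the real CSV files.
faults_csv = "id,description\n1,Pump failure\n2,Sensor drift\n"
status_csv = (
    "id,parent_id,state,updated\n"
    "10,1,open,2021-10-01\n"
    "11,2,closed,2021-10-02\n"
)

faults = list(csv.DictReader(io.StringIO(faults_csv)))
statuses = list(csv.DictReader(io.StringIO(status_csv)))
flat_faults = denormalize(faults, statuses)

# Collect the union of column names, in case some rows have no Status entry.
fieldnames = []
for row in flat_faults:
    for key in row:
        if key not in fieldnames:
            fieldnames.append(key)

out = io.StringIO()
writer = csv.DictWriter(out, fieldnames=fieldnames)
writer.writeheader()
writer.writerows(flat_faults)
print(out.getvalue())
```

The same function would work for Incidents. Each resulting row already carries its status fields, so a search over the faults data would not need a join to reach them.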
