Hello Splunk community,
Let's say my input to Splunk is three CSV files that use the following schema. Each CSV populates an index: Faults, Incidents and Status.
For each Faults entry there is one (and just one) Status entry. That Status entry will have parent_id = id of that fault.
In the same way there is also a 'Status' entry for each Incident.
When I query Splunk or build dashboards, I have to retrieve information not only from the 'Faults' or 'Incidents' indexes but also from 'Status'. That makes me write a lot of queries that join indexes, like this:
index="faults"
|join type=outer status_id [search index="status" | rename id as status_id]
I liked this solution at first because the 'Faults' and 'Incidents' indexes look very clean, but I have read that these types of SPL queries are computationally expensive, and I am concerned that this will not scale well in the future.
Should I perhaps modify the schema, remove the Status index and put all that information in Faults and Incidents, like this?
Thank you all a lot in advance for your answers.
Fran
Firstly, you're thinking about your indexes in terms of a relational database.
You usually don't need multiple indexes unless the data stored in them needs:
1) Different retention periods or
2) Different permissions.
You can fit many different types of data into a single index. They might have different sourcetypes (and be parsed and interpreted differently), and they might come from different sources and hosts.
It's true that operations involving subsearches, such as joins, are "heavy" for Splunk. And again, Splunk is not a relational database, so you don't need to normalize your data here. On the contrary: the more information you have within a single event and the less you have to "reach out" to other objects, the better.
So your "denormalization" is a sound idea.
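If you would rather flatten the data before it reaches Splunk, here is a minimal Python sketch of that denormalization step. It only relies on the `id` / `parent_id` relationship described in the question. The other column names (`description`, `state`), the `status_` prefix and the `denormalize` helper are made up for illustration, and it assumes the Status rows you pass in all belong to the same parent type (only Faults, or only Incidents).

```python
import csv
import io


def denormalize(parent_rows, status_rows, prefix="status_"):
    """Fold each Status row into the parent row whose id equals its parent_id."""
    # One Status row per parent, as described in the question.
    status_by_parent = {row["parent_id"]: row for row in status_rows}
    merged = []
    for parent in parent_rows:
        out = dict(parent)
        status = status_by_parent.get(parent["id"], {})
        for field, value in status.items():
            if field == "parent_id":
                continue  # redundant once the rows are merged
            out[prefix + field] = value  # prefix avoids clashing with the parent's own fields
        merged.append(out)
    return merged


# Hypothetical sample data standing in for the real CSV files.
FAULTS_CSV = "id,description\n1,pump failure\n2,sensor drift\n"
STATUS_CSV = "id,parent_id,state\n10,1,open\n11,2,closed\n"

faults = list(csv.DictReader(io.StringIO(FAULTS_CSV)))
statuses = list(csv.DictReader(io.StringIO(STATUS_CSV)))
flat_faults = denormalize(faults, statuses)
```

Each resulting row carries its own status fields, so a search on the faults data would no longer need to join against a separate status index.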