Splunk Search

Should I use root events, root transactions, or root searches to set up my data model?

_jgpm_
Communicator

I apologize in advance for the super broad question and I realize that the answer may depend heavily on the structure of my data.

I have constructed several dashboards and I am fairly pleased with how things have turned out. However, I would like to start using data models, since abstracting the data structure is the way to go for my future plans. I have read:

  • conf2014_DavidClawson_Splunk_how to actually use data models
  • Learn How to Design, Build and Manage Data Models
  • Splunk-6.4.3-SearchReference-Datamodel

I am at a loss, though, on how to start. I have played around with the DM wizard and created several pivot charts, but it is unclear to me how to optimize the data hierarchy.

My data is summarized as a view of many separate processes running in parallel over time. Some processes have hierarchical relationships with others but not all. Reporting from various processes can be redundant and I would like to capture and visualize data integrity, variation, and flow across reporting processes.

Mapping out all these relationships would take an enormous amount of time. My goal is to only view the most important changes over time to identify system behavior and determine root cause quickly.

Is it arbitrary which root concept to start with? Is it possible to refactor data models after obviously poor ones have been constructed? Which root-concept-centric model is the best fit for many types of data models?

Thank you in advance for any help.

lguinn2
Legend

I would start by using root events first. Root event searches can be accelerated, while search-based objects and transaction-based objects cannot. And I would definitely accelerate the data model, which will provide a big performance boost.

Also very important: write the most efficient searches that you can for the event objects. Always use the index name in the search. Also use the host and sourcetype if at all possible. Avoid wildcards if you can - especially wildcards at the beginning of a search term. (For example: user=admin* is not great, but user=*_db is terrible.)
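The cost difference between `user=admin*` and `user=*_db` can be sketched with a simplified model, assuming the indexed terms are kept in a sorted list (a rough stand-in for an index's sorted term list, not Splunk's actual tsidx implementation). A trailing wildcard becomes a prefix lookup: binary-search to the first candidate and stop at the first non-match. A leading wildcard gives the sort order nothing to work with, so every term has to be checked. The function names and sample terms are hypothetical.

```python
import bisect


def trailing_wildcard(sorted_terms, prefix):
    """Match terms like `admin*`: jump to the prefix, stop at the first miss."""
    start = bisect.bisect_left(sorted_terms, prefix)  # binary search, O(log n)
    matches, examined = [], 0
    for term in sorted_terms[start:]:
        examined += 1
        if not term.startswith(prefix):
            break  # terms sharing a prefix are contiguous, so nothing later matches
        matches.append(term)
    return matches, examined


def leading_wildcard(sorted_terms, suffix):
    """Match terms like `*_db`: sort order does not help, so scan everything."""
    matches = [term for term in sorted_terms if term.endswith(suffix)]
    return matches, len(sorted_terms)


# Hypothetical term list for illustration.
terms = sorted(["admin", "admin2", "alice", "app_db", "bob", "log_db", "zed"])
print(trailing_wildcard(terms, "admin"))  # (['admin', 'admin2'], 3)
print(leading_wildcard(terms, "_db"))     # (['app_db', 'log_db'], 7)
```

With seven terms the gap is small, but the prefix lookup examines only the matching run (plus one), while the suffix match always touches every term, which is why a leading wildcard scales so badly.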

Refactoring can be hard after the fact if you change the underlying model and have a lot of saved searches that use it. On the other hand, if most of your usage is ad hoc, or if the structure of the model does not change much, it is not as big a deal...

_jgpm_
Communicator

Thank you. Given your recommendation on efficient searches, how would you suggest I solve the following problem? I was following this advice, "Reverse-engineer your existing dashboards and searches into data models," and trying to build higher-level concepts with child events that inherit constraints.

Let's say process A reports 10 apples. Process B listens to process A and then echoes the report of 10 apples, but using different formatting. Processes C and D listen to B and then report yet more variants of the same message.

I would like to capture all the various reports of the 10 apples to determine the quality of process reporting. But how would I set up the root event? If I make process A the root, then processes B, C, and D would each need a completely different set of constraints to make the search efficient, e.g., to reduce the data set to a minimum size. There is no obvious inheritance.
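The inheritance problem can be made concrete with a small model. As described in this thread, a child dataset inherits its parent's constraints and adds its own, so (assuming those constraints are combined with an implicit AND, as space-separated SPL terms are) a child can only narrow its parent's result set, never widen it. The `Dataset` class, the index `apps`, and the sourcetypes `proc_a` through `proc_d` are hypothetical, and this is plain string handling, not a Splunk API.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Dataset:
    """Hypothetical stand-in for a data model dataset (not a Splunk API)."""
    name: str
    constraint: str
    parent: Optional["Dataset"] = None

    def effective_search(self) -> str:
        # Walk up to the root, then join constraints root-first.
        # Space-separated SPL terms are implicitly ANDed.
        parts = []
        node = self
        while node is not None:
            parts.append(node.constraint)
            node = node.parent
        return " ".join(reversed(parts))


# Narrow root: a child for process B inherits a filter its events never satisfy.
narrow_root = Dataset("ProcessA", "index=apps sourcetype=proc_a")
bad_child = Dataset("ProcessB", "sourcetype=proc_b", parent=narrow_root)

# Broad root: one constraint covering every reporting process; each child narrows it.
broad_root = Dataset(
    "AppleReport",
    "index=apps (sourcetype=proc_a OR sourcetype=proc_b "
    "OR sourcetype=proc_c OR sourcetype=proc_d)",
)
good_child = Dataset("ProcessB", "sourcetype=proc_b", parent=broad_root)

print(bad_child.effective_search())
# index=apps sourcetype=proc_a sourcetype=proc_b  (an event has only one sourcetype)
print(good_child.effective_search())
# the broad root's OR group followed by sourcetype=proc_b
```

The design point the sketch illustrates: the root has to be at least as broad as the union of everything its children need, and the per-process selectivity belongs in the children.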

In my searches, I would
eval AppleReport=coalesce(process A, etc.)

and then perform my stats/chart command on AppleReport. But wouldn't this structure the root event in reverse? AppleReport would be the root event with a very broad query/constraints, and its children would inherit the broad query and add their own constraints.
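What the `coalesce` step buys can be sketched outside Splunk: each process reports the same quantity under a different field name, `coalesce` takes the first non-null one into a single `AppleReport` field, and grouping the reporting processes by that value shows whether they agree. The field names (`a_apples`, `b_count`, `c_qty`, `d_total`) and the sample events, including the disagreeing report from D, are hypothetical.

```python
def coalesce(*values):
    """Return the first value that is not None, like SPL's coalesce()."""
    for value in values:
        if value is not None:
            return value
    return None


# Hypothetical events: each process reports the same count under its own field name.
events = [
    {"process": "A", "a_apples": 10},
    {"process": "B", "b_count": "10"},
    {"process": "C", "c_qty": "10"},
    {"process": "D", "d_total": 9},  # a disagreeing report, for illustration
]

for event in events:
    raw = coalesce(event.get("a_apples"), event.get("b_count"),
                   event.get("c_qty"), event.get("d_total"))
    event["AppleReport"] = int(raw)  # normalize "10" and 10 to the same value

# Group reporters by the value they reported, roughly like
# `stats values(process) by AppleReport` in SPL.
reporters = {}
for event in events:
    reporters.setdefault(event["AppleReport"], []).append(event["process"])
print(reporters)  # {10: ['A', 'B', 'C'], 9: ['D']}
```

Any `AppleReport` value with fewer reporters than expected points at the process whose report diverged, which is the data-integrity view described in the question.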

Or am I completely misunderstanding the design theory? I admit I know very little about data architecture or database fundamentals.


cwilmoth
Path Finder

lguinn2,
From the Knowledge Manager manual:

Datasets can only be accelerated if they contain at least one root event hierarchy or one root search hierarchy that only includes streaming commands. Dataset hierarchies based on root search datasets that include nonstreaming commands and root transaction datasets are not accelerated.

Doesn't that mean that root search datasets can in fact be accelerated? I am asking because it does not look like the CIM Malware Operations dataset is accelerating for us (which would agree with your previous statement), but the manual seems to imply that it should.
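Read literally, the quoted manual rule can be written as a small predicate, which makes the distinction explicit: the restriction is on root searches that use nonstreaming commands, not on root searches as such. The command set below is a small illustrative sample, not the full classification (unknown commands are treated conservatively as nonstreaming); the Search Reference's command-type tables are the authority. The function names and the `(root_type, commands)` representation are hypothetical.

```python
# Illustrative subset only; consult the Search Reference for the full list.
STREAMING = {"search", "where", "eval", "rex", "rename", "fields"}


def hierarchy_accelerated(root_type, commands=()):
    """Encode the quoted rule for one root dataset hierarchy.

    root_type: "event", "search", or "transaction"
    commands:  command names used by a root search dataset
    """
    if root_type == "event":
        return True
    if root_type == "search":
        # Any command not known to be streaming disqualifies the hierarchy.
        return all(cmd in STREAMING for cmd in commands)
    return False  # root transaction datasets are not accelerated


def model_accelerable(hierarchies):
    # The model needs at least one qualifying root hierarchy.
    return any(hierarchy_accelerated(t, cmds) for t, cmds in hierarchies)


print(hierarchy_accelerated("search", ["search", "eval", "rex"]))  # True
print(hierarchy_accelerated("search", ["search", "stats"]))        # False
print(hierarchy_accelerated("transaction"))                        # False
```

Under this reading, a root search dataset that stays accelerated would be one whose search uses only streaming commands.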
