LOOK FOR BOLD for quick overview:
I want to control the index-time extraction for events linked to an accelerated data model...
I am relatively new to Splunk, and recently I've jumped into Accelerated Data Models. I understand a number of aspects about them already:
What I don't understand is how those summaries for the Accelerated Data Models are built. I understand that ADMs use tsidx files as the summaries of the raw data.
"Each search you run scans tsidx files for the search keywords and uses their location references to retrieve from the rawdata file the events to which those keywords refer. Splunk Enterprise creates a separate set of tsidx files for data model acceleration. In this case, it uses the tsidx files as summaries of the data returned by the data model."
What I don't understand is how the connection to the raw data and the .tsidx files is made. How are the .tsidx files formed from the event data?
When I look at the data model's object hierarchy in Settings, I see the fields that it encompasses:
When I do a search like:
| datamodel Intrusion_Detection search
If I'm correct, it is giving me the search-time extractions from the indexes related to the accelerated model.
The problem is that I get a lot of fields that are useless in cyber security efforts. For instance, maybe I want to know the category of the different attacks that are occurring. It is a calculated field in my accelerated data model. The calculation is if(isnull(category) OR category="", "unknown", category), which means it returns the category unless there is none, in which case it returns "unknown". I also don't understand where it gets the variable "category". How is that being pulled from the raw data?
The problem is that I get 100% unknowns.
Is this a problem of event tagging with the Common Information Model or somewhere else in the flow of ingested data? - https://wiki.splunk.com/images/4/45/Splunk_EventProcessing_v19_0_standalone.pdf
In the end here is what I want to know to fix this:
Additionally, I understand that extracting more fields from the data also means an increase in storage size on the indexer. I just want to figure this all out.😁
[EDIT]
Is this where I would use the Splunk Add-on Builder app?
Data models are accelerated by Splunk building and incrementally updating the summaries. The summaries are built by scheduled searches spawned by the scheduler. By itself, acceleration has nothing to do with index-time extractions.
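As a rough illustration (assuming the standard CIM Intrusion_Detection model with its IDS_Attacks dataset, which may differ from your setup), tstats reads from the acceleration summaries, and summariesonly=true restricts it to data that has already been summarized:

```
| tstats summariesonly=true count
    from datamodel=Intrusion_Detection.IDS_Attacks
    by IDS_Attacks.category
```

If this returns only "unknown", the summaries are most likely just storing what the search-time field extractions produced, which points at the field extractions rather than at the acceleration itself.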
As for the various fields, it's up to your admin (or data admin, if you have a separate role for this) to make your data CIM-compliant. The CIM app on its own doesn't "do" anything. It just provides you with a schema to fill (you can compare it to an abstract class in programming). It's the same schema for everyone using the app, hence the name Common Information Model.
So after installing the CIM app, you must make sure your data is properly matched to what CIM expects: the appropriate fields must be calculated if they are not in your raw data, and the events must be tagged properly. With many TAs, this is done automatically by the add-on for the given sourcetypes.
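One way to check this, sketched under the assumption that your model uses the standard CIM constraint tag=ids tag=attack (verify the actual constraint in the data model editor), is to search the raw events without the data model and see whether category exists at all:

```
index=* tag=ids tag=attack
| eval has_category=if(isnull(category) OR category="", "missing", "present")
| stats count by sourcetype, has_category
```

Here index=* is only a placeholder, so narrow it to your IDS indexes. If no events come back, tagging is the likely problem. If events come back but category is always missing, the sourcetype probably needs a field alias or calculated field (typically supplied by the add-on for that product) that maps whatever the vendor calls it to category.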