I was curious to see if there are any best practices for mapping to CIM data models. More specifically, I'm looking for some guidelines on when (not) to map a certain field to a data model.
Of course I can map all fields to the default inherited and calculated fields of the data model. But what about fields that are not present in the data model by default? Should you create a calculated field in the data model for every calculation in your search? Or should you leave the data model as default as possible and leave the calculations in your search?
In other words, I have a search that calculates a large number of extra fields through evals and lookups. I want to speed up and generalize this search by mapping to a CIM data model. Which fields should I leave in the search (after tstats) and which fields should I map to the data model (so that I can retrieve them with tstats)? Should I add calculated fields to the data model for my extra fields, so that I can retrieve all details through a single tstats command? Alternatively, should I leave the data model as default as possible and calculate the fields in the search (after the tstats command)?
Thank you for any help you can offer!
CIM is used for data normalisation to extract information from the raw data. Check this conf talk
Also this link from Splunk docs
The use case you've described above seems specific to a single search, and mapping it would add many unnecessary fields to the CIM data model. I would recommend exploring summary indexing first for this use case: run the search as a saved search on a cron schedule.
My use case is not specific to a single use. I want to map antivirus data to the Malware data model. I want to do this in such a way that my searches work on data for any antivirus vendor.
I would like to add fields that are valid for every type of data and vendor. However, I'm looking for some guidelines on when to add fields to the data model and when to add them to the search itself.
Are you sure that `Malware` is the correct data model? `Intrusion Detection` is very similar and may be a better fit. In any case, do EVERYTHING that you can to NOT edit/change the data model definition, because this will cause you a great deal of grief when upgrading. With this primary directive in mind, you have 2 basic options:
The BAD one: Hijack an existing but unused field, such as one of the `*_bunit` fields (for example `dest_bunit`), which are often not used.
The GOOD one: Create a custom `tag` value (you will have to update the `whitelist` for the `datamodel`), because all `CIM data models` contain this field already.
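As a rough sketch of the whitelist update mentioned above: the tag whitelist can be edited through the CIM Setup page, and as far as I know it ends up as the `tags_whitelist` setting in `datamodels.conf`. The tag name `my_custom_tag` is a placeholder, and the three existing tags shown for `Malware` are an assumption; check the default list in your CIM version and keep whatever is already there.

```ini
# local/datamodels.conf in the CIM app (assumed location: Splunk_SA_CIM)
# Assumption: the default whitelist for Malware holds malware, attack, operations.
# Keep the existing entries and append the custom tag; "my_custom_tag" is hypothetical.
[Malware]
tags_whitelist = malware, attack, operations, my_custom_tag
```

Changing the whitelist of an accelerated data model normally triggers a rebuild of the acceleration summaries, so it is worth doing once with the complete list of tags.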
Expanding on the latter, you might, for example, like to classify the `dest` value as either `internal` or `external`. So you would create a global automatic lookup against a lookup definition that creates a field called `dest_type` with 2 possible values: `internal` and `external`. Then you would create 2 tag definitions: 1 called `dest_is_internal`, defined as `dest_type="internal"`, the other called `dest_is_external`, defined as `dest_type="external"`.
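The lookup-and-tag setup described above might look roughly like this in configuration files. This is only a sketch under assumptions: the lookup name `dest_type_lookup`, the file `dest_type.csv` (columns `dest,dest_type`, with internal CIDR ranges in `dest`), and the sourcetype `my_antivirus_sourcetype` are all made-up names, and it assumes `dest` holds IP addresses. The same objects can be created in Splunk Web and shared globally instead.

```ini
# transforms.conf: lookup definition (hypothetical names)
[dest_type_lookup]
filename = dest_type.csv
# Assumption: dest holds IPs and the CSV lists internal ranges in CIDR notation
match_type = CIDR(dest)
# Anything that matches no row is treated as external
min_matches = 1
default_match = external

# props.conf: automatic lookup; the sourcetype name is a placeholder
[my_antivirus_sourcetype]
LOOKUP-dest_type = dest_type_lookup dest OUTPUTNEW dest_type

# tags.conf: one tag per dest_type value
[dest_type=internal]
dest_is_internal = enabled

[dest_type=external]
dest_is_external = enabled
```

Because tags are applied after automatic lookups at search time, `dest_type` already exists when the tags are evaluated. Once both tags are in the data model's tag whitelist, they should be available through the model's existing `tag` field without touching the data model definition.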
This is the advice I was looking for, thank you! However, I'm sure that I need the `Malware` data model and not `Intrusion Detection`.