Hello,
I was curious whether there are any best practices for mapping data to CIM data models. More specifically, I'm looking for guidelines on when (not) to map a certain field to a data model.
Of course I can map all fields to the default inherited and calculated fields of the data model. But what about fields that are not present in the data model by default? Should you create a calculated field in the data model for every calculation in your search? Or should you keep the data model as close to the default as possible and leave the calculations in your search?
In other words, I have a search that calculates a large number of extra fields through evals and lookups. I want to speed up and generalize this search by mapping to a CIM data model. Which fields should I leave in the search (after tstats), and which fields should I map to the data model (so that I can retrieve them with tstats)? Should I add calculated fields to the data model for my extra fields, so that I can retrieve all details through a single tstats command? Or should I keep the data model as close to the default as possible and calculate the fields in the search (after the tstats command)?
Thank you for any help you can offer!
Are you sure that `Malware` is the correct data model? `Intrusion Detection` is very similar and may be a better fit. In any case, do EVERYTHING you can to NOT edit or change the data model definition, because doing so will cause you a great deal of grief when upgrading. With this primary directive in mind, you have 2 basic options:
The BAD one: Hijack an existing but unused field, such as one of the `bunit_*` fields, which often go unused.
The GOOD one: Create a custom `tag` value (you will have to update the tag `whitelist` for the data model), because every CIM data model already contains the `tag` field.
Expanding on the latter, you might, for example, want to classify the `dest` value as either `internal` or `external`. You would create a global automatic lookup against a CIDR-based lookup definition that creates a field called `dest_type` with 2 possible values: `internal` or `external`. Then you would create 2 tag definitions: one called `dest_is_internal`, defined as `dest_type="internal"`, and the other called `dest_is_external`, defined as `dest_type="external"`.
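To make the lookup-plus-tag idea concrete, here is a minimal Python sketch of what the automatic lookup and the two tag definitions do to each event at search time. It is not Splunk configuration. The RFC 1918 ranges in the lookup table, the fallback to `external` when no range matches, and the helper names `classify_dest` and `enrich` are all assumptions made for illustration.

```python
import ipaddress

# Hypothetical CIDR lookup table standing in for the CIDR-based lookup
# definition. Assumption: internal space is the RFC 1918 private ranges.
DEST_TYPE_LOOKUP = [
    (ipaddress.ip_network("10.0.0.0/8"), "internal"),
    (ipaddress.ip_network("172.16.0.0/12"), "internal"),
    (ipaddress.ip_network("192.168.0.0/16"), "internal"),
]

# Tag definitions: a field=value pair maps to the tag it produces.
TAG_DEFINITIONS = {
    ("dest_type", "internal"): "dest_is_internal",
    ("dest_type", "external"): "dest_is_external",
}


def classify_dest(dest: str) -> str:
    """Return dest_type: 'internal' if dest falls in a listed range.

    Assumption of this sketch: anything not matched is 'external'.
    """
    addr = ipaddress.ip_address(dest)
    for network, dest_type in DEST_TYPE_LOOKUP:
        if addr in network:  # an IPv6 address never matches an IPv4 range
            return dest_type
    return "external"


def enrich(event: dict) -> dict:
    """Add dest_type (the lookup) and append matching tags (the tag definitions)."""
    enriched = dict(event)
    enriched["dest_type"] = classify_dest(event["dest"])
    tags = list(event.get("tag", []))  # keep any tags the event already has
    for (field, value), tag in TAG_DEFINITIONS.items():
        if enriched.get(field) == value:
            tags.append(tag)
    enriched["tag"] = tags
    return enriched


if __name__ == "__main__":
    print(enrich({"dest": "10.1.2.3"}))
    print(enrich({"dest": "8.8.8.8", "tag": ["malware", "attack"]}))
```

The point of the design is that the new information lands in `tag`, a field every CIM data model already has, so once the new tags are whitelisted a Malware search can filter on them without any change to the data model definition.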
This is the advice I was looking for, thank you! However, I'm sure that I need the `Malware` data model and not `Intrusion Detection`.
CIM is used for data normalisation, to extract information from the raw data. Check this conf talk.
Also see this page in the Splunk docs:
https://docs.splunk.com/Documentation/Splunk/8.0.2/Knowledge/Acceleratedatamodels
The use case you've described above seems specific to a single purpose and would add many unnecessary fields to the CIM data model. I would recommend exploring summary indexing for this use case first, running it as a saved search on a cron schedule.
My use case is not specific to a single purpose. I want to map antivirus data to the Malware data model in such a way that my searches work on data from any antivirus vendor.
I would like to add fields that are valid for every type of data and vendor. However, I'm looking for guidelines on when to add fields to the data model and when to compute them in the search itself.
@thomasvanhelden please look at @woodcock's response below