Solved: CIM Data Model Best Practices - What (not) to include in the data model?

thomasvanhelden · 02-24-2020

Hello,

I was curious whether there are any best practices for mapping to CIM data models. More specifically, I'm looking for guidelines on when (not) to map a certain field to a data model.

Of course, I can map all fields to the data model's default inherited and calculated fields. But what about fields that the data model does not include by default? Should I create a calculated field in the data model for every calculation in my search, or should I keep the data model as close to the default as possible and leave the calculations in the search?

In other words: I have a search that calculates a large number of extra fields through evals and lookups, and I want to speed it up and generalize it by mapping to a CIM data model. Which fields should stay in the search (after tstats), and which should I map to the data model so that I can retrieve them with tstats? Should I add calculated fields to the data model for my extra fields, so that a single tstats command retrieves all the details? Or should I keep the data model as close to the default as possible and calculate those fields in the search, after the tstats command?

Thank you for any help you can offer!

woodcock · 03-09-2020

Are you sure that Malware is the correct data model? Intrusion Detection is very similar and may be a better fit. In any case, do EVERYTHING that you can to NOT edit/change the data model definition, because this will cause you a great deal of grief when upgrading. With this primary directive in mind, you have 2 basic options:

The BAD one: Hijack an existing field that often goes unused, such as one of the `bunit_*` fields.
The GOOD one: Create a custom `tag` value (you will have to update the `whitelist` for the `datamodel`), because all `CIM data models` already contain this field.

Expanding on the latter: you might, for example, want to classify the `dest` value as either internal or external. You would create a global automatic lookup against a CIDR-based lookup definition that creates a field called `dest_type` with 2 possible values: `internal` or `external`. Then you would create 2 tags: one called `dest_is_internal`, applied to `dest_type="internal"`, and one called `dest_is_external`, applied to `dest_type="external"`.
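In Splunk itself this is done with a CIDR-matching lookup definition plus tags, not with code. The Python below is only a hypothetical sketch of what the lookup and the two tags compute for each event. The RFC 1918 ranges stand in for your own CIDR lookup file, and `INTERNAL_CIDRS`, `TAG_RULES`, `classify_dest`, and `enrich` are illustrative names, not Splunk APIs.

```python
import ipaddress

# Hypothetical stand-in for the CIDR-based lookup file.
# RFC 1918 private ranges are used purely as an example; use your own networks.
INTERNAL_CIDRS = [
    ipaddress.ip_network("10.0.0.0/8"),
    ipaddress.ip_network("172.16.0.0/12"),
    ipaddress.ip_network("192.168.0.0/16"),
]

# Hypothetical mirror of the two tags: tag name -> (field, value) it is applied to.
TAG_RULES = {
    "dest_is_internal": ("dest_type", "internal"),
    "dest_is_external": ("dest_type", "external"),
}


def classify_dest(dest: str) -> str:
    """Return 'internal' if dest is inside any INTERNAL_CIDRS range, else 'external'.

    Raises ValueError for values that are not IP addresses (e.g. hostnames).
    """
    addr = ipaddress.ip_address(dest)
    # An IPv6 address is simply never "in" an IPv4 network, so it falls to 'external'.
    return "internal" if any(addr in net for net in INTERNAL_CIDRS) else "external"


def enrich(event: dict) -> dict:
    """Add dest_type (the 'automatic lookup') and tag (the 'tags') to a copy of event."""
    enriched = dict(event)
    enriched["dest_type"] = classify_dest(event["dest"])
    # Like Splunk's multivalue tag field: every tag whose field/value pair matches.
    enriched["tag"] = [
        tag for tag, (field, value) in TAG_RULES.items()
        if enriched.get(field) == value
    ]
    return enriched


if __name__ == "__main__":
    for event in ({"dest": "10.1.2.3"}, {"dest": "8.8.8.8"}):
        print(enrich(event))
```

The point of the design is the one made above: `dest_type` and the tags come from knowledge objects outside the data model, so the model definition stays untouched while the `tag` field that every CIM data model already carries picks up the new values (once the whitelist is updated).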

thomasvanhelden · 03-09-2020

This is the advice I was looking for, thank you! However, I'm sure that I need the Malware data model and not Intrusion Detection.

anmolpatel · 02-26-2020

CIM is used for data normalisation, so that the information extracted from raw data ends up in consistent, common fields. Check out this .conf talk:

https://conf.splunk.com/files/2017/slides/the-power-of-data-normalization-a-look-at-cim-under-the-ho...

Also see this page from the Splunk docs:
https://docs.splunk.com/Documentation/Splunk/8.0.2/Knowledge/Acceleratedatamodels

The use case you've described seems specific to a single search, and mapping it would add a lot of unnecessary fields to the CIM data model. I would recommend exploring summary indexing first for this use case, running the search as a scheduled saved search on a cron schedule.

thomasvanhelden · 03-09-2020

My use case is not a one-off. I want to map antivirus data to the Malware data model in such a way that my searches work on data from any antivirus vendor.
I would like to add fields that are valid for every type of data and every vendor. What I'm looking for are guidelines on when to add fields to the data model and when to compute them in the search itself.

anmolpatel · 03-09-2020

@thomasvanhelden, please look at @woodcock's response.
