Solved: Data Model - Explanation

AliMaher · ‎07-04-2024

Hi,

I hope all is well.

I have been struggling with the data model concept: I would like to know why and when we use a data model, and how it improves performance.

I understand that it is structured data and has three types of datasets, and I am able to create one by following the how-to.

But why use it? When should I use it? What is the main idea behind it?

gcusello · ‎07-04-2024

Hi @AliMaher ,

On the internet and on the Splunk YouTube channel, you can find many videos and documents that describe what Data Models are and how and why to use them, such as the following:

https://www.youtube.com/watch?v=WBzKUYAfGsk

https://www.youtube.com/watch?v=n0HPe175k24

https://docs.splunk.com/Documentation/Splunk/9.2.1/Knowledge/Aboutdatamodels

and so on ...

Anyway, to summarize: a Data Model is a database containing structured and normalized data (normalization is the key word of the concept!) that you can use, only for structured searches (there's no sense in putting _raw in a DM!), to get faster searches.

This means that you should always choose CIM-compliant add-ons, and if you have custom add-ons, you have to normalize their data.

Then you can run your searches and get much faster results.

You can get even faster results using DM acceleration.

When to use DMs?

You should use DMs whenever you have to search structured data, in other words when you have to run a "field=value" search on normalized data.

You cannot use DMs if you have to run a free-text search (as is usual in Splunk) or a search on non-normalized data; in the second case, my advice is to normalize your data and use DMs.

DMs give you another advantage: you can search very heterogeneous data through the same DM. For example, if you're searching for failed logins, you would otherwise have to search many indexes with different constraints (e.g. 4625 in Windows); instead, you can run a single search on the appropriate DM and get the failed logins from all the data you have.
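The idea of searching heterogeneous sources through one normalized model can be sketched in a few lines of Python. This is only a toy illustration, not Splunk code: the event shapes, the field names `action` and `user`, and all helper functions are assumptions made up for the example.

```python
# Toy illustration only: these event shapes and helper names are hypothetical,
# not Splunk internals or real add-on code.

WINDOWS_EVENTS = [
    {"sourcetype": "wineventlog", "EventCode": 4625, "TargetUserName": "alice"},
    {"sourcetype": "wineventlog", "EventCode": 4624, "TargetUserName": "bob"},
]
LINUX_EVENTS = [
    {"sourcetype": "linux_secure", "message": "Failed password for carol from 10.0.0.5"},
    {"sourcetype": "linux_secure", "message": "Accepted password for dave from 10.0.0.6"},
]


def normalize_windows(event):
    """Map a Windows-style event to the common fields 'action' and 'user'."""
    action = "failure" if event["EventCode"] == 4625 else "success"
    return {"action": action, "user": event["TargetUserName"]}


def normalize_linux(event):
    """Map an sshd-style message to the same common fields."""
    words = event["message"].split()
    action = "failure" if words[0] == "Failed" else "success"
    return {"action": action, "user": words[3]}


def authentication_model():
    """All events seen through one normalized 'model', whatever the source."""
    events = [normalize_windows(e) for e in WINDOWS_EVENTS]
    events += [normalize_linux(e) for e in LINUX_EVENTS]
    return events


def search(events, **constraints):
    """A 'field=value' search over normalized events."""
    return [e for e in events if all(e.get(k) == v for k, v in constraints.items())]


failed_users = [e["user"] for e in search(authentication_model(), action="failure")]
print(failed_users)  # ['alice', 'carol']
```

The single `action="failure"` constraint finds failed logins from both sources without the searcher knowing about event code 4625 or the sshd message text. In Splunk itself, that mapping comes from the add-ons' field extractions, aliases and tags rather than from code like this.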

One last thing: identify the DMs that you need and accelerate them, but only the ones you actually use, to avoid wasting resources.

Ciao.

Giuseppe

PickleRick · ‎07-06-2024

One more thing because that's often overlooked when talking about DMs.

DMs as such don't accelerate anything. DMs are just an intermediate layer of logic that lets Splunk search different types of data in the same way. When you search from a DM using constraints on DM fields, Splunk "underneath" transforms your search into a raw data search and lets you search possibly multiple separate indexes and sourcetypes without even knowing the real structure of the underlying data.

DM _acceleration_ however is a completely different beast. It's the machinery that's running under Splunk's hood and prepares this database of indexed datamodel contents so that you can search using those pre-built summaries instead of digging through the raw data itself.

So while DMA requires properly ingested and configured data normalized for DMs, it's this "one step beyond" that gives you performance benefits. If you just have DMs which are not accelerated, you might be able to search your data more easily (and create pivots), but they will not give you any performance gains. It's the DM acceleration that makes Splunk go zzzzooooooom.
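To make the distinction concrete, here is a toy Python sketch of "model only" versus "model plus pre-built summary". It is an assumption-laden illustration, not a description of how Splunk stores or builds its summaries: the raw events, the field names and every function name are hypothetical.

```python
# Toy illustration only: hypothetical names, not how Splunk stores its summaries.
from collections import Counter

RAW_EVENTS = [
    "Failed password for carol from 10.0.0.5",
    "Accepted password for dave from 10.0.0.6",
    "Failed password for carol from 10.0.0.7",
    "Failed password for erin from 10.0.0.8",
]


def to_model_fields(raw):
    """The 'data model' layer: turn a raw event into normalized fields."""
    words = raw.split()
    return {"action": "failure" if words[0] == "Failed" else "success", "user": words[3]}


def count_failures_unaccelerated(user):
    """Without acceleration: every search re-reads and re-parses the raw events."""
    return sum(
        1
        for raw in RAW_EVENTS
        if to_model_fields(raw) == {"action": "failure", "user": user}
    )


def build_summary():
    """'Acceleration': parse the raw events once, ahead of time, and keep only counts."""
    return Counter(
        (fields["action"], fields["user"]) for fields in map(to_model_fields, RAW_EVENTS)
    )


SUMMARY = build_summary()  # in this sketch, built once before any search runs


def count_failures_accelerated(user):
    """With acceleration: answer from the pre-built summary, no raw data touched."""
    return SUMMARY[("failure", user)]


print(count_failures_unaccelerated("carol"), count_failures_accelerated("carol"))  # 2 2
```

Both functions return the same answer; the difference is that the first one does the parsing work on every search, while the second pays that cost once when the summary is built and then only does a lookup.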

AliMaher · ‎07-05-2024

Thanks for the great explanation, really appreciated!

isoutamo · ‎07-06-2024

Here is a link to the CIM (Splunk Common Information Model): https://docs.splunk.com/Documentation/CIM/latest/User/Overview. By following it, you can create a dashboard, report, etc. only once and then just add new data sources, which will show up there.

gcusello · ‎07-06-2024

Hi @AliMaher ,

Good for you, see you next time!

Ciao and happy splunking

Giuseppe

P.S.: Karma Points are also appreciated by the other contributors 😉
