Hi,
I hope all is well.
I have been struggling with the Data Model concept: why and when do we use a data model, and how does it improve performance?
I understand that it holds structured data and has three types of datasets, and I am able to create one by following the how-to.
But why use it? When should I use it? What is the main idea behind it?
Hi @AliMaher ,
on the internet and on the Splunk YouTube channel you can find many videos and documents that describe what Data Models are and how and why to use them, such as the following:
https://www.youtube.com/watch?v=WBzKUYAfGsk
https://www.youtube.com/watch?v=n0HPe175k24
https://docs.splunk.com/Documentation/Splunk/9.2.1/Knowledge/Aboutdatamodels
and so on ...
Anyway, to summarize: you can think of a Data Model as a database containing structured and normalized data (normalized is the key word of the concept!) that you use only for structured searches (there is no point in putting _raw in a DM!) in order to get faster searches.
This means that you should always choose CIM-compliant add-ons, and if you have custom add-ons, you have to normalize their data.
Then you can run your searches and get much faster results.
You can get even faster results by using DM Acceleration.
When to use DMs?
You should use DMs every time you have to search structured data, in other words when you have to run a "field=value" search on normalized data.
You cannot use DMs if you have to run a free-text search (as is usual in Splunk) or a search on data that is not normalized; in this second case, my advice is to normalize your data and then use DMs.
DMs give you another advantage: you can search very heterogeneous data through the same DM. For example, if you are looking for failed logins, without a DM you would have to search many indexes, each with its own constraints (e.g. EventCode 4625 in Windows); with a DM you run a single search against the relevant DM and get the failed logins from all the data you have.
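To make this concrete, here is a rough sketch. The index and sourcetype names are only assumptions for illustration (yours will differ), and the second search assumes that the CIM add-on is installed and that your add-ons map the data to the Authentication DM. Without a DM, you have to know how each source records a failed login:

```
(index=wineventlog EventCode=4625) OR (index=os sourcetype=linux_secure "Failed password")
| stats count by host
```

With the DM, one search on the normalized fields covers every source that feeds it:

```
| tstats count from datamodel=Authentication where Authentication.action="failure" by Authentication.src, Authentication.user
```

When a new CIM-compliant source is added, the second search does not need to change.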
One last point: identify the DMs that you need and accelerate them, but only the ones you really use, to avoid consuming resources for nothing.
Ciao.
Giuseppe
One more thing, because it's often overlooked when talking about DMs.
DMs as such don't accelerate anything. DMs are just an intermediate layer of logic that lets Splunk search different types of data in the same way. When you search a DM using DM field constraints, Splunk "underneath" transforms your search into a raw data search, so you can search possibly multiple separate indexes and sourcetypes without even knowing the real structure of the underlying data.
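As a rough sketch of what that looks like, the search below asks for failed authentications through the CIM Authentication data model without naming any index or sourcetype. It assumes the CIM add-on is installed and that your data is mapped to its Failed_Authentication dataset; adjust the names to whatever data model you actually use.

```
| datamodel Authentication Failed_Authentication search
| head 10
```

This works whether or not the data model is accelerated; without acceleration it is still an ordinary search over the raw events, just expressed through the DM's constraints and fields.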
DM _acceleration_ however is a completely different beast. It's the machinery that's running under Splunk's hood and prepares this database of indexed datamodel contents so that you can search using those pre-built summaries instead of digging through the raw data itself.
So while DMA requires properly ingested and configured data, normalized for the DMs, it's this "one step beyond" that gives you the performance benefits. If you just have DMs which are not accelerated, you might be able to search your data more easily (and create pivots), but you will not get any performance gain. It's the DM acceleration that makes Splunk go zzzzooooooom.
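A minimal sketch of a search that actually benefits from acceleration, again assuming the CIM Authentication data model and that acceleration is enabled for it:

```
| tstats summariesonly=true count from datamodel=Authentication where Authentication.action="failure" by _time span=1h
```

With summariesonly=true, tstats reads only the pre-built acceleration summaries, so a data model that is not accelerated (or a time range that has not been summarized yet) returns nothing. With the default summariesonly=false, it also covers the data that has not been summarized, at the cost of speed.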
Thanks for the great explanation, really appreciated!
Here is a link to the CIM (Splunk Common Information Model) documentation: https://docs.splunk.com/Documentation/CIM/latest/User/Overview. By following it you can create a dashboard, report, etc. only once; when you later add new data sources that are normalized to the CIM, they will simply show up there.
Hi @AliMaher ,
good for you, see you next time!
Ciao and happy splunking
Giuseppe
P.S.: Karma Points are appreciated by the other contributors too 😉