Getting Data In

Splunk Enterprise differences and comparison

splunky_diamond
Path Finder

Hello splunkers!

I have a simple question regarding Splunk data models and regular searches, I have found some answers, but I would like to dig deeper. 

What's the advantage of using the data models? Like, why would we want to use the data models instead of regular searches where we just label the indexes in which we want to search for the data? 

I know so far that the data models allow searching through multiple sources (network devices and workstations) by having the standardized fields. I also know about the data accelaration, that we can use tstats in our searches on accelerated data models in order to speed up the searches.
Is there a particular scenario where we must use data models and not using them will not work?

(I am using Enterprise Security as well, so if there is any scenario that involves this app, it is most welcome)
I would really appreciate a well-detailed answer.

Thank you for taking time reading and replying to my post ❤️

Labels (2)
Tags (2)
0 Karma
1 Solution

deepakc
Builder

It’s mainly around performance, time to value, and using all the ES feature, you could be a large enterprise ingesting loads of data sources, and a big SOC operation, and you might want to run many many different correlations rules, this would not be practical on raw data, and it would take a long time to develop new rules when so many come out of the box. So, this is where DM's come into play, faster and better all round.  

 

For you it sounds like you have just a few use cases and can run your own rules on raw data, and if your happy with that, then that’s fine. But you’re not then exploiting what ES has to offer and all the use cases build around data models. 

View solution in original post

deepakc
Builder

Normal searches run on the Raw Data and Datamodels are a populated dataset based of the Raw data target fields, hence the Data models are faster.  

Data models normalize and standardise data across multiple indexes, allowing users to analyse data consistently regardless of the source. They include accelerations and summaries, such as data summaries, these accelerations speed up searches and make analysis faster and more efficient, especially for large datasets.

Overall, using data models in Splunk enhances data analysis capabilities, improves performance, and simplifies the process of exploring and understanding data.

ES is based on Datamodels, so you index you security data sources like Firewall/IDS,Network/Auth in the normal way, you then ensure you install the CIM (Common Information Model) Compliant TA's for those data sources, and after you tune the Datamodels for searching your target indexes and it will search and populate the Datamodels based on the types of data sources.

Once all in and configured ES lights up, and you can deploy various Correlation Rules which mostly run on the datamodels. (Simple explanation) 

Example:

You want to ensure your Network Firewall Traffic Data Model is available for data for ES, you then ingest Cisco ASA data into your normal index, you then ensure you install the Cisco ASA TA from Splunkbase, you then tune CIM for this data so it searches it and populates Network_Traffic data model on a regular basis and from there you can run various Rules or create your own, using tstats etc.    

See the below for the various data models and various normalised fields

https://docs.splunk.com/Documentation/CIM/5.0.2/User/Howtousethesereferencetables

PickleRick
SplunkTrust
SplunkTrust

Ok, let me add my three cents here.

There are two separate things that get confused often - datamodels themselves and datamodel acceleration.

Datamodel on its own is just a layer of abstraction - it defines field and constraints so that you can (if your data is properly onboarded!) search without getting into gory technical details of each specific source and sourcetype. So instead of digging through all your separate firewall types across your enterprise, you can just search from the Network Traffic datamodel and look for particular src_ip.

That makes your life easier and makes maintenance of your searches easier. It also makes it possible to build generalized searches and use cases for ES (of course DM usage is not limited to ES but it's most obvious there) which contain the required logic but are independent of the technical details.

It makes an admin's life harder though because you need to make sure your data is properly normalized to applicable datamodels during onboarding. But this is a job you do once and use it forever after.

Performance-wise however DM on its own is completely neutral - if you're not using acceleration the search from datamodel is silently translated into a "normal" search (you can see that in job log).

Datamodel acceleration is another thing though. Since datamodels define a predefined set of fields the data can be indexed similarily to indexed fields. So Splunk spawns a search using a schedule defined for acceleration summary building. Then you can use such summary with tstats. And that gives a huge performance boost like any other use of tstats (as long as you're using summariesonly=t and the search doesn't "leak out" onto raw data).

The downside of course is that you have those summary-building searches running every 5, 10 or 15 minutes (and they can eat up quite a lot of resources) but the upside is that searches using those accelerated summaries are lightning-fast compared to normal event searches.

splunky_diamond
Path Finder

Hello @deepakc , thanks for your post.

As I mentioned in my post, I knew about the data acceleration and ability to run the searches across multiple sources. Undoubtedly, these are the main advantages of using data models. However, regarding the usage of data models in Splunk ES, I have a custom correlation search that is running without the usage of data models, and it works perfectly fine, which leaves the question about the need of usage of data models in correlation searches in ES still open.


 

0 Karma

deepakc
Builder

It’s mainly around performance, time to value, and using all the ES feature, you could be a large enterprise ingesting loads of data sources, and a big SOC operation, and you might want to run many many different correlations rules, this would not be practical on raw data, and it would take a long time to develop new rules when so many come out of the box. So, this is where DM's come into play, faster and better all round.  

 

For you it sounds like you have just a few use cases and can run your own rules on raw data, and if your happy with that, then that’s fine. But you’re not then exploiting what ES has to offer and all the use cases build around data models. 

splunky_diamond
Path Finder

Thank you very much! That pretty much explains everything!

 

0 Karma

deepakc
Builder

@splunky_diamond  your welcome 

Here's 's some more security tips to help you discovery some more.

1. Many Security people use this app to help them with there Security Use cases, I use it myself - so many good features, it can also make use case recommendations
based on on your data sources.
https://splunkbase.splunk.com/app/3435 


2. ESCU -
Provides regular Security Content updates to help security SOC / analysts to address ongoing time-sensitive threats, attack methods, and other security issues.
https://splunkbase.splunk.com/app/3449 


3. Here you will find so many use cases for reference - great place to baseline your security monitoring strategy.
https://research.splunk.com/ 

Get Updates on the Splunk Community!

Now Available: Cisco Talos Threat Intelligence Integrations for Splunk Security Cloud ...

At .conf24, we shared that we were in the process of integrating Cisco Talos threat intelligence into Splunk ...

Preparing your Splunk Environment for OpenSSL3

The Splunk platform will transition to OpenSSL version 3 in a future release. Actions are required to prepare ...

Easily Improve Agent Saturation with the Splunk Add-on for OpenTelemetry Collector

Agent Saturation What and Whys In application performance monitoring, saturation is defined as the total load ...