Knowledge Management

Data Model vs. Datasets - when to use?

simpkins1958
Contributor

I'm trying to understand the difference between data models and datasets, and when to use one versus the other.

1 Solution

mattness
Splunk Employee

Hi Simpkins -

The answer to your question depends on the data you are working with and what you are trying to do. Data models are in fact collections of hierarchically arranged datasets. You might want to create a data model if you are working with a large dataset that can be divided into many very specific subsets. A data model lets you see the overall dataset hierarchy and then work with specific elements (datasets) within that hierarchy. You can run searches on specific data model datasets. You can also use the Pivot tool to build visualizations based on specific data model datasets.
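To make "run searches on specific data model datasets" a bit more concrete, here is a minimal sketch using the from command. The data model name Web_Traffic and its child dataset Failed_Requests are hypothetical; substitute names from your own environment.

```spl
| from datamodel:Web_Traffic.Failed_Requests
| stats count
```

The datamodel command should offer a similar route (something like `| datamodel Web_Traffic Failed_Requests search`), though field naming in the results can differ between the two, so check the output before building on it.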

Also, when you accelerate a data model, you can potentially accelerate all of the datasets within that data model (see the Knowledge Manager Manual for information about data model acceleration restrictions). This means that searches and dashboards that use datasets in that data model can return results faster than they would without acceleration.
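One common way to take advantage of acceleration is the tstats command, which can read from the data model's acceleration summaries. This is only a sketch: Web_Traffic is a hypothetical accelerated data model and Requests is assumed to be one of its root datasets.

```spl
| tstats summariesonly=true count from datamodel=Web_Traffic.Requests by _time span=1h
```

With summariesonly=true the search returns only data that has already been summarized, so it is a quick way to check whether acceleration is actually covering the time range you care about.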

To create a data model, you need to have a pretty solid understanding of your data and a clear idea of how you'd like to subdivide it into smaller datasets. The Data Model Editor is not really designed for data exploration. It also requires a decent understanding of the Search Processing Language (SPL).

On the other hand, you can also create table datasets with the Datasets Add-on. You might do this if you just want to work with a simple dataset that represents the results of a simple search. You can use the Table Editor to refine and focus the boundaries of that dataset without interacting with SPL. You can also use it to better understand the contents of a particular dataset. This might be a better solution if any of the following are true:

  • You are working with a relatively small dataset
  • You do not know your data very well
  • You do not know SPL well
  • You do not want to spend time designing complicated searches of a dataset
  • You do not need to design a collection of hierarchically-related datasets

You can accelerate table datasets in much the same way that you accelerate data models. You can also use Pivot to design visualizations based on specific table datasets.

You can also use the from command to create table datasets that "extend" other datasets. This creates a hierarchical relationship: a change to a dataset can also affect any datasets that extend it. Currently, though, the Splunk platform's ability to show you table dataset dependencies is pretty limited.
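As a rough sketch of what "extending" looks like in practice, the child table's base search starts from the parent dataset and then narrows it. The table name web_errors and the numeric status field are both hypothetical, and this assumes table datasets are addressed through the datamodel dataset type of the from command.

```spl
| from datamodel:web_errors
| where status >= 500
```

Because the child search reads from the parent by name, any change to the parent's fields or filters flows through to the child, which is the dependency described above.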

I think this sums it up. Hopefully this helped you more than it confused you. Let me know if you have more questions.

ddrillic
Ultra Champion

The documentation topic "About data models and data model datasets" says:

"A data model is a type of knowledge object that applies an information structure to raw data, making it easier to use. Each data model represents a category of event data. Data models are composed of data model datasets. More specifically, a data model is a hierarchical search-time mapping of knowledge about one or more datasets. A data model encodes the domain knowledge necessary to build a variety of specialized searches of those datasets. Briefly put, data models generate searches. These specialized searches are in turn used to generate reports for Pivot users."
