Dataset Constraint

RdomSplunkUser7
Explorer

The documentation <https://help.splunk.com/en/splunk-enterprise/manage-knowledge-objects/knowledge-management-manual/9....> states:

Dataset constraints determine the first part of the search through:

  • Simple search filters (Root event datasets and all child datasets).
  • Complex search strings (Root search datasets).
  • Transaction definitions (Root transaction datasets).

In my new data model, I am trying to define a dataset constraint that selects only events with a unique eventId field.

eventId is a number, e.g. 123456.

My goal is to drop duplicate log lines. Is it possible to define this kind of dataset constraint?


PrewinThomas
Motivator

@RdomSplunkUser7 

You could try a root search dataset.

When you create your data model, choose a "Root Search" dataset instead of starting with a "Root Event" dataset.
In the "Search String" field for this root search dataset, enter your base search followed by the dedup command.
For example:
index=test_logs sourcetype="test_logs_st" [your base filters] | dedup eventId

This may let you build the data model only from events with a unique eventId.
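
Before relying on the dedup in the root search, it can help to confirm how many duplicates the base search actually returns. A minimal sketch, reusing the index and sourcetype from the example above (total_events and unique_event_ids are just illustrative output names):

```
index=test_logs sourcetype="test_logs_st"
| stats count AS total_events dc(eventId) AS unique_event_ids
```

If total_events is larger than unique_event_ids, the base search contains duplicate eventId values (or events with no eventId at all). Note that dedup, by default (keepempty=false), also drops events that do not have the eventId field.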

Regards,
Prewin

marnall
Motivator

I don't think it is possible to constrain a dataset to "only take in one event for each value of eventId and then exclude the rest of the events with the same eventId value." This would require the dataset to check every new event it takes in against a list of eventId values that are already included.

It would be better to do this another way. Ideally you could change the logging itself so that it produces only one event per eventId. Failing that, there are other tricks you could try, such as a scheduled search that writes one summary-indexed event per eventId while excluding every eventId that already exists in the destination index. You could then point the data model and its dataset at the index of summary-indexed events.
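
A rough sketch of that summary-index search, under some assumptions: summary_unique_events is a hypothetical index name that must already exist, and the source index and sourcetype are borrowed from the example earlier in the thread. The subsearch lists the eventId values already stored in the destination index, the outer search excludes them, and collect writes what is left:

```
index=test_logs sourcetype="test_logs_st"
    NOT [ search index=summary_unique_events | dedup eventId | fields eventId ]
| dedup eventId
| collect index=summary_unique_events
```

Treat this as a starting point rather than a finished solution. Subsearches are subject to result-count and runtime limits, so with a large number of distinct eventId values the exclusion list may be truncated. It is also worth testing the first run against an empty destination index, and checking that eventId is still extracted from the events that collect writes. The data model's root event dataset could then use index=summary_unique_events as its constraint.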
