Splunk Enterprise

Dataset best practices

RdomSplunkUser7
Explorer

Hello!

I maintain Splunk reports. Some of the Pivot reports are based on a dataset that is generated from a simple search. Duplicate values were not taken into account when it was generated.

Due to an error, there were two data sources for a few weeks. This resulted in identical duplicate rows in the dataset.

In the future, duplicate rows can be removed from the dataset with a simple dedup. However, are there any best practices to fix this?

0 Karma
1 Solution

isoutamo
SplunkTrust
SplunkTrust
Of course you can add dedup to those queries, but it can hurt performance!

It also depends on how the duplication happened and how you can identify those events: can you dedup on _raw, or on a set of different fields, or does it also need some calculations/modifications (e.g. to times)?
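For example, a minimal sketch (index and sourcetype names are placeholders): when the copies are byte-identical you can dedup on the full raw event, and when they differ slightly you can dedup on a chosen set of fields instead.

```
index=your_index sourcetype=your_sourcetype
| dedup _raw
```

or, deduplicating on a field set rather than the whole event:

```
index=your_index sourcetype=your_sourcetype
| dedup host, source, event_id
```

Compare the result count against the original search before relying on either variant.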

View solution in original post

RdomSplunkUser7
Explorer

Maybe I can make a new dataset which is the original dataset minus the duplicate identical log lines. Are there any tutorials for this? I am a newbie super user, responsible for just a few reports. I hate this role.

0 Karma

isoutamo
SplunkTrust
SplunkTrust
How big and complex is your dataset, and how much does its content change? And how long a time span does it cover?

RdomSplunkUser7
Explorer

I don't like that you cannot add dedup with pipes in the simple base search of a dataset.

Splunk should offer a ready-made method to deduplicate an index.

0 Karma

RdomSplunkUser7
Explorer

My situation is quite simple, but I lack the basic training. ChatGPT showed me the way. Maybe this can be an answer for another Splunk report newbie super user.

"Ah, so you want to deduplicate data and use it in a Pivot table – great clarification! Pivot in Splunk is based on Data Models, and indeed, there are limitations on SPL commands (like | dedup _raw) in that context.

🔍 The Problem:

Pivot uses a Data Model, and in the base search of a Data Model, you cannot use pipe (|) commands like dedup.

🎯 Your Goal:

Remove duplicates based on _raw and still use the data in a Pivot table.

Solution Options for Use in Pivot:

🔁 1. Create a Saved Search with dedup, then build a Data Model on top of it

This is the recommended method:

Step 1: Create a Saved Search

  1. Go to Splunk's Search view.
  2. Enter the SPL:

index=your_index sourcetype=your_sourcetype
| dedup _raw

  3. Click Save As → Report.
  4. Name it, for example: Deduped Raw Events.

Step 2: Create a new Data Model based on that Saved Search

  1. Go to Settings → Data Models → New Data Model.
  2. Give it a name and save it.
  3. Add a new Object, and choose:
    • Object Type: Event
    • Constraint: Use the saved search:

savedsearch="Deduped Raw Events"

    • NOTE: savedsearch="your_report_name" references the saved search.

Step 3: Use Pivot on top of this Data Model

  1. Go to Pivot → Select your new Data Model → Deduped Raw Events.
  2. Build your table as desired.

⚠️ Notes:

  • This only works if the saved search is public (shared) or you have permission to use it.
  • The Saved Search must return fields that you can use in Pivot (like _time, host, source, custom fields, etc.).

🧪 Option 2: Simulate Dedup within the Data Model (if possible)

Data Models do not allow | dedup, but you can:

  • Add an auto-extracted field, which lets you group by that field in Pivot.
  • Or, if you have a unique identifier (e.g., event_id), you can use first-value or latest-value aggregations in Pivot to simulate deduplication.

📌 Summary:

  Method                       | Dedup Allowed?    | Usable in Pivot?
  Saved Search + dedup         | Yes               | Yes
  Native Data Model search     | No                | Yes
  SPL with pipes in Pivot UI   | No (not allowed)  | Yes, but very limited

If you'd like, I can also help you write the full search or configure it for a specific type of data or log source – just let me know what you're using it for in Pivot!"

 

The data is very simple event-log type data, and the amount of data is small. There is a unique field in the log lines (an event id). The question was about how to tweak the existing dataset. Splunk is not good for these types of business reports, which should be moved to another reporting platform (e.g. MSBI).
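Since the log lines carry a unique event id, a hedged sketch of a deduplicated dataset search (the index, sourcetype, and rex pattern below are placeholders) could be:

```
index=your_index sourcetype=your_sourcetype
| rex field=_raw "event_id=(?<event_id>\d+)"
| dedup event_id
```

The existing rex extractions can stay as they are; only the final dedup is new.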

0 Karma

isoutamo
SplunkTrust
SplunkTrust

Unfortunately, at least I don't know of any generic answer for this.

The method presented here is one option, but as said, you need to be 100% sure that it works with your data, and test it several times to be certain!

And of course you must first get rid of those new duplicates and ensure that all your inputs work as they should, without duplicating new events.

After that you could probably do that delete, if you are absolutely sure that it also works in your case. And I propose that you use a temporary account with the can_delete role, just for the time needed to do the cleanup.
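A hedged sketch of such a one-time cleanup (index, source, and time range are placeholders; it requires the can_delete role, and delete only masks events from search results, it does not free disk space): since the duplicates came from a second data source over a known period, target just that source and time range.

```
index=your_index source="redundant_source" earliest="01/01/2025:00:00:00" latest="02/01/2025:00:00:00"
| delete
```

Run the search without | delete first and verify that it returns exactly the events you expect to remove.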

livehybrid
SplunkTrust
SplunkTrust

Hi @RdomSplunkUser7 

I think ultimately this depends on what your searches are doing. If there is a risk of pulling in duplicate data, then dedup is a good option, or you could look at using something like stats latest(fieldName) as latestFieldName.

It really depends on your search(es). If you'd like to share the SPL we might be able to help further.

RdomSplunkUser7
Explorer

It is based on a very simple search:

index=<index_name> sourcetype=<blaahaa> field2

After this, a number of fields are extracted using rex.

I would like to include in the search, as a new constraint, a very simple dedup clause: "| dedup _raw". Is this advisable?

0 Karma

isoutamo
SplunkTrust
SplunkTrust
IMHO: I don't like, nor suggest, giving delete permissions to anyone permanently! It isn't a great idea to run scheduled jobs which remove events from Splunk.
0 Karma