Hello!
I maintain Splunk reports. Some of the Pivot reports are based on a dataset generated from a simple search. Duplicate values were not taken into account when the dataset was created.
Due to an error, there were two data sources for a few weeks. This resulted in identical duplicate rows in the dataset.
In the future, duplicate rows can be removed from the dataset with a simple dedup. However, are there any best practices to fix this?
Maybe I can create a new dataset that is the original dataset minus the duplicate identical log lines. Are there any tutorials for this? I am a newbie super user, responsible for just a few reports, and I dislike the role.
I don't like that you cannot add dedup (or other piped commands) to the simple base search of a dataset.
Splunk should offer a built-in method to deduplicate an index.
My situation is quite simple; I just lack the basic training. ChatGPT showed me a way forward. Maybe this can be an answer for another Splunk report newbie super user.
"Ah, so you want to deduplicate data and use it in a Pivot table – great clarification! Pivot in Splunk is based on Data Models, and indeed, there are limitations on SPL commands (like | dedup _raw) in that context.
🔍 The Problem:
Pivot uses a Data Model, and in the base search of a Data Model, you cannot use pipe (|) commands like dedup.
🎯 Your Goal:
Remove duplicates based on _raw and still use the data in a Pivot table.
✅ Solution Options for Use in Pivot:
🔁
1. Create a Saved Search with dedup, then build a Data Model on top of it
This is the recommended method:
Step 1: Create a Saved Search

```
index=your_index sourcetype=your_sourcetype
| dedup _raw
```

Step 2: Create a new Data Model based on that Saved Search

```
| savedsearch "Deduped Raw Events"
```
Step 3: Use Pivot on top of this Data Model
⚠️ Notes:
🧪 Option 2: Simulate Dedup within the Data Model (if possible)
Data Models do not allow | dedup, but you can:
📌 Summary:
| Method | Dedup Allowed? | Usable in Pivot? |
|---|---|---|
| Saved Search + dedup | ✅ | ✅ |
| Native Data Model search | ❌ | ✅ |
| SPL with pipes in Pivot UI | ❌ (not allowed) | ✅ but very limited |
If you’d like, I can also help you write the full search or configure it for a specific type of data or log source – just let me know what you’re using it for in Pivot!”
The data is very simple event-log-type data, and the amount of data is small. The log lines contain a unique field (an event ID). The question was about how to fix the existing dataset. Splunk is not well suited for these kinds of business reports, which should be moved to another reporting platform (e.g., MSBI).
I have an identical situation to the one described here: <https://community.splunk.com/t5/Reporting/How-to-not-include-the-duplicated-events-while-acceleratin...>
Unfortunately, at least I don't know of any generic answer for this.
The method presented there is one option, but as said, you need to be 100% sure that it works with your data, and test it several times to be sure!
And of course you must first get rid of the source of those new duplicates and ensure that all your inputs work as they should, without duplicating new events.
After that you can probably run the delete, if you are absolutely sure that it also works in your case. I propose that you use a temporary account with the can_delete role, only for the time needed to do the cleanup.
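As a rough sketch of that cleanup (all names here are illustrative: I am assuming the unique field is called `event_id`; test this on a narrow time range or in a sandbox index first):

```
index=<index_name> sourcetype=<your_sourcetype>
| streamstats count AS copy_num BY event_id
| search copy_num > 1
| delete
```

Note that `delete` only makes events unsearchable; it does not reclaim disk space, and it requires the can_delete role mentioned above.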
I think ultimately this depends on what your searches are doing. If there is a risk of pulling in duplicate data, then dedup is a good option, or you could look at using something like stats latest(fieldName) as latestFieldName.
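As an illustration of the stats approach (the field names are hypothetical), this keeps one row per event ID instead of relying on dedup:

```
index=<index_name> sourcetype=<your_sourcetype>
| stats latest(status) AS latestStatus latest(_time) AS latestTime BY event_id
```

Because stats aggregates by the unique key, identical duplicate rows collapse into one, and the result behaves deterministically regardless of event order.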
It really depends on your search(es). If you'd like to share the SPL we might be able to help further.
It is based on a very simple search:

```
index=<index_name> sourcetype=<blaahaa> field2
```

After this, a number of fields are extracted using rex.
I would like to add a very simple dedup clause, `| dedup _raw`, to the search as a new constraint.
Is this advisable?
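Since the log lines already carry a unique event ID, one option is to dedup on that extracted field rather than on `_raw`, which is cheaper and also survives small formatting differences between the duplicated sources. A sketch (the rex pattern and `event_id` field name are placeholders, not from the original search):

```
index=<index_name> sourcetype=<blaahaa> field2
| rex field=_raw "event id:\s(?<event_id>\d+)"
| dedup event_id
```

If the dataset's base search does not accept piped commands, saving this as a report and building the dataset on `| savedsearch` (as in the quoted steps above) is the usual workaround.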
Here is some guidance on how to resolve the problem.