How should I add '| dedup' as one of the constraints of the data model?
We have a data model with a sourcetype as its base constraint, plus other fields that we use to generate statistical reports with tstats searches. This sourcetype contains duplicated data.
I want to filter out the duplicated events so that the summaries generated by the data model, and the statistics reported from them, are accurate (something like '| dedup _raw' right before the 'stats' command in a usual search).
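For reference, this is the kind of usual search I mean. It is only a sketch: `my_sourcetype` and the `status` field are placeholder names, not our real ones.

```spl
sourcetype=my_sourcetype
| dedup _raw
| stats count BY status
```

Here `dedup _raw` keeps one event per distinct raw text, so `stats` counts each event once.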
I don't know about the precise question you asked - but I'd investigate why you have duplicate data in the first place. I know that won't help with historical information but it seems like the right answer here.
Is there information lacking in the logs making events appear duplicated? Are you grabbing a set of logs twice? Do two hosts both report the same information?
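One way to start answering those questions is to group events by their raw text and look at where the copies come from. This is a rough sketch that assumes the duplicates are byte-for-byte identical in `_raw`; `my_sourcetype` is a placeholder and the time range is yours to pick.

```spl
sourcetype=my_sourcetype
| eval raw_hash=md5(_raw)
| stats count AS copies, dc(host) AS host_count, values(host) AS hosts, dc(source) AS source_count, values(source) AS sources BY raw_hash
| where copies > 1
| sort - copies
```

If `host_count` is above 1, two hosts are probably reporting the same thing; if it is 1 but `source_count` is above 1, the same log is likely being picked up from two places.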
Well, I have found the root cause of the duplication and have resolved it too.
To sum up the question - the issue persisted for a month, so we have duplicated data for that period. Users generate reports on this data every now and then, and the reported stats are inaccurate because of the dupes. These reports come from the accelerated summaries created by a data model.
Now, how can I exclude the duplicated events from the data model summaries so that the stats are accurate?
I'm glad to hear you've got it straightened out now.
I think you have a couple of options. d and bwooden do a far better job of summarizing some of them in this answer, though I'd caution TEST TEST TEST before doing some of those! Remember, you should be 100% sure the results of that search are really what you want to delete before you ever even enable the ability to USE delete. 😉
Anyway, if that's helpful please upvote that very thorough tag-teamed answer to give them some credit for it.
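On the testing point: this is not the search from that answer, just one possible way to review candidates first. It lists only the second and later copies of each event and deliberately stops short of `| delete`. It assumes exact duplicates in `_raw`, and `my_sourcetype` and the time window are placeholders.

```spl
sourcetype=my_sourcetype earliest=-24h
| eval raw_hash=md5(_raw)
| streamstats count AS copy_number BY raw_hash
| where copy_number > 1
| table _time host source copy_number _raw
```

Start with a small time window and check the output by eye. Also keep in mind that `delete` needs the `can_delete` role (or an equivalent capability) and only makes events unsearchable; it does not free disk space.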
Your idea about including a dedup would probably work really well, except it'll have a huge performance impact all the time. Now, if perhaps you only need that for a short while until that data expires out of the system, then maybe that's the easy way to go.
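If you go the short-term route, one way to limit the cost is to pay for the dedup only on reports that cover the affected month and keep tstats everywhere else. A sketch under assumptions: `My_Data_Model`, `My_Dataset`, `my_sourcetype`, the `status` field, and the dates are all made-up placeholders.

```spl
| tstats count FROM datamodel=My_Data_Model.My_Dataset WHERE earliest="11/01/2023:00:00:00" BY My_Dataset.status
```

```spl
sourcetype=my_sourcetype earliest="10/01/2023:00:00:00" latest="11/01/2023:00:00:00"
| dedup _raw
| stats count BY status
```

The first search reads the accelerated summaries for the clean period. The second goes back to the raw events for the month with dupes, so it will be slower, but only reports touching that window take the hit.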