You have to analyze your data sources: some of them aren't configured correctly.
To track down the most common misconfigurations, the first job is to understand which data sources are duplicated, using the following search:
index="indexname" | stats values(sourcetype) AS sourcetype count by _raw | where count>1
This way you get the list of duplicated sourcetypes and can focus your analysis on them.
Then, if the duplicated sourcetypes come from a cluster or from syslog, you should analyze your architecture to find where the duplication may happen.
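If the duplicates may come from an indexer cluster or from a forwarder sending the same data to more than one destination, a variant of the first search can also show which indexer returned each copy. This is only a sketch: splunk_server is the default field holding the name of the search peer that returned the event, "indexname" is the same placeholder used above, and earliest=-24h is an arbitrary time range added only to keep the search cheap, since grouping by _raw is expensive.
index="indexname" earliest=-24h | stats values(sourcetype) AS sourcetype values(splunk_server) AS splunk_server count by _raw | where count>1
Copies of the same event spread over different indexers may point towards the forwarding layer, while copies on the same indexer may suggest the data is read or sent twice before it reaches Splunk; treat this only as a hint for where to look first.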
Other duplications can come from how your logs are managed at the source: for example, they may be replicated across multiple servers, or renamed and picked up multiple times. Try the following searches to find these sorts of occurrences:
index="indexname" | stats values(source) AS source count by _raw | where count>1
index="indexname" | stats values(host) AS host count by _raw | where count>1
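The source and host checks can also be combined in one search that adds the spread of index times for each duplicated event. This is a sketch, not a definitive check: a large gap between the first and last _indextime of the same event may suggest a file that was read again later (e.g. after being renamed), while copies indexed at almost the same moment look more like parallel delivery. first_indexed, last_indexed and index_gap are just illustrative field names, and earliest=-24h is an arbitrary time range.
index="indexname" earliest=-24h | stats values(source) AS source values(host) AS host min(_indextime) AS first_indexed max(_indextime) AS last_indexed count by _raw | where count>1 | eval index_gap=last_indexed-first_indexed | sort - index_gap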