I have set the input to run every hour and I am getting duplicate data. I tried to make sense of the OData query parameters to request only the past hour of data (to avoid duplicates) but didn't have any success. How do I set up the OData query to request only a specific time window, such as the last hour, so that I don't index duplicate data?
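For context, the Microsoft Graph Security API accepts OData $filter expressions on date properties. A hedged sketch of what such a time-bounded request could look like (the createdDateTime property and the literal timestamp are assumptions; substitute the property and window appropriate to your data):

```
GET https://graph.microsoft.com/v1.0/security/alerts?$filter=createdDateTime ge 2019-06-01T10:00:00Z
```

Note that, per the answers below, the duplicates in this case may not be time-window duplicates at all, but duplicated field values from double extraction.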
I wanted to provide an answer more specific to the "Microsoft Graph Security API Add-On for Splunk."
I just installed this app (Version 1.2.1) and immediately hit this issue in my distributed environment (SHC, IDXC, HFs). It seems the problem is that the app includes:
INDEXED_EXTRACTIONS = json
KV_MODE = json
Since the app is recommended to be installed on both the search head and the heavy forwarder, the fields are extracted at index time on the heavy forwarder and then extracted again at search time on the search head, which results in two values per field.
I guess you have two options: turn off KV_MODE on your search head, or turn off index-time extractions. Personally I did the latter by adding this to my heavy forwarder:
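The original post omits the exact stanza, so here is a sketch of what such local props.conf overrides could look like (the stanza name is a placeholder; use the sourcetype the add-on actually defines, and note the two stanzas go in separate files on separate instances):

```
# Option A: in local/props.conf on the SEARCH HEAD,
# disable search-time JSON extraction.
# "graph_security_alert" is a placeholder sourcetype name.
[graph_security_alert]
KV_MODE = none

# Option B: in local/props.conf on the HEAVY FORWARDER,
# clear the inherited index-time extraction setting
# (an empty value unsets it).
[graph_security_alert]
INDEXED_EXTRACTIONS =
```

Either option should stop the fields from being extracted twice; pick one, not both, or you will lose field extraction entirely.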
mike.randal, are you sure you are getting duplicate events (duplicate JSON events), or are you just seeing duplicate values in the fields when you output events with | table ...?
In the second case, this is likely due to the Add-On doing both index-time (via INDEXED_EXTRACTIONS = json) and search-time (via KV_MODE = json) field extractions, resulting in duplicate field values if you have the add-on installed on a Heavy Forwarder/Indexer and on a Search Head. See: https://answers.splunk.com/answers/223095/why-is-my-sourcetype-configuration-for-json-events.html
You could work around this by using spath on the fields you want to display and writing the results to new fields. Or you could adapt the props.conf settings of the Add-On (which might have other implications).
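A minimal sketch of the spath workaround (the sourcetype name and the field paths are assumptions; adjust them to your actual events):

```
sourcetype="GraphSecurityAlert"
| spath path=severity output=severity_clean
| spath path=title output=title_clean
| table title_clean, severity_clean
```

Because spath re-extracts the value from the raw JSON into a fresh field, the new fields carry a single value even when the original fields were doubled up.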