JSON source type produces duplicated data

Flobzh
Engager

Hello,

I'm sending JSON data to the HTTP Event Collector. When I execute searches, all the non-metadata fields have duplicated values:

Flobzh_1-1689350077305.png

This causes tons of issues when computing sum, count, and so on.

On my Splunk Cloud instance, I set up my source type this way, playing with the KV_MODE, INDEXED_EXTRACTIONS and AUTO_KV_JSON settings, but with no success...

Flobzh_0-1689349778916.png

Could you let me know what might be wrong?

Thanks for your help.


mmccul_slac
Engager

You'd need to use btool to check at the OS level for any configs for that source and sourcetype, e.g., 

splunk btool props list RanorexJSon
splunk btool props list source::ElectraExtendedUI

(Make sure you get the sourcetype and source names exactly right.) You're looking for parameters related to indexed extractions. Since a props stanza can apply to a sourcetype or to a source (or to a host, but that's less likely), check both.


mmccul_slac
Engager

This problem indicates you have indexed field extraction enabled on your JSON events and are at the same time doing search time extraction of the JSON.

I typically recommend disabling indexed field extraction and not relying on the built-in _json sourcetype. Instead, use a more descriptive sourcetype that identifies the expected fields of the JSON, e.g. "myapp:json", which lets you select it more readily for targeted additional processing or extraction.
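
As a minimal sketch of that recommendation, a search-time-only configuration might look like the stanza below. The sourcetype name is the illustrative "myapp:json" from above, not a real one, and on Splunk Cloud the equivalent values would presumably be entered in the source type settings rather than in a file you edit directly.

```ini
# props.conf: sketch only; [myapp:json] is a hypothetical sourcetype name
[myapp:json]
# INDEXED_EXTRACTIONS is deliberately left unset, so no JSON fields
# are extracted when the event is indexed.
# Parse the JSON once, at search time.
KV_MODE = json
```

With only one extraction pass configured, each JSON field should appear once per event, assuming no other stanza for the same source or host re-enables indexed extraction.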


Flobzh
Engager

Hello and thanks @mmccul_slac. I tried your option but didn't succeed... and creating a descriptive source type is kind of a hassle, especially for well-formatted JSON and when my other sources are working properly 😉

I tried a few more things to see why this JSON was behaving differently from my other source types, but no luck. I ended up scrapping my "faulty" source type and, out of ideas, linked another JSON source type to my HTTP Event Collector. It worked, no duplicated values!

Flobzh_2-1689602757980.png

 

I then cloned this working source type, renamed the clone, and set the clone as the source type in my event collector:

Flobzh_0-1689602035580.png

-> I'm back to duplicated values?!

Flobzh_1-1689602618161.png

 

The only differences at this point are the name of the source type and when it was created...

Even though I'm not blocked anymore, I would like to have a dedicated source type, and I need a proper explanation of what is happening and how to fix it... At this point I would really like this to be a bug, so that it would at least explain the inconsistent behavior.

Thanks

 


bowesmana
SplunkTrust

As @mmccul_slac says, having indexed extractions enabled (INDEXED_EXTRACTIONS) is what causes this behaviour. With it enabled, Splunk parses the incoming JSON and indexes its fields; when you search, Splunk also parses the JSON and creates the same fields again at search time, hence you get duplicates.

See this 

https://community.splunk.com/t5/Getting-Data-In/Why-is-my-sourcetype-configuration-for-JSON-events-w...

It may also depend on where the data sent to HEC is coming from, and whether it passes through an intermediate Splunk Universal Forwarder.
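
If index-time extraction is wanted, an alternative to disabling it is to switch off search-time JSON extraction for that sourcetype, so the JSON is parsed only once. The stanza below is a minimal sketch that assumes a hypothetical sourcetype name, "myapp:json". The original poster reports already trying these three settings without success, so treat it as a starting point for checking which values are actually in effect, not as a confirmed fix.

```ini
# props.conf: sketch only; [myapp:json] is a hypothetical sourcetype name
[myapp:json]
# Index time: extract JSON fields when the event is ingested.
INDEXED_EXTRACTIONS = json
# Search time: do not extract the same fields a second time.
KV_MODE = none
AUTO_KV_JSON = false
```

The two search-time settings only help if they apply where the search runs and are not overridden by a stanza keyed on the source or host, which is what the btool check earlier in the thread is meant to reveal.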
