Getting Data In

Why are JSON events coming in as duplicates after using the spath command?

haoban
Path Finder

I'm using a bash script to call the Cisco ESA API and I get JSON events, which I search as follows.

sourcetype="cisco:esa:api:by:hour" uri="/api/v1.0/stats/mail_incoming_traffic_summary?1h"
| spath
| rename data.blocked_dmarc AS Stopped_by_DMARC, data.blocked_invalid_recipient AS Stopped_as_Invalid_Recipients, data.blocked_reputation AS Stopped_by_Reputation_Filtering, data.bulk_mail AS Bulk_Messages, data.detected_amp AS Detected_by_Advanced_Malware_Protection, data.detected_spam AS Spam_Detected, data.detected_virus AS Virus_Detected, data.ims_spam_increment_over_case AS Additional_Spam_Detected_by_Intelligent_Multi-Scan, data.malicious_url AS Messages_with_Malicious_URLs, data.marketing_mail AS Marketing_Messages, data.social_mail AS Social_Networking_Messages, data.threat_content_filter AS Stopped_by_Content_Filter, data.total_clean_recipients AS Clean_Messages, data.total_graymail_recipients AS Total_Graymails, data.total_recipients AS Total_Attempted_Messages, data.total_threat_recipients AS Total_Threat_Messages, data.verif_decrypt_fail AS S-MIME_Verification-Decryption_Failed, data.verif_decrypt_success AS S-MIME_Verification-Decryption_Successful
| table _time, Stopped_by_DMARC, Stopped_as_Invalid_Recipients, Stopped_by_Reputation_Filtering, Bulk_Messages, Detected_by_Advanced_Malware_Protection, Spam_Detected, Virus_Detected, Additional_Spam_Detected_by_Intelligent_Multi-Scan, Messages_with_Malicious_URLs, Marketing_Messages, Social_Networking_Messages, Stopped_by_Content_Filter, Clean_Messages, Total_Graymails, Total_Attempted_Messages, Total_Threat_Messages, S-MIME_Verification-Decryption_Failed, S-MIME_Verification-Decryption_Successful

When I use "spath" and "table" to convert the events into a table, the results always contain duplicate events.

I referred to another answer and modified props.conf as follows:
[source::...ta-cisco-esa-api*.log*]
SHOULD_LINEMERGE = true
sourcetype = ta:cisco:esa:api:log

[source::...ta_cisco_esa_api*.log*]
SHOULD_LINEMERGE = true
sourcetype = ta:cisco:esa:api:log

[cisco:esa:api]
TRANSFORMS-send-data-to-index-queue = setparsing
category = Splunk App Add-on Builder
pulldown_type = 1
DATETIME_CONFIG =
NO_BINARY_CHECK = true
disabled = false
KV_MODE = none
AUTO_KV_JSON = false
INDEXED_EXTRACTIONS = json

[cisco:esa:api:by:hour]
SHOULD_LINEMERGE = true
category = Splunk App Add-on Builder
pulldown_type = 1
DATETIME_CONFIG =
NO_BINARY_CHECK = true
TRANSFORMS-send-data-to-index-queue = setparsing
disabled = false
KV_MODE = none
AUTO_KV_JSON = false
INDEXED_EXTRACTIONS = json

If I remove "KV_MODE = none", "AUTO_KV_JSON = false", and "INDEXED_EXTRACTIONS = json", the search still returns the same three records.

How can I have a unique event? Thanks!


maciep
Champion

What if you remove the spath command from your search? It seems to me that those fields are already being extracted in some other way, so why extract them again with spath?

Also, it's important to understand what those settings do.

KV_MODE
Applies at search time. It can be set to a handful of values. When set to none, Splunk does not perform automatic search-time field extractions on your behalf. It defaults to auto, which extracts key=value pairs.

AUTO_KV_JSON
Applies at search time. Splunk tries to automatically extract JSON fields from events. Defaults to true.

INDEXED_EXTRACTIONS
Configured at input time. It creates indexed fields, meaning these fields are indexed with the data, not created at search time. Modifying this setting has no impact on data that has already been ingested into Splunk.

My guess is that those fields are being indexed, so they already exist with your data. Then you use spath in your search, which extracts them a second time. But it's hard to know for sure without knowing what your Splunk environment looks like, how you're ingesting the data, etc.
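
For illustration, a shortened version of the search without spath might look like the sketch below. It assumes the data.* fields are already present on the events (for example via indexed extractions), and it keeps only two of the renamed fields for brevity. If that assumption does not hold in your environment, the renamed columns will simply come back empty.

```
sourcetype="cisco:esa:api:by:hour" uri="/api/v1.0/stats/mail_incoming_traffic_summary?1h"
| rename data.blocked_dmarc AS Stopped_by_DMARC, data.detected_spam AS Spam_Detected
| table _time, Stopped_by_DMARC, Spam_Detected
```

If each column then shows a single value per event, the extra values were most likely coming from the second extraction done by spath.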

And in general, it's probably a good idea to understand the phases of data in Splunk. Even in a one-server environment, knowing which settings apply to which phase, and what that means, will be extremely helpful.
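
As a rough sketch of how those settings line up with the phases, a props.conf that relies on indexed extractions only could be split as shown below. The stanza name is taken from the question; where each part has to be deployed (forwarder, indexer, or search head) is an assumption that depends on your environment, so verify it before copying anything.

```
# Input / index-time part: deploy where the data is first read.
# Only affects data ingested after the change.
[cisco:esa:api:by:hour]
INDEXED_EXTRACTIONS = json

# Search-time part: deploy on the search head.
# Turns off automatic search-time extraction so the indexed
# JSON fields are not extracted a second time.
# On a single-server install, both parts go in the same stanza.
[cisco:esa:api:by:hour]
KV_MODE = none
AUTO_KV_JSON = false
```

The idea is to pick one extraction method per sourcetype, either indexed or search-time, rather than stacking several of them and then adding spath on top.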


haoban
Path Finder

Thanks, maciep! Removing the "spath" command resolved this issue.


fabry
New Member

I am having the same issue.

For me, if I don't add 'spath', not all of the data is extracted. If I add spath, I see all of the data, but the fields that were already visible without spath are doubled.

This is the query:

index="myindex" sourcetype="mysource-*prod" 
| spath 
| search service.name="*" route.name="*"  response.status="*"