
"ghost" sourcetype during sourcetype override

vik_splunk
Communicator

We are using HEC to ingest logs from a cloud platform.

Environment details: HEC running on a Windows instance of Splunk 7.0.3.

Sourcetype A is sent in the event payload, and it overrides the sourcetype set in the per-token stanza.
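For context, the per-token sourcetype is set in the HEC token's stanza in inputs.conf. The sketch below is minimal, and the stanza name, token GUID and sourcetype value are hypothetical placeholders rather than values from this thread. A sourcetype sent with the event itself in the HEC payload takes precedence over this default, which is why A shows up.

```ini
# inputs.conf on the HEC instance (stanza name, token and sourcetype are hypothetical)
[http://cloud_platform_hec]
token = 11111111-2222-3333-4444-555555555555
# Default sourcetype for events received with this token. A "sourcetype"
# sent along with the event in the HEC payload (e.g. A) overrides this value.
sourcetype = token_default_sourcetype
```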

To override it to B, we use the following props.conf and transforms.conf:

props.conf

[A]
TRANSFORMS-sourcetype = transformname

[B]
CHARSET=UTF-8
INDEXED_EXTRACTIONS=json
KV_MODE=json
SHOULD_LINEMERGE=false
category=Structured
description=JavaScript Object Notation format. For more information, visit http://json.org/
disabled=false
pulldown_type=true
TIME_FORMAT=timeformat
LINE_BREAKER=([\r\n]+)
TIME_PREFIX=timeprefix

transforms.conf

[transformname]
DEST_KEY = MetaData:Sourcetype
FORMAT = sourcetype::B

This works as expected: the sourcetype is renamed to B, and timestamps are assigned using B's settings.

What I cannot understand is this: when I search for raw events using index= .., I see an equal count of events for sourcetypes A and B. It gets weirder: when I run stats count by sourcetype, a count is returned only for B.

It's as though A both exists in the raw-event search and does not exist at the same time.

index= sourcetype=A returns no events, but when I search index= sourcetype=B, both appear.

Can you please help me figure out how to fix this?


FrankVl
Ultra Champion

Firstly: index-time settings such as timestamping and line breaking do not take effect on a sourcetype that is assigned by a TRANSFORMS, because the TRANSFORMS rename happens after line breaking and timestamping have already run. You're probably seeing Splunk's automatic line breaking and timestamping at work. You always need to set those configurations on the original sourcetype.
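Following that advice, the line-breaking and timestamp settings would move from [B] to the original sourcetype [A], next to the TRANSFORMS. This is a minimal sketch that assumes the rest of the question's configuration stays as posted; timeprefix and timeformat are the placeholders from the question, not real values.

```ini
# props.conf on the instance that parses the HEC data (sketch)
# Index-time settings live on the ORIGINAL sourcetype, because they are
# applied before the TRANSFORMS below renames the sourcetype to B.
[A]
TRANSFORMS-sourcetype = transformname
SHOULD_LINEMERGE = false
LINE_BREAKER = ([\r\n]+)
TIME_PREFIX = timeprefix
TIME_FORMAT = timeformat
```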

Secondly: since the sourcetype is included in the JSON data, it will be extracted again at search time because you have KV_MODE=json. I'm not 100% sure why you get that inconsistent behavior (probably because the TRANSFORMS changes the indexed sourcetype value), but I would suggest changing KV_MODE=json to KV_MODE=none. You already extract the JSON fields at index time with INDEXED_EXTRACTIONS=json; extracting them again at search time with KV_MODE=json leads to duplicate extractions, including extracting the original sourcetype value from the JSON data.
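Applied to the question's configuration, the suggested change is a single search-time setting in the [B] stanza; the other settings in that stanza are left as posted. This is a minimal sketch, and since KV_MODE is a search-time setting, it likely needs to be present wherever searches run, such as a search head.

```ini
# props.conf: search-time part of the [B] stanza (sketch)
# JSON fields are already extracted at index time by INDEXED_EXTRACTIONS = json,
# so search-time JSON extraction is disabled to avoid extracting them twice.
[B]
KV_MODE = none
```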

vik_splunk
Communicator

The issue seems to have been ephemeral, but weird nonetheless.

vik_splunk
Communicator

Hi @FrankVl, thanks for the response.

I previously tried KV_MODE=none as well, to no avail. It still exhibits this behaviour.

As for the timestamping, that was my understanding as well, but it does work with the stanza I posted earlier. Without TIME_PREFIX and TIME_FORMAT the timestamps are wrong; with them, it works great.

I just confirmed with some additional testing that this behaviour of showing both A and B seems to occur only in real-time searches. Historical searches work just fine.

amckinnie_splun
Splunk Employee

@vik_splunk, did you ever find out why it showed up in real-time (ad hoc) searches?

If you have the props set on your indexer, also set the following on your search head (SH):

props.conf

[mycustomsourcetype]
KV_MODE = none
AUTO_KV_JSON = false

ref: https://community.splunk.com/t5/Splunk-Search/Duplicate-Extracted-Fields-ingest-through-HEC/m-p/5033...

Thanks @woodcock.
I had the settings on the indexer; I just needed to add them to the SH.

vik_splunk
Communicator

@amckinnie_splun - It appeared only once, and we did not encounter the issue after that. If I recall correctly, I did have to update a couple of stanzas on the SH, after which it did not reappear. Thanks for checking!
