How do I prevent indexing duplicate data with CLONE_SOURCETYPE and _SYSLOG_ROUTING in transforms.conf?

blouder
Explorer

Hello,

I am experiencing unexpected behavior with the CLONE_SOURCETYPE attribute in transforms.conf. When I use CLONE_SOURCETYPE, Splunk ends up indexing both the original event and its clone. I am using "DEST_KEY = _SYSLOG_ROUTING" within the same stanza as CLONE_SOURCETYPE. It routes a copy of the cloned sourcetype to the syslog output processor, but the cloned events are also indexed.

Per the documentation for transforms detailing the values for "keys": "NOTE: Any KEY (field name) prefixed by '_' is not indexed by Splunk, in general." I am interpreting that as meaning that when using "DEST_KEY = _SYSLOG_ROUTING" the sourcetype should not be indexed.

Is this a bug in CLONE_SOURCETYPE?

Here is the config, I changed naming to protect privacy.

1. First, I apply the transform named "clone_sourcetype", which makes a clone of all Windows Event Logs with a new sourcetype named "SIEM_FORMAT".

##PROPS##

[source::WinEventLog:*]
TRANSFORMS-WinEventLog = clone_sourcetype

##TRANSFORMS-1##

[clone_sourcetype]
REGEX = .
DEST_KEY = _SYSLOG_ROUTING
CLONE_SOURCETYPE = SIEM_FORMAT

2. Second, I take the new sourcetype "SIEM_FORMAT" and apply SEDCMD and SHOULD_LINEMERGE to merge the multi-line Windows events into single-line events. I also apply another transform called "SIEM_syslog", which routes the events to the output stanza "send_syslog_to_SIEM".

##PROPS##

[SIEM_FORMAT]
SEDCMD-rmlines=s/[\n\r\t]/ /g
SHOULD_LINEMERGE = true
BREAK_ONLY_BEFORE = ((.+)\d+\/\d+\/\d+\s+\d+:\d+:\d+\s+([aApPmM]{2}))
TRANSFORMS-output = SIEM_syslog

##TRANSFORMS##

[SIEM_syslog]
REGEX = .
DEST_KEY = _SYSLOG_ROUTING
FORMAT = send_syslog_to_SIEM

##OUTPUTS##

[syslog:send_syslog_to_SIEM]
server = x.x.x.x:514
type = tcp
priority = NO_PRI

Everything works perfectly: I am seeing the events on the SIEM side formatted the way I need them, except that Splunk is indexing both the original and the cloned sourcetype.

Any ideas as to why this is occurring?

jrodman
Splunk Employee

This occurs because you are using CLONE_SOURCETYPE, which generates a duplicate event, and you are not employing any system to prevent the duplicate from being indexed.

Sending an event to syslog doesn't prevent it from being indexed. That's why in the support case we ended up suggesting a short-term solution of inserting a forwarder to make the decision to not-forward via modification of _TCP_ROUTING.
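
A rough sketch of what that forwarder-side routing might look like, assuming the clone is the copy that should stay out of the index. The stanza name SIEM_no_forward and the group name siem_not_forwarded are hypothetical. The thread does not show the values used in the support case, and how the forwarder treats a routing group that leads to no indexer should be verified before relying on it.

```
## PROPS (on the intermediate forwarder) ##

[SIEM_FORMAT]
# Other settings from the original [SIEM_FORMAT] stanza stay unchanged.
# Keep the existing syslog routing and add a transform that overrides TCP routing.
TRANSFORMS-output = SIEM_syslog, SIEM_no_forward

## TRANSFORMS ##

[SIEM_no_forward]
REGEX = .
DEST_KEY = _TCP_ROUTING
# Hypothetical tcpout group name, assumed not to lead to the indexers
FORMAT = siem_not_forwarded
```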

For most users in this situation, simply asking for the original event to be sent to syslog will work fine without any CLONE_SOURCETYPE, because the syslog sendout will convert the original event to a single line format in the process of sending it out via that channel. However, your needs are less common because the accepting app doesn't like whatever format Splunk produces by default.
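
For reference, that simpler setup might look roughly like the following, reusing the stanza and output names from the question. It is an untested sketch, and the class name TRANSFORMS-syslog is an arbitrary label. The original event is still indexed once, but no second copy is created.

```
## PROPS ##

[source::WinEventLog:*]
TRANSFORMS-syslog = SIEM_syslog

## TRANSFORMS ##

[SIEM_syslog]
REGEX = .
DEST_KEY = _SYSLOG_ROUTING
FORMAT = send_syslog_to_SIEM

## OUTPUTS ##

[syslog:send_syslog_to_SIEM]
server = x.x.x.x:514
type = tcp
```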

Another possible approach, which I did not try because it seemed to have drawbacks, is to use the _INDEX_AND_FORWARD_ROUTING key: configure the indexer to index no forwarded data by default (indexAndForward = false), run a transform over ALL data to set this key, and then remove the key only for your cloned events. It seemed unpalatable because it is fragile and confusing, and also because no one had the time to prove the approach viable within the available response time.

blouder
Explorer

Hi jrodman, thank you for assisting with this. I tested the workaround you advised with _TCP_ROUTING on an Intermediate Forwarder and validated that it does work. So thank you for that!

I do want to clarify your point that "the syslog sendout will convert the original event to a single line format", as that is not what I am experiencing with multi-line Windows events. I just verified this again to make sure I didn't overlook something: when I send multi-line Windows events to _SYSLOG_ROUTING, they remain multi-line and are not converted to a single line. Splunk adds the syslog header to the beginning of the event, but the event remains multi-line. This is why I had to apply SEDCMD to remove the newline (\n) and carriage return (\r) characters. If _SYSLOG_ROUTING formatted multi-line messages as defined by RFC 5424 for syslog, there would be no need for SEDCMD and, consequently, CLONE_SOURCETYPE.

I am verifying what Splunk is sending out by using netcat to listen on TCP:514 and dumping the raw data received out to a text file.

jrodman
Splunk Employee

Instead of the unsupported hack [(?::){0}WinEventLog:*], you could just use the fully supported [source::WinEventLog...].
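
As a sketch, the props stanza from the question rewritten with that pattern would look like this. In props.conf source patterns, "..." matches any run of characters, while "*" stops at a path separator.

```
## PROPS ##

[source::WinEventLog...]
TRANSFORMS-WinEventLog = clone_sourcetype
```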
