How do I prevent indexing duplicate data with CLONE_SOURCETYPE and _SYSLOG_ROUTING in transforms.conf?

blouder
Explorer

Hello,

I am seeing unexpected behavior with the CLONE_SOURCETYPE attribute in transforms.conf. When I use CLONE_SOURCETYPE, Splunk indexes both the original event and its clone. I am using "DEST_KEY = _SYSLOG_ROUTING" in the same stanza as CLONE_SOURCETYPE; it routes a copy of the cloned sourcetype to the syslog output processor, but the cloned events are also indexed.

Per the documentation for transforms detailing the values for "keys": "NOTE: Any KEY (field name) prefixed by '_' is not indexed by Splunk, in general." I am interpreting that as meaning that when using "DEST_KEY = _SYSLOG_ROUTING" the sourcetype should not be indexed.

Is this a bug in CLONE_SOURCETYPE?

Here is the config, I changed naming to protect privacy.

1. First, I apply the transform named "clone_sourcetype", which clones all Windows Event Logs into a new sourcetype named "SIEM_FORMAT".

##PROPS##

[source::WinEventLog:*]
TRANSFORMS-WinEventLog = clone_sourcetype

##TRANSFORMS-1##

[clone_sourcetype]
REGEX = .
DEST_KEY = _SYSLOG_ROUTING
CLONE_SOURCETYPE = SIEM_FORMAT

2. Second, I take the new sourcetype "SIEM_FORMAT" and apply SEDCMD and SHOULD_LINEMERGE to merge the multi-line Windows events into single-line events. I also apply another transform called "SIEM_syslog", which routes the events to the output stanza "send_syslog_to_SIEM".

##PROPS##

[SIEM_FORMAT]
SEDCMD-rmlines=s/[\n\r\t]/ /g
SHOULD_LINEMERGE = true
BREAK_ONLY_BEFORE = ((.+)\d+\/\d+\/\d+\s+\d+:\d+:\d+\s+([aApPmM]{2}))
TRANSFORMS-output = SIEM_syslog

##TRANSFORMS##

[SIEM_syslog]
REGEX = .
DEST_KEY = _SYSLOG_ROUTING
FORMAT = send_syslog_to_SIEM

##OUTPUTS##

[syslog:send_syslog_to_SIEM]
server = x.x.x.x:514
type = tcp
priority = NO_PRI
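
To preview what the SEDCMD in step 2 does to an event's text, here is a minimal Python sketch of the same substitution. The function name flatten_event and the sample event are made up for illustration. Splunk applies SEDCMD itself at parse time, so this only mimics the regex and is not how Splunk implements it.

```python
import re

def flatten_event(raw: str) -> str:
    # Same pattern as SEDCMD-rmlines: every newline, carriage return,
    # or tab is replaced by a single space (so "\r\n" becomes two spaces).
    return re.sub(r"[\n\r\t]", " ", raw)

# Hypothetical multi-line Windows event, for illustration only.
sample = (
    "01/02/2024 10:15:30 AM\r\n"
    "LogName=Security\r\n"
    "EventCode=4624\r\n"
    "\tAccount Name:\tjdoe"
)

print(flatten_event(sample))
```

Because each matched character gets its own space, the flattened event contains runs of spaces wherever the original had "\r\n" or a newline followed by a tab.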

Everything works as intended on the SIEM side: the events arrive formatted the way I need them. The problem is that Splunk is indexing both the original and the cloned sourcetype.

Any ideas as to why this is occurring?

jrodman
Splunk Employee

This occurs because you are using CLONE_SOURCETYPE, which generates a duplicate event, and you are not employing any system to prevent the duplicate from being indexed.

Sending an event to syslog doesn't prevent it from being indexed. That's why, in the support case, we ended up suggesting a short-term solution: insert a forwarder that makes the decision not to forward the event by modifying _TCP_ROUTING.

For most users in this situation, simply asking for the original event to be sent to syslog will work fine without any CLONE_SOURCETYPE, because the syslog sendout will convert the original event to a single line format in the process of sending it out via that channel. However, your needs are less common because the accepting app doesn't like whatever format Splunk produces by default.

Another possible approach, which I did not try because it seemed to have drawbacks, is the _INDEX_AND_FORWARD_ROUTING key. The idea is to configure the indexer so that, by default, it indexes no data that is forwarded (indexAndForward = false), run a transform over ALL data to set this key, and then remove the key only for your cloned events. It seemed unpalatable because it is fragile and confusing, and because no one had the time to prove the approach viable in the response time available.

blouder
Explorer

Hi jrodman, thank you for assisting with this. I tested the workaround you advised with _TCP_ROUTING on an Intermediate Forwarder and validated that it does work. So thank you for that!

I do want to clarify your point that "the syslog sendout will convert the original event to a single line format", as that is not what I am seeing with multi-line Windows events. I just verified this again to make sure I didn't overlook something: when I send multi-line Windows events to _SYSLOG_ROUTING, they remain multi-line and are not converted to a single line. Splunk adds the syslog header to the beginning of the event, but the event stays multi-line. This is why I had to apply SEDCMD to remove new line (\n) and carriage return (\r) characters. If _SYSLOG_ROUTING formatted multi-line messages as defined by RFC 5424 for syslog, there would be no need for SEDCMD and, consequently, CLONE_SOURCETYPE.

I am verifying what Splunk is sending out by using netcat to listen on TCP:514 and dumping the raw data received out to a text file.
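
For anyone who wants to reproduce this check without netcat, a rough Python equivalent is sketched below. The helper names open_listener and dump_one_connection and the output file name are my own; it simply accepts one TCP connection and writes the raw bytes to a file, which is all the check needs.

```python
import socket

def open_listener(host: str, port: int) -> socket.socket:
    # Bind and listen; the caller is responsible for closing the socket.
    srv = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    srv.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEADDR, 1)
    srv.bind((host, port))
    srv.listen(1)
    return srv

def dump_one_connection(srv: socket.socket, out_path: str) -> int:
    # Accept a single sender and copy everything it sends, byte for byte,
    # into out_path. Returns the number of bytes written.
    conn, _addr = srv.accept()
    total = 0
    with conn, open(out_path, "wb") as out:
        while True:
            chunk = conn.recv(4096)
            if not chunk:  # sender closed the connection
                break
            out.write(chunk)
            total += len(chunk)
    return total
```

To mirror the setup above, call dump_one_connection(open_listener("0.0.0.0", 514), "syslog_capture.raw") and point the syslog output at that host. Binding to a port below 1024 usually requires elevated privileges, and the call returns only once the sender closes the connection.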

jrodman
Splunk Employee

Instead of the unsupported hack [(?::){0}WinEventLog:*], you could just use the fully supported [source::WinEventLog...].
