Getting Data In

How to split an event at index time?

DUThibault
Contributor

In a nutshell, I need CLONE_SOURCETYPE functionality within a single sourcetype. I have events (from a [source::] stanza) that come in the form <timestamp>,<value1>,<value2> on a Universal Forwarder, and I need to split each of these into two events, <timestamp>,<value1> and <timestamp>,<value2>, both of which are sent to the same sourcetype.
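For illustration (timestamp and values invented), an incoming event like

1518546056,0.95,0.32

needs to become the two events

1518546056,0.95
1518546056,0.32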

How can this be done?

I am speculating that I could CLONE_SOURCETYPE and have the new sourcetype apply a couple of TRANSFORMS to change the (duplicated) _raw and its MetaData:Sourcetype (back to the original destination sourcetype). Is there a simpler way?

0 Karma
1 Solution

DUThibault
Contributor

To recap, the problem is that we have a source whose events need to be split and end up in a certain target format. In this particular case, this is done on a universal forwarder, but the solution applies to a source local to a Splunk indexer too.

1) In inputs.conf, identify the sourcetype as intermediate_sourcetype_1.
2) In props.conf (with force_local_processing = true), give intermediate_sourcetype_1 any common _raw transformations as SEDCMD settings, then specify three TRANSFORMS, which we'll call transform-clone-1, transform-clone-2, and transform-drop.
3) In transforms.conf, define transform-drop as REGEX = .*, DEST_KEY = queue, FORMAT = nullQueue. This transform simply drops the event.
4) Still in transforms.conf, define transform-clone-1 as REGEX = .*, DEST_KEY = _raw, FORMAT = $0, CLONE_SOURCETYPE = intermediate_sourcetype_2A. transform-clone-2 is the same except that CLONE_SOURCETYPE = intermediate_sourcetype_2B.
5) Back in props.conf, assign appropriate SEDCMD and TRANSFORMS to intermediate_sourcetype_2A and intermediate_sourcetype_2B. Make sure to conclude the TRANSFORMS set with transform-switch-sourcetype.
6) Finally, back in transforms.conf, define transform-switch-sourcetype as SOURCE_KEY = MetaData:Sourcetype, REGEX = .*, DEST_KEY = MetaData:Sourcetype, FORMAT = sourcetype::target_sourcetype. This transform simply switches the event's sourcetype. (A sketch assembling all of this follows the list.)
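Putting the steps together, here is a minimal sketch of the three files. The monitor path and the SEDCMD expressions are placeholders (yours will differ), and target_sourcetype stands for whatever the final sourcetype should be; the stanza and transform names are the ones from the steps above.

inputs.conf

[monitor:///path/to/source/file]
sourcetype = intermediate_sourcetype_1

props.conf

[intermediate_sourcetype_1]
force_local_processing = true
# common _raw clean-up applied before cloning (placeholder expression)
SEDCMD-common = s/\s+$//
# clone twice, then drop the original event
TRANSFORMS-split = transform-clone-1, transform-clone-2, transform-drop

[intermediate_sourcetype_2A]
force_local_processing = true
# reduce <timestamp>,<value1>,<value2> to <timestamp>,<value1> (placeholder expression)
SEDCMD-keep-first = s/^([^,]+,[^,]+),.*$/\1/
TRANSFORMS-finish = transform-switch-sourcetype

[intermediate_sourcetype_2B]
force_local_processing = true
# reduce <timestamp>,<value1>,<value2> to <timestamp>,<value2> (placeholder expression)
SEDCMD-keep-second = s/^([^,]+),[^,]+,(.*)$/\1,\2/
TRANSFORMS-finish = transform-switch-sourcetype

transforms.conf

[transform-clone-1]
REGEX = .*
DEST_KEY = _raw
FORMAT = $0
CLONE_SOURCETYPE = intermediate_sourcetype_2A

[transform-clone-2]
REGEX = .*
DEST_KEY = _raw
FORMAT = $0
CLONE_SOURCETYPE = intermediate_sourcetype_2B

[transform-drop]
REGEX = .*
DEST_KEY = queue
FORMAT = nullQueue

[transform-switch-sourcetype]
SOURCE_KEY = MetaData:Sourcetype
REGEX = .*
DEST_KEY = MetaData:Sourcetype
FORMAT = sourcetype::target_sourcetype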

It's important that props.conf not rely on source:: stanzas to process these events, because a source:: stanza would match the cloned events (which keep the original source) as well as the original event, resulting in multiple applications of the SEDCMD and TRANSFORMS settings. That is unlikely to yield the desired results except in really odd circumstances.

A note of caution: keep backups of any inputs.conf, props.conf, and transforms.conf that live in the universal forwarder's /opt/splunkforwarder/etc/apps/_server_app_<server_class>/local, because Splunk Web will wipe them whenever you change the input configuration if you edit directly in that folder. The workaround is to build your inputs.conf, props.conf, and transforms.conf in /opt/splunk/etc/deployment-apps/_server_app_<server_class>/local on the main Splunk instance instead. The only caveat I've run into is that there is seemingly no way to "refresh" the forwarder from Splunk Web (you'd expect it under Settings: (Distributed environment) Forwarder management); you must use the command line and issue splunk reload deploy-server.

0 Karma

micahkemp
Champion

Just out of curiosity, can you explain why you need value1 and value2 on separate lines?

0 Karma

DUThibault
Contributor

@micahkemp It's a conversion thing. collectd in csv mode produces, for example, load events of the form <timestamp>,<shortterm>,<midterm>,<longterm>, but in graphite mode it produces three separate events (<host.plugin.metric> <shortterm> <timestamp>, <host.plugin.metric> <midterm> <timestamp>, and <host.plugin.metric> <longterm> <timestamp>). I'm converting the former into the latter so that what the universal forwarder sends to the indexer is in the graphite format (and bears the graphite sourcetype).
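Concretely (host name, metric path, and values invented for illustration), a csv-mode load event like

1518546056,0.95,0.60,0.32

becomes three graphite-mode events:

myhost.load.shortterm 0.95 1518546056
myhost.load.midterm 0.60 1518546056
myhost.load.longterm 0.32 1518546056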

I would obviously not need to do this splitting/conversion if I could use a linux:collectd:csv sourcetype of my own, but the intent here is to be able to feed the Splunk_TA_linux (3412) application transparently.

0 Karma

DUThibault
Contributor

Oh boy. Testing this out yields really unexpected results.

props.conf

[source::/home/user/testidle]
force_local_processing = true
# tag every event from this source so the SEDCMD application is visible
SEDCMD-a = s/$/ seda/
TRANSFORMS-clone = testidle-clone, testidle-b
sourcetype = whatever

[testidle-cloned]
force_local_processing = true
# tag every cloned event so the clone's SEDCMD application is visible
SEDCMD-b = s/$/ sedb/
TRANSFORMS-switch-sourcetype = testidle-c, switch-sourcetype

transforms.conf

[testidle-clone]
CLONE_SOURCETYPE = testidle-cloned
REGEX = .*
DEST_KEY = _raw
FORMAT = $0 testidle-clone

[testidle-b]
REGEX = .*
DEST_KEY = _raw
FORMAT = $0 testidle-b

[testidle-c]
REGEX = .*
DEST_KEY = _raw
FORMAT = $0 testidle-c

[switch-sourcetype]
SOURCE_KEY = MetaData:Sourcetype
REGEX = .*
DEST_KEY = MetaData:Sourcetype
FORMAT = sourcetype::whatever

The events that the forwarder sends are then:

eventdata seda testidle-b

and

eventdata seda testidle-clone seda sedb testidle-clone testidle-b testidle-c

The first event's trail is as expected: the source:: stanza applies its SEDCMD and then the transforms testidle-clone (which creates the clone) and testidle-b. It's the second event that reveals the weirdness. It looks like it's not the original raw event that is cloned but the SEDCMDed one ("eventdata seda"), at which point the original sourcetype's SEDCMD gets applied again before the new sourcetype's SEDCMD, followed by the original sourcetype's TRANSFORMS (clone, then b) and finally the new sourcetype's TRANSFORMS (c, then switch-sourcetype). Thankfully, switch-sourcetype works as expected (and I checked: if I omit it, the cloned event comes through with the same payload, albeit under the testidle-cloned sourcetype).
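Reading the appended markers in order, the second event's history appears to be:

eventdata        original raw event
seda             source:: SEDCMD-a, before cloning
testidle-clone   tag given to the clone by [testidle-clone]
seda             source:: SEDCMD-a, applied again to the clone
sedb             [testidle-cloned] SEDCMD-b
testidle-clone   [testidle-clone] applied again (a second cloning)
testidle-b       source:: transform testidle-b
testidle-c       [testidle-cloned] transform testidle-c

(switch-sourcetype leaves no marker in _raw; it only rewrites the metadata.)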

The fact that the event gets cloned twice is, by itself, a bug, methinks.

0 Karma

DUThibault
Contributor

Addendum: The above behaviour seems to be due to the source:: matching (the SPL-99120 bug). Since the cloned event has the same source, it gets matched against the source:: stanza too, and this is why the SEDCMD and the cloning occur a second time. If we instead map the input file directly to a sourcetype (sketched below, after the results), then we get:

eventdata seda testidle-b

and

eventdata seda testidle-clone sedb testidle-c

which was the original intent.
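For reference, a minimal sketch of the direct mapping; the monitor stanza is my reconstruction (the actual inputs.conf wasn't shown), and the transforms.conf from the earlier test is unchanged:

inputs.conf

[monitor:///home/user/testidle]
sourcetype = testidle

props.conf

[testidle]
force_local_processing = true
SEDCMD-a = s/$/ seda/
TRANSFORMS-clone = testidle-clone, testidle-b

[testidle-cloned]
force_local_processing = true
SEDCMD-b = s/$/ sedb/
TRANSFORMS-switch-sourcetype = testidle-c, switch-sourcetype

Because the clone arrives with sourcetype testidle-cloned, it no longer matches the [testidle] stanza, so the SEDCMD and the cloning run only once.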

0 Karma

maciep
Champion

do you need them to be the same sourcetype when indexed? or can you rename the new sourcetype back at search time? the reason i ask is because the docs make it sound like the cloned event will only get index-time transformations and sed commands. meaning, it might not go back through the rest of the typing queue? maybe to prevent looping? or maybe i'm reading it wrong.

props.conf

[clone_test]
TRANSFORMS-clone = clone_sourcetype,trim

[clone_test2]
# rename is done at search time
rename = clone_test

transforms.conf

[trim]
# runs second: cut the original event down to <timestamp>,<value1>
REGEX = ^([^,]+,[^,]+)
FORMAT = $1
DEST_KEY = _raw

[clone_sourcetype]
# runs first: clone to clone_test2, keeping <timestamp>,<value2>
REGEX = ^([^,]+),[^,]+,(.+)
FORMAT = $1,$2
DEST_KEY = _raw
CLONE_SOURCETYPE = clone_test2

0 Karma