In a nutshell, I need CLONE_SOURCETYPE functionality within a single sourcetype. I have events (from a [source::] stanza) that come in the form <timestamp>,<value1>,<value2> on a Universal Forwarder, and I need to split each of these into two events, <timestamp>,<value1> and <timestamp>,<value2>, both of which will be sent to the same sourcetype.

How can this be done?

I am speculating that I could CLONE_SOURCETYPE and have the new sourcetype apply a couple of TRANSFORMS to change the (duplicated) _raw and its MetaData:Sourcetype (back to the original destination sourcetype). Is there a simpler way?
To recap, the problem is that we have a source whose events need to be split and end up in a certain target format. In this particular case, this is done on a universal forwarder, but the solution applies to a source local to a Splunk indexer too.
1) In inputs.conf, identify the sourcetype as intermediate_sourcetype_1.
2) In props.conf (with force_local_processing = true), assign any common _raw transformations to intermediate_sourcetype_1 via a SEDCMD, then specify three TRANSFORMS, which we'll call transform-clone-1, transform-clone-2 and transform-drop.
3) In transforms.conf, define transform-drop as REGEX = .*, DEST_KEY = queue, FORMAT = nullQueue. This transform simply drops the event.
4) Still in transforms.conf, define transform-clone-1 as REGEX = .*, DEST_KEY = _raw, FORMAT = $0, CLONE_SOURCETYPE = intermediate_sourcetype_2A. transform-clone-2 is the same except that CLONE_SOURCETYPE = intermediate_sourcetype_2B.
5) Back in props.conf, assign the appropriate SEDCMD and TRANSFORMS to intermediate_sourcetype_2A and intermediate_sourcetype_2B. Make sure to conclude each TRANSFORMS list with transform-switch-sourcetype.
6) Finally, back in transforms.conf, define transform-switch-sourcetype as SOURCE_KEY = MetaData:Sourcetype, REGEX = .*, DEST_KEY = MetaData:Sourcetype, FORMAT = sourcetype::target_sourcetype. This transform simply switches the event's sourcetype. A consolidated sketch of all three files follows.
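Putting the six steps together, here is a minimal sketch. The monitor path, the SEDCMD, the value-splitting regexes and the target_sourcetype name are illustrative placeholders, not prescriptions:

inputs.conf:

[monitor:///var/log/myapp/values.csv]
sourcetype = intermediate_sourcetype_1

props.conf:

[intermediate_sourcetype_1]
force_local_processing = true
# common _raw clean-up shared by both halves (placeholder)
SEDCMD-common = s/\s+$//
TRANSFORMS-split = transform-clone-1, transform-clone-2, transform-drop

[intermediate_sourcetype_2A]
force_local_processing = true
# keep <timestamp>,<value1>, then switch to the target sourcetype
TRANSFORMS-finish = transform-keep-value1, transform-switch-sourcetype

[intermediate_sourcetype_2B]
force_local_processing = true
# keep <timestamp>,<value2>, then switch to the target sourcetype
TRANSFORMS-finish = transform-keep-value2, transform-switch-sourcetype

transforms.conf:

[transform-clone-1]
REGEX = .*
DEST_KEY = _raw
FORMAT = $0
CLONE_SOURCETYPE = intermediate_sourcetype_2A

[transform-clone-2]
REGEX = .*
DEST_KEY = _raw
FORMAT = $0
CLONE_SOURCETYPE = intermediate_sourcetype_2B

[transform-drop]
# the original three-value event is no longer needed
REGEX = .*
DEST_KEY = queue
FORMAT = nullQueue

[transform-keep-value1]
REGEX = ^([^,]+),([^,]+),[^,]+$
DEST_KEY = _raw
FORMAT = $1,$2

[transform-keep-value2]
REGEX = ^([^,]+),[^,]+,([^,]+)$
DEST_KEY = _raw
FORMAT = $1,$2

[transform-switch-sourcetype]
SOURCE_KEY = MetaData:Sourcetype
REGEX = .*
DEST_KEY = MetaData:Sourcetype
FORMAT = sourcetype::target_sourcetype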
It's important that props.conf not rely on source:: stanzas to process the events, because such a stanza would apply to the cloned events as well as to the original event, resulting in multiple applications of SEDCMD and TRANSFORMS. That is unlikely to yield the desired results except in really odd circumstances.
A note of caution: keep backups of the inputs.conf, props.conf and transforms.conf appearing in the universal forwarder's /opt/splunkforwarder/etc/apps/_server_app_<server_class>/local, because if you edit directly in that folder, Splunk Web will wipe them when you change the input configuration. The workaround is to build your inputs.conf, props.conf and transforms.conf in /opt/splunk/etc/deployment-apps/_server_app_<server_class>/local on the main Splunk instance. The only caveat I've had with this is that there is seemingly no way to "refresh" the forwarder from Splunk Web (you'd expect that in Settings: (Distributed environment) Forwarder management); you must use the command line and issue splunk reload deploy-server.
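For reference, that reload looks like this, assuming a default install location on the deployment server:

# run on the deployment server, not on the forwarder
/opt/splunk/bin/splunk reload deploy-server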
Just out of curiosity, can you explain why you need value1 and value2 on separate lines?
@micahkemp It's a conversion thing. collectd in csv mode produces e.g. load events consisting of <timestamp>,<shortterm>,<midterm>,<longterm>, but in graphite mode it produces three separate events (<host.plugin.metric> <shortterm> <timestamp>, <host.plugin.metric> <midterm> <timestamp>, <host.plugin.metric> <longterm> <timestamp>). I'm converting the former into the latter so that what the universal forwarder sends to the indexer is in the latter format (and bears the latter sourcetype).

I would obviously not need to do this splitting/conversion if I could use a linux:collectd:csv sourcetype of my own, but the intent here is to be able to feed the Splunk_TA_linux (3412) application transparently.
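To make the conversion concrete, a hypothetical before/after (the host, metric path and values are made up):

# csv mode: one event
1538222885,0.10,0.20,0.30
# graphite mode: three events
web01.load.shortterm 0.10 1538222885
web01.load.midterm 0.20 1538222885
web01.load.longterm 0.30 1538222885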
Oh boy. Testing this out yields really unexpected results.
props.conf
[source::/home/user/testidle]
force_local_processing = true
SEDCMD-a = s/$/ seda/
TRANSFORMS-clone = testidle-clone, testidle-b
sourcetype = whatever
[testidle-cloned]
force_local_processing = true
SEDCMD-b = s/$/ sedb/
TRANSFORMS-switch-sourcetype = testidle-c, switch-sourcetype
transforms.conf
[testidle-clone]
CLONE_SOURCETYPE = testidle-cloned
REGEX = .*
DEST_KEY = _raw
FORMAT = $0 testidle-clone
[testidle-b]
REGEX = .*
DEST_KEY = _raw
FORMAT = $0 testidle-b
[testidle-c]
REGEX = .*
DEST_KEY = _raw
FORMAT = $0 testidle-c
[switch-sourcetype]
SOURCE_KEY = MetaData:Sourcetype
REGEX = .*
DEST_KEY = MetaData:Sourcetype
FORMAT = sourcetype::whatever
The events that the forwarder sends are then:

eventdata seda testidle-b

and

eventdata seda testidle-clone seda sedb testidle-clone testidle-b testidle-c

The first event's track is as expected: the source:: stanza applies a SEDCMD and then the transforms testidle-clone (which creates the clone) and testidle-b. It's the second event which reveals weirdness. It looks like it's not the original raw event that is cloned but the SEDCMDed one ("eventdata seda"), at which point the original sourcetype's SEDCMD gets applied again before the new sourcetype's SEDCMD, followed by the original sourcetype's TRANSFORMS (clone then b) and finally the new sourcetype's TRANSFORMS (c then switch-sourcetype). Thankfully, switch-sourcetype works as expected (and I checked: if I omit switch-sourcetype, the cloned event comes through with the same payload, albeit under the testidle-cloned sourcetype).
The fact that the event gets cloned twice is, by itself, a bug, methinks.
Addendum: The above behaviour seems to be due to the source:: matching (the SPL-99120 bug). Since the cloned event has the same source, it gets matched against source:: again, and this is why the SEDCMD and cloning occur a second time. If we map the input file directly to a sourcetype, then we get:

eventdata seda testidle-b

and

eventdata seda testidle-clone sedb testidle-c

which was the original intent.
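For completeness, a sketch of that working arrangement (the monitor path and the testidle sourcetype name are hypothetical; transforms.conf stays as above):

inputs.conf:

[monitor:///home/user/testidle]
sourcetype = testidle

props.conf:

[testidle]
force_local_processing = true
SEDCMD-a = s/$/ seda/
TRANSFORMS-clone = testidle-clone, testidle-b

[testidle-cloned]
force_local_processing = true
SEDCMD-b = s/$/ sedb/
TRANSFORMS-switch-sourcetype = testidle-c, switch-sourcetype

With no source:: stanza in play, the cloned event (same source, different sourcetype) no longer re-matches the original stanza, so the SEDCMD and the cloning run only once.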
Do you need them to be the same sourcetype when indexed, or can you rename the new sourcetype back at search time? The reason I ask is that the docs make it sound like the cloned event will only get index-time transformations and SED commands; meaning, it might not go back through the rest of the typing queue, maybe to prevent looping? Or maybe I'm reading it wrong.
props.conf

[clone_test]
# clone first (the clone keeps <timestamp>,<value2>), then trim the original
TRANSFORMS-clone = clone_sourcetype,trim

[clone_test2]
# rename is done at search time
rename = clone_test

transforms.conf

[trim]
# reduce the original event to <timestamp>,<value1>
REGEX = ^([^,]+,[^,]+)
FORMAT = $1
DEST_KEY = _raw

[clone_sourcetype]
# the rewrite applies to the clone: <timestamp>,<value2> under clone_test2
REGEX = ^([^,]+),[^,]+,(.+)
FORMAT = $1,$2
DEST_KEY = _raw
CLONE_SOURCETYPE = clone_test2
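Assuming a hypothetical input event 1538222885,0.10,0.20: trim reduces the original to 1538222885,0.10 under clone_test, while the clone comes through as 1538222885,0.20 under clone_test2, and the search-time rename then makes both show up as clone_test.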