Getting Data In

Do TRANSFORMS in a source stanza and a sourcetype stanza both apply?

DUThibault
Contributor

I am thinking of merging a variety of sources being monitored by a Universal Forwarder into a single sourcetype for indexing (and later searching) purposes. The sources each have specific pre-processing that needs to be done, and then a bunch of common processing that I can assign to the sourcetype.

Suppose I have a [source::<source_spec>] stanza that specifies a number of TRANSFORMS clauses and a sourcetype = <common_sourcetype> clause, and also a [<common_sourcetype>] stanza with its own TRANSFORMS clauses. Will the source have both sets of TRANSFORMS applied? Or will the first set be ignored because the sourcetype clause "overrides" it?

If I have a force_local_processing = true clause in the sourcetype stanza, will the Universal Forwarder also process the search-time REPORT and EXTRACT clauses? The FIELDALIAS, EVAL, LOOKUP clauses? I suspect no on both counts.

I know SEDCMD clauses are applied at index-time, but are they applied before TRANSFORMS? Is the order in which they appear in a stanza significant?

0 Karma
1 Solution

DUThibault
Contributor

The simple answer is: yes. All of the matching stanzas will apply, merging the various clauses.

Suppose I have a [source::<source_spec>] stanza that specifies a number of TRANSFORMS clauses
and a sourcetype = <common_sourcetype> clause, and also a [<common_sourcetype>] stanza with
its own TRANSFORMS clauses. Will the source have both sets of TRANSFORMS applied? Or will the
first set be ignored because the sourcetype clause "overrides" it?

Both sets of TRANSFORMS will apply.

If I have a force_local_processing = true clause in the sourcetype stanza, will the Universal
Forwarder also process the search-time REPORT and EXTRACT clauses? The FIELDALIAS, EVAL,
LOOKUP clauses? I suspect no on both counts.

No, a Universal Forwarder will never intervene past index time, so any "local processing" REPORT, EXTRACT, FIELDALIAS, etc., will be ignored.

I know SEDCMD clauses are applied at index-time, but are they applied before TRANSFORMS?
Is the order in which they appear in a stanza significant?

SEDCMD clauses apply first; however, if a TRANSFORMS clause then changes the sourcetype, the new sourcetype's SEDCMD clause would be applied once the sourcetype-changing TRANSFORMS clause is complete.
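
As a minimal sketch of the merge described above (the path, sourcetype and transform names are hypothetical placeholders, not taken from a real deployment), a props.conf along these lines should have events from the matching source pick up both TRANSFORMS classes, since their keys differ:

```ini
# props.conf (hypothetical names throughout)
[source::/var/log/myapp/special.log]
# Source-specific pre-processing
TRANSFORMS-a_pre = myapp_special_preprocess
# Hand the events to the common sourcetype
sourcetype = myapp:common

[myapp:common]
# Common processing shared by every source given this sourcetype
TRANSFORMS-b_common = myapp_common_processing
```

The class names a_pre and b_common are chosen on the assumption that classes are applied in sorted order, so that the source-specific step would come first. Precedence between the two stanzas would only matter if the same key appeared in both.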

0 Karma

gcusello
SplunkTrust

Hi DUThibault,
why do you want to use the same sourcetype for different sources?
All of your knowledge objects are tied to the sourcetype.

For needs like the ones you described, I use the correct sourcetype for each kind of source and then aggregate them using eventtypes and tags.
In other words: I ingest audit logs from many different sources, using a dedicated sourcetype for each one.
Then, for each sourcetype, I create three eventtypes that filter the audit events and assign the same tags: LOGIN, LOGOUT and LOGFAIL.
In this way, searching for tag=LOGIN finds all the login events from many different kinds of sources.
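
A rough sketch of what this could look like, assuming two hypothetical sourcetypes and made-up search strings (only the LOGIN tag is shown; LOGOUT and LOGFAIL would follow the same pattern):

```ini
# eventtypes.conf (sourcetype names and search strings are hypothetical)
[linux_audit_login]
search = sourcetype=linux:audit "session opened"

[windows_audit_login]
search = sourcetype=windows:security EventCode=4624

# tags.conf
[eventtype=linux_audit_login]
LOGIN = enabled

[eventtype=windows_audit_login]
LOGIN = enabled
```

With something like this in place, a search for tag=LOGIN should return matching events from both sourcetypes, while each source keeps its own sourcetype and knowledge objects.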

I understand that this isn't a direct answer to your question, but I wanted to share my experience on this problem.

Bye.
Giuseppe

0 Karma

DUThibault
Contributor

Solution in progress:

inputs.conf (on the Universal Forwarder)

# cpu-<number>/cpu-(idle|interrupt|nice|softirq|steal|system|user|wait)-<timestamp>
[monitor:///var/collectd/csv/*/cpu-*/cpu-*]
disabled = false
index = forwarders_index

props.conf (on the Universal Forwarder)

# .../csv/<host>.<domain>/cpu-<number>/cpu-(idle|interrupt|nice|softirq|steal|system|user|wait)-<timestamp>
[source::.../csv/host.domain/cpu-*/cpu-*]
# epoch,value
# 1516683601,362505306
force_local_processing = true
# <host>.cpu-<number>.cpu-(idle|interrupt|nice|softirq|steal|system|user|wait).value \2 \1
SEDCMD-swap = s/^(\d+),(\d+)/\2 \1/
TRANSFORMS-skipheader = transform-skipheader-epoch-value
TRANSFORMS-build-raw = transform-cpu-prefix
sourcetype = linux:collectd:graphite

transforms.conf (on the Universal Forwarder)

[transform-skipheader-epoch-value]
REGEX = epoch,value
DEST_KEY = queue
FORMAT = nullQueue

[transform-cpu-prefix]
SOURCE_KEY = MetaData:Source
# .../csv/<host>.<domain>/cpu-<number>/cpu-(idle|interrupt|nice|softirq|steal|system|user|wait)-<timestamp>
REGEX = ^.*/csv/([^./]+)[^/]*/cpu-([0-9]+)/cpu-([a-z]+)-[0-9]{4}-[0-9]{2}-[0-9]{2}$
DEST_KEY = _raw
# <host>.cpu-<number>.cpu-(idle|interrupt|nice|softirq|steal|system|user|wait).value \1 \2
FORMAT = $1.cpu-$2.cpu-$3.value $0

The same approach applies to the other categories of collectd data.

And my Splunk instance receives linux:collectd:graphite events formatted just like they should be.

0 Karma

DUThibault
Contributor

See https://answers.splunk.com/answers/615924/ for the rest of the solution.

0 Karma

DUThibault
Contributor

Specifically, I'm trying to get additional data into Splunk_TA_linux. That app expects the sourcetypes linux:collectd:http:json and linux:collectd:graphite. Because those two channels are unavailable on the old system I'm running collectd on, I configured collectd to write log-like csv files instead, and I use a universal forwarder to watch those files and send them to my Splunk instance. Each of the collectd categories of logs produces a different stream of events which require a little bit of specialised transforms, but once these "preambles" are done there is a common set of transforms that remains to do. That's why I want to funnel a bunch of sourcetypes into a single one, which would mimic linux:collectd:graphite as far as values and metadata go. Once that is achieved, a final manipulation of the MetaData:Sourcetype key would relabel the events as linux:collectd:graphite and I'd be done.
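
The final relabelling step could be sketched roughly as follows. The intermediate sourcetype name and the transform name are hypothetical; the sourcetype:: prefix in FORMAT is what the MetaData:Sourcetype key expects:

```ini
# props.conf (the intermediate sourcetype name is hypothetical)
[collectd:csv:common]
TRANSFORMS-z_relabel = collectd_relabel_graphite

# transforms.conf
[collectd_relabel_graphite]
# Match every event
REGEX = .
DEST_KEY = MetaData:Sourcetype
# Values written to this key carry the sourcetype:: prefix
FORMAT = sourcetype::linux:collectd:graphite
```

The class name z_relabel is picked so that, if classes run in sorted order, the relabelling happens after the other transforms on that stanza.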

0 Karma

sbbadri
Motivator

@DUThibault

I hope I understand your problem correctly. You have three different sources, each needing its own transforms. After that, you want to combine all three sources and apply the common transforms. I hope the following is helpful.

[linux:collectd:http:json]
TRANSFORMS-a = some_transform_1

[linux:collectd:graphite]
TRANSFORMS-b = some_transform_2

[collectd]
TRANSFORMS-c = some_transform_3

[linux:collectd:http:json]
rename = commoncollectd

[linux:collectd:graphite]
rename = commoncollectd

[collectd]
rename = commoncollectd

[commoncollectd]
TRANSFORMS-common = some_transforms_common

0 Karma

davpx
Communicator

source:: has precedence over sourcetype so the sourcetype stanza would be ignored.

https://docs.splunk.com/Documentation/Splunk/7.0.1/Admin/Attributeprecedencewithinafile

mwk1000
Path Finder

Yes, remember the ONE PASS rule: an event goes through ONCE, based on highest precedence. You can use CLONE_SOURCETYPE to create another event and have a second go ...
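
A minimal sketch of the CLONE_SOURCETYPE idea, with hypothetical sourcetype and transform names. Note that the original event is kept as well, so both the original and the clone would be indexed unless one of them is filtered out:

```ini
# props.conf (hypothetical names)
[myapp:raw]
TRANSFORMS-clone = myapp_clone_for_second_pass

# transforms.conf
[myapp_clone_for_second_pass]
# Clone every matching event; the clone receives the new sourcetype
REGEX = .
CLONE_SOURCETYPE = myapp:second_pass

# props.conf again: index-time settings intended for the cloned events
[myapp:second_pass]
TRANSFORMS-second = myapp_second_pass_processing
```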

0 Karma

DUThibault
Contributor

You're not understanding my question. Say I have:

[source::<some_path>]
TRANSFORMS-a = some_transform
sourcetype = <some_sourcetype>

[<some_sourcetype>]
TRANSFORMS-b = some_other_transform

I'm hoping the events from <some_path> will undergo TRANSFORMS-a, receive the sourcetype <some_sourcetype> and then (consequently) undergo TRANSFORMS-b.

0 Karma

mwk1000
Path Finder

One pass rule: NO, it will apply the source:: stanza if it matches ...

0 Karma

davpx
Communicator

No, you can only go through the parsing phase once. The first will apply and the second will never match anything.

0 Karma

DUThibault
Contributor

But if I look at etc/system/default/props.conf, there is a [source::.../syslog(.\d+)?] with a sourcetype = syslog clause. Elsewhere in the file we find a [syslog] stanza. Why is this if it'll never be matched?

Are you saying that source:: can chain to sourcetype but that only the first TRANSFORMS clause present in either one gets to run?

0 Karma

davpx
Communicator

No, they cannot be chained. You can modify the sourcetype later on in transforms, and you can apply more than one transform to a props stanza by listing them comma-separated.

When it comes to props, you can only match once. In the example, [source::.../syslog(.\d+)?] matches any source whose path ends in syslog, optionally followed by a numeric suffix. Those sources will take only this path through the parsing phase. Anything else that already has the sourcetype syslog via inputs will match the other stanza.
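
For illustration, listing several transforms in a single class might look like this (stanza and transform names are hypothetical):

```ini
# props.conf (hypothetical names)
[myapp:common]
# Transforms listed in one class are applied in the order given
TRANSFORMS-chain = myapp_drop_header, myapp_rewrite_raw, myapp_set_sourcetype
```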

0 Karma

DUThibault
Contributor

Let's take another look at etc/system/default/props.conf. There is a [syslog] stanza with nine clauses (pulldown_type, maxDist, TIME_FORMAT, MAX_TIMESTAMP_LOOKAHEAD, TRANSFORMS, REPORT-syslog, SHOULD_LINEMERGE, category, description), and there are four source:: stanzas (.../messages(.\d+)?, .../syslog(.\d+)?, and two more in .../private/var/log) that consist of just the sourcetype = syslog clause. If I declare syslog(.\d+)? files as inputs without setting their sourcetype (in inputs.conf), the source:: stanza will match and all it will do is set the sourcetype. None of the TIME_FORMAT, etc. clauses will be applied. If, on the other hand, I do set the sourcetype of syslog(.\d+)? files when I declare them as inputs, the sourcetype stanza will kick in and the various clauses will apply. Either that or the source:: stanza will take precedence and we're back to the previous case. Additionally, the only way to get the sourcetype stanza to kick in would then be to have an input that does not match any source:: stanza but which is assigned that sourcetype manually. Doesn't sound right.

Looking at https://docs.splunk.com/Documentation/Splunk/7.0.1/Admin/Propsconf it is very clear that props.conf does multiple matches: all source::, sourcetype, host::, rule::, and delayedrule:: stanzas that match are applied. The props.conf page explains in detail how conflicting clauses will be resolved both across stanza types and within a single stanza type. Further, delayedrule:: stanzas make sense only if, when they are triggered and an input receives a sourcetype as a consequence, the sourcetype stanza is then looked up and applied. That's what I mean by chaining.
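
To make the chaining scenario concrete, here is a sketch of a rule:: stanza feeding a sourcetype stanza. The rule name, regex and sourcetype are hypothetical, and the example assumes the MORE_THAN_<number> percentage-threshold syntax described on the props.conf page:

```ini
# props.conf (rule name, regex and sourcetype are hypothetical)
[rule::myapp_banner]
# Assign the sourcetype when more than 60% of the lines match the regex
sourcetype = myapp:common
MORE_THAN_60 = ^MYAPP\s

[myapp:common]
# Only useful if this stanza is looked up once the rule has set the sourcetype
TRANSFORMS-common = myapp_common_processing
```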

Please explain?

0 Karma

davpx
Communicator

"Additionally, the only way to get the sourcetype stanza to kick in would then be to have an input that does not match any source:: stanza but which is assigned that sourcetype manually. Doesn't sound right." - Yes

Re - multiple matching categories, this is where precedence kicks in. They are not all applied, they are overridden as you only get one pass through props.conf, meaning you cannot use props.conf to set a sourcetype from a source:: spec and expect that same data to be evaluated again at index time via the sourcetype stanza because there is no second pass through the parsing phase.

**[<spec>] stanza precedence:**

For settings that are specified in multiple categories of matching [<spec>]
stanzas, [host::<host>] settings override [<sourcetype>] settings.
Additionally, [source::<source>] settings override both [host::<host>]
and [<sourcetype>] settings.
0 Karma

DUThibault
Contributor

When I read the props.conf page, my understanding is that clauses override each other, not stanzas. You seem to be saying that a source:: clause that sets sourcetype does not trigger the sourcetype stanza clauses (within the same, single props.conf parsing pass), which gets us back to the scenario I described where the particular way a file is inputted changes completely how it gets indexed and searched. What is the point of using a source:: stanza to set sourcetype if that gets completely ignored? Or even if the sourcetype kicks in only at search time, leaving the index-time clauses (TRANSFORMS) high and dry?

0 Karma

DUThibault
Contributor

I've spent the day testing Splunk 7.0.2 step by step, and here's what I found:

1) (parsing time)

[source::] matching occurs whether or not sourcetype is specified in inputs.conf. If the input's sourcetype is set only by a [source::] stanza, the [sourcetype] stanza nevertheless also fires.

This happens on a Universal Forwarder (UF) if it has a props.conf (absent by default) and the [source::] or [sourcetype] stanzas have a force_local_processing = true clause (the props.conf page is incorrect when it states a force_local_processing clause can only appear in a [sourcetype] stanza: it also works with a [source::] stanza). Parsing occurs only once, in the sense that if the UF parses and indexes the data, the indexer won't.

2) (indexing time)

SEDCMD and TRANSFORMS clauses fire at this time (in that order). For a TRANSFORMS to have any effect, its transforms.conf stanza must have a WRITE_META = true clause or a DEST_KEY clause (DEST_KEY = _meta, for example). This happens on a UF under the conditions outlined above, preventing the indexer clauses from firing.

3) (search time)

A [sourcetype] rename clause kicks in first. Then any REPORT and EXTRACT clauses fire. Note that REPORT and EXTRACT can never occur on a UF.

To be complete, EXTRACT happens first, then REPORT, then automatic key-value extraction, then FIELDALIAS, then EVAL (in parallel), and finally LOOKUP.
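
As an illustration of that sequence, a single stanza carrying one clause of each kind might look like the sketch below. All field, transform and lookup names are hypothetical; the numbering in the comments follows the order listed above, not the order of the lines in the file, which should not matter:

```ini
# props.conf on the search head (all names hypothetical)
[myapp:common]
# 1. Inline extraction
EXTRACT-user = user=(?<user>\S+)
# 2. Transform-based extraction (stanza defined in transforms.conf)
REPORT-kv = myapp_kv_pairs
# 3. Automatic key-value extraction
KV_MODE = auto
# 4. Field alias
FIELDALIAS-user = user AS src_user
# 5. Calculated field
EVAL-user_lc = lower(user)
# 6. Lookup (lookup defined in transforms.conf)
LOOKUP-dept = myapp_user_lookup user OUTPUT department
```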

The rules of precedence (e.g. [source::] overrides [sourcetype]) matter only if the clauses have the same classes. That is to say, if they have identical keys. Thus, a [source::] SEDCMD-one clause and a [sourcetype] SEDCMD-two clause would both fire (in one, two order, because they're sorted using class).
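
A small sketch of both cases, using hypothetical paths and names: the SEDCMD clauses have different classes and so coexist, while the TRANSFORMS-shared key appears in both stanzas and so is subject to precedence.

```ini
# props.conf (hypothetical names)
[source::/var/log/myapp/special.log]
SEDCMD-one = s/foo/bar/
# Same key as in the sourcetype stanza: the source:: value takes precedence
TRANSFORMS-shared = source_specific_transform
sourcetype = myapp:common

[myapp:common]
SEDCMD-two = s/bar/baz/
# Overridden for events coming from the source above
TRANSFORMS-shared = sourcetype_wide_transform
```

Going by the findings above, an event from that source would have SEDCMD-one and then SEDCMD-two applied (so the first foo on a line would end up as baz), and only source_specific_transform would run for the TRANSFORMS-shared class.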

0 Karma