I am thinking of merging a variety of sources being monitored by a Universal Forwarder into a single sourcetype for indexing (and later searching) purposes. The sources each have specific pre-processing that needs to be done, and then a bunch of common processing that I can assign to the sourcetype.
Suppose I have a [source::<source_spec>] stanza that specifies a number of TRANSFORMS clauses and a sourcetype = <common_sourcetype> clause, and also a [<common_sourcetype>] stanza with its own TRANSFORMS clauses. Will the source have both sets of TRANSFORMS applied? Or will the first set be ignored because the sourcetype clause "overrides" it?
If I have a force_local_processing = true clause in the sourcetype stanza, will the Universal Forwarder also process the search-time REPORT and EXTRACT clauses? The FIELDALIAS, EVAL, and LOOKUP clauses? I suspect no on both counts.
I know SEDCMD clauses are applied at index time, but are they applied before TRANSFORMS? Is the order in which they appear in a stanza significant?
The simple answer is: yes. All of the matching stanzas will apply, merging the various clauses.
Suppose I have a [source::<source_spec>] stanza that specifies a number of TRANSFORMS clauses and a sourcetype = <common_sourcetype> clause, and also a [<common_sourcetype>] stanza with its own TRANSFORMS clauses. Will the source have both sets of TRANSFORMS applied? Or will the first set be ignored because the sourcetype clause "overrides" it?
Both sets of TRANSFORMS will apply.
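As a minimal sketch of that merge (stanza paths and transform names hypothetical), both TRANSFORMS classes below would run against events from the monitored file:

[source::/var/log/app.log]
TRANSFORMS-pre = strip_header
sourcetype = app_common

[app_common]
TRANSFORMS-post = extract_session

Here TRANSFORMS-pre and TRANSFORMS-post have different classes (pre and post), so neither overrides the other.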
If I have a force_local_processing = true clause in the sourcetype stanza, will the Universal Forwarder also process the search-time REPORT and EXTRACT clauses? The FIELDALIAS, EVAL, LOOKUP clauses? I suspect no on both counts.
No, a Universal Forwarder will never intervene past index time, so any "local processing" REPORT, EXTRACT, FIELDALIAS, etc., will be ignored.
I know SEDCMD clauses are applied at index time, but are they applied before TRANSFORMS? Is the order in which they appear in a stanza significant?
SEDCMD clauses apply first; however, if a TRANSFORMS clause then changes the sourcetype, the new sourcetype's SEDCMD clauses would be applied once the sourcetype-changing TRANSFORMS clause is complete.
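A sketch of that ordering (all names hypothetical): the first stanza's SEDCMD runs, then its TRANSFORMS rewrites the sourcetype, and the new sourcetype's SEDCMD then gets its turn. In props.conf:

[source::/var/log/raw.log]
SEDCMD-first = s/foo/bar/
TRANSFORMS-retype = set_new_sourcetype

[final_sourcetype]
SEDCMD-second = s/bar/baz/

and in transforms.conf:

[set_new_sourcetype]
REGEX = .
DEST_KEY = MetaData:Sourcetype
FORMAT = sourcetype::final_sourcetype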
Hi DUThibault,
why do you want to use the same sourcetype for different sources?
Related to sourcetype there are all the knowledge objects you have.
For needs like the ones you described, I use a correct sourcetype for each kind of source and then I aggregate them using eventtypes and tags.
In other words: I ingest audit logs from many different sources, using a dedicated sourcetype for each one.
Then I create three eventtypes per sourcetype, filtering the audit events and assigning the same tags: LOGIN, LOGOUT and LOGFAIL.
In this way, searching tag=LOGIN finds all the login events from many different kinds of sources.
I understand that this isn't a direct answer to your question, but I wanted to share my experience on this problem.
Bye.
Giuseppe
Solution in progress:
inputs.conf (on the Universal Forwarder)
# cpu-<number>/cpu-(idle|interrupt|nice|softirq|steal|system|user|wait)-<timestamp>
[monitor:///var/collectd/csv/*/cpu-*/cpu-*]
disabled = false
index = forwarders_index
props.conf (on the Universal Forwarder)
# .../csv/<host>.<domain>/cpu-<number>/cpu-(idle|interrupt|nice|softirq|steal|system|user|wait)-<timestamp>
[source::.../csv/host.domain/cpu-*/cpu-*]
# epoch,value
# 1516683601,362505306
force_local_processing = true
# <host>.cpu-<number>.cpu-(idle|interrupt|nice|softirq|steal|system|user|wait).value \2 \1
SEDCMD-swap = s/^(\d+),(\d+)/\2 \1/
TRANSFORMS-skipheader = transform-skipheader-epoch-value
TRANSFORMS-build-raw = transform-cpu-prefix
sourcetype = linux:collectd:graphite
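The SEDCMD-swap expression can be sanity-checked outside Splunk: its sed-style substitution behaves like Python's re.sub (sample line taken from the comment above):

```python
import re

# SEDCMD-swap = s/^(\d+),(\d+)/\2 \1/  -- swap "epoch,value" into "value epoch"
line = "1516683601,362505306"
swapped = re.sub(r"^(\d+),(\d+)", r"\2 \1", line)
print(swapped)  # → 362505306 1516683601
```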
transforms.conf (on the Universal Forwarder)
[transform-skipheader-epoch-value]
REGEX = epoch,value
DEST_KEY = queue
FORMAT = nullQueue
[transform-cpu-prefix]
SOURCE_KEY = MetaData:Source
# .../csv/<host>.<domain>/cpu-<number>/cpu-(idle|interrupt|nice|softirq|steal|system|user|wait)-<timestamp>
REGEX = ^.*/csv/([^./]+)[^/]*/cpu-([0-9]+)/cpu-([a-z]+)-[0-9]{4}-[0-9]{2}-[0-9]{2}$
DEST_KEY = _raw
# <host>.cpu-<number>.cpu-(idle|interrupt|nice|softirq|steal|system|user|wait).value \1 \2
FORMAT = $1.cpu-$2.cpu-$3.value $0
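The transform-cpu-prefix REGEX can likewise be exercised against a sample source path (path invented for illustration; the $0 substitution itself is left to Splunk's FORMAT handling):

```python
import re

# REGEX from [transform-cpu-prefix]: pull host, CPU number and metric out of the path
pattern = r"^.*/csv/([^./]+)[^/]*/cpu-([0-9]+)/cpu-([a-z]+)-[0-9]{4}-[0-9]{2}-[0-9]{2}$"
source = "/var/collectd/csv/host.domain/cpu-0/cpu-idle-2018-01-23"
m = re.match(pattern, source)
# the prefix that FORMAT = $1.cpu-$2.cpu-$3.value builds from the capture groups
prefix = "{0}.cpu-{1}.cpu-{2}.value".format(*m.groups())
print(prefix)  # → host.cpu-0.cpu-idle.value
```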
The same approach applies to the other categories of collectd data.
And my Splunk instance receives linux:collectd:graphite events formatted just like they should be.
See https://answers.splunk.com/answers/615924/ for the rest of the solution.
Specifically, I'm trying to get additional data into Splunk_TA_linux. That app expects the sourcetypes linux:collectd:http:json and linux:collectd:graphite. Because those two channels are unavailable on the old system I'm running collectd on, I configured collectd to write log-like csv files instead, and I use a universal forwarder to watch those files and send them to my Splunk instance. Each of the collectd categories of logs produces a different stream of events which require a little bit of specialised transforms, but once these "preambles" are done there is a common set of transforms that remains to do. That's why I want to funnel a bunch of sourcetypes into a single one, which would mimic linux:collectd:graphite as far as values and metadata go. Once that is achieved, a final manipulation of the MetaData:Sourcetype key would relabel the events as linux:collectd:graphite and I'd be done.
I hope I understand your problem correctly. You have three different sources, each needing its own transforms; after that you combine all three sources and do the common transforms. I hope the below would be helpful.
[linux:collectd:http:json]
TRANSFORMS-a = some_transform_1
[linux:collectd:graphite]
TRANSFORMS-b = some_transform_2
[collectd]
TRANSFORMS-c = some_transform_3
[linux:collectd:http:json]
rename = commoncollectd
[linux:collectd:graphite]
rename = commoncollectd
[collectd]
rename = commoncollectd
[commoncollectd]
TRANSFORMS-common = some_transforms_common
source:: has precedence over sourcetype so the sourcetype stanza would be ignored.
https://docs.splunk.com/Documentation/Splunk/7.0.1/Admin/Attributeprecedencewithinafile
Yes, remember the ONE PASS rule: an event goes through ONCE based on highest precedence. You can CLONE_SOURCETYPE to create another event and have a second go...
You're not understanding my question. Say I have:
[source::<some_path>]
TRANSFORMS-a = some_transform
sourcetype = <some_sourcetype>
[<some_sourcetype>]
TRANSFORMS-b = some_other_transform
I'm hoping the events from <some_path> will undergo TRANSFORMS-a, receive the sourcetype <some_sourcetype>, and then (consequently) undergo TRANSFORMS-b.
One-pass rule - NO, it will apply the source:: stanza if it matches...
No, you can only go through the parsing phase once. The first will apply and the second will never match anything.
But if I look at etc/system/default/props.conf, there is a [source::.../syslog(.\d+)?] stanza with a sourcetype = syslog clause. Elsewhere in the file we find a [syslog] stanza. Why is this if it'll never be matched?
Are you saying that source:: can chain to sourcetype but that only the first TRANSFORMS clause present in either one gets to run?
No, they cannot be chained. You can modify the sourcetype later on in transforms, and you can apply more than one transform to a props stanza by listing them out comma-separated.
When it comes to props, you can only match once. In the example, [source::.../syslog(.\d+)?] implicitly matches any source whose file name is syslog, optionally followed by a number. Those sources will take only this path through the parsing phase. Anything else that already has the sourcetype syslog via inputs will match the other stanza.
Let's take another look at etc/system/default/props.conf. There is a [syslog] stanza with nine clauses (pulldown_type, maxDist, TIME_FORMAT, MAX_TIMESTAMP_LOOKAHEAD, TRANSFORMS, REPORT-syslog, SHOULD_LINEMERGE, category, description), and there are four source:: stanzas (.../messages(.\d+)?, .../syslog(.\d+)?, and two more in .../private/var/log) that consist of just the sourcetype = syslog clause. If I declare syslog(.\d+)? files as inputs without setting their sourcetype (in inputs.conf), the source:: stanza will match and all it will do is set the sourcetype. None of the TIME_FORMAT, etc. clauses will be applied. If, on the other hand, I do set the sourcetype of syslog(.\d+)? files when I declare them as inputs, the sourcetype stanza will kick in and the various clauses will apply. Either that, or the source:: stanza will take precedence and we're back to the previous case. Additionally, the only way to get the sourcetype stanza to kick in would then be to have an input that does not match any source:: stanza but which is assigned that sourcetype manually. Doesn't sound right.
Looking at https://docs.splunk.com/Documentation/Splunk/7.0.1/Admin/Propsconf it is very clear that props.conf does multiple matches: all source::, sourcetype, host::, rule::, and delayedrule:: stanzas that match are applied. The props.conf page explains in detail how conflicting clauses will be resolved, both across stanza types and within a single stanza type. Further, delayedrule:: stanzas make sense only if, when they are triggered and an input receives a sourcetype as a consequence, the sourcetype stanza is then looked up and applied. That's what I mean by chaining.
Please explain?
"Additionally, the only way to get the sourcetype stanza to kick in would then be to have an input that does not match any source:: stanza but which is assigned that sourcetype manually. Doesn't sound right." - Yes
Re - multiple matching categories, this is where precedence kicks in. They are not all applied, they are overridden as you only get one pass through props.conf, meaning you cannot use props.conf to set a sourcetype from a source:: spec and expect that same data to be evaluated again at index time via the sourcetype stanza because there is no second pass through the parsing phase.
**[<spec>] stanza precedence:**
For settings that are specified in multiple categories of matching [<spec>] stanzas, [host::<host>] settings override [<sourcetype>] settings. Additionally, [source::<source>] settings override both [host::<host>] and [<sourcetype>] settings.
When I read the props.conf page, my understanding is that clauses override each other, not stanzas. You seem to be saying that a source:: clause that sets sourcetype does not trigger the sourcetype stanza clauses (within the same, single props.conf parsing pass), which gets us back to the scenario I described where the particular way a file is inputted changes completely how it gets indexed and searched. What is the point of using a source:: stanza to set sourcetype if that gets completely ignored? Or even if the sourcetype kicks in only at search time, leaving the index-time clauses (TRANSFORMS) high and dry?
I've spent the day testing Splunk 7.0.2 step by step, and here's what I found:
1) (parsing time) [source::] matching occurs whether or not sourcetype is specified in inputs.conf. If the input's sourcetype is set only by a [source::] stanza, the [sourcetype] stanza nevertheless also fires. This happens on a Universal Forwarder (UF) if it has a props.conf (absent by default) and the [source::] or [sourcetype] stanzas have a force_local_processing = true clause (the props.conf page is incorrect when it states a force_local_processing clause can only appear in a [sourcetype] stanza: it also works with a [source::] stanza). Parsing occurs only once, in the sense that if the UF parses and indexes the data, the indexer won't.
2) (indexing time) SEDCMD and TRANSFORMS clauses fire at this time (in that order). For a TRANSFORMS to have any effect, it must have a WRITE_META = true or DEST_KEY = _meta clause. This happens on a UF under the conditions outlined above, preventing the indexer clauses from firing.
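A sketch of point 2 (field name and regex hypothetical): a TRANSFORMS that writes an indexed field needs the meta plumbing spelled out in transforms.conf, e.g.:

[add_dc_field]
REGEX = dc=(\w+)
FORMAT = datacenter::$1
WRITE_META = true

Without WRITE_META = true (or an explicit DEST_KEY), the extracted field never makes it into the event's metadata.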
3) (search time) A [sourcetype] rename clause kicks in first. Then any REPORT and EXTRACT clauses fire. Note that REPORT and EXTRACT can never occur on a UF. To be complete, EXTRACT happens first, then REPORT, then automatic key-value extraction, then FIELDALIAS, then EVAL (in parallel), and finally LOOKUP.
The rules of precedence (e.g. [source::] overrides [sourcetype]) matter only if the clauses have the same classes, that is to say, if they have identical keys. Thus, a [source::] SEDCMD-one clause and a [sourcetype] SEDCMD-two clause would both fire (in one, two order, because they're sorted by class).