Why is my transforms.conf configuration with multiple transforms not assigning different sourcetypes correctly?

Builder

I have a data source where I'm applying multiple transforms (because there are multiple possible formats for the log lines in the data source...sucky, but it's what I have to work with).

Here's what I've set in props.conf:

[error-test]
BREAK_ONLY_BEFORE=^
MAX_TIMESTAMP_LOOKAHEAD=150
NO_BINARY_CHECK=1
SHOULD_LINEMERGE=true
TIME_FORMAT=[%A %B %d %T %Y]
REPORT-error_logs=assignZkError, assignIPSPHP, assignSTD, assignBadScript, assignErrorOnly, assignRestartDigest, assignRestartStatus, assignDud

This is working fantastically for extracting the fields defined in the regex values in each of those transforms stanzas.

However, I want each transform stanza to assign a different sourcetype. That part is not working. When I use these settings, I get the stanza name from props.conf (i.e., "error-test"), not the sourcetype value specified in the transforms.conf.

Here's my transforms.conf:

[assignDud]
REGEX = (?<error_message>.*)
DEST_KEY = MetaData:Sourcetype
FORMAT = sourcetype::dud

[assignRestartStatus]
REGEX = \[(?<error_time>\w{3} \w{3} \d+ \d{2}:\d{2}:\d{2} \d{4})\] \[(?<error_type>\w+)\] (?<error_message>.*)
DEST_KEY = MetaData:Sourcetype
FORMAT = sourcetype::restart

[assignRestartDigest]
REGEX = \[(?<error_time>\w{3} \w{3} \d+ \d{2}:\d{2}:\d{2} \d{4})\] \[(?<error_type>\w+)\] (?<error_class>[\w\s\d]+): (?<error_message>.*)
DEST_KEY = MetaData:Sourcetype
FORMAT = sourcetype::restart

[assignErrorOnly]
REGEX = \[(?<error_time>\w{3} \w{3} \d+ \d{2}:\d{2}:\d{2} \d{4})\] \[(?<error_type>\w+)\] \[client (?<error_client>\d+\.\d+\.\d+\.\d+)\] (?<error_message>.*)
DEST_KEY = MetaData:Sourcetype
FORMAT = sourcetype::error_only

[assignBadScript]
REGEX = \[(?<error_time>\w{3} \w{3} \d+ \d{2}:\d{2}:\d{2} \d{4})\] \[(?<error_type>\w+)\] \[client (?<error_client>\d+\.\d+\.\d+\.\d+)\] (?<error_message>.*)(, referer: (?<error_referrer>.*))
DEST_KEY = MetaData:Sourcetype
FORMAT = sourcetype::bad_script

[assignSTD]
REGEX = \[(?<error_time>\w{3} \w{3} \d+ \d{2}:\d{2}:\d{2} \d{4})\] \[(?<error_type>\w+)\] \[client (?<error_client>\d+\.\d+\.\d+\.\d+)\] (|\(\d+\))(?<error_class>[\w\s\d]+): (?<error_message>.*)(, referer: (?<error_referrer>.*)|\n)
DEST_KEY = MetaData:Sourcetype
FORMAT = sourcetype::standard

[assignIPSPHP]
REGEX = \[(?<error_time>\w{3} \w{3} \d+ \d{2}:\d{2}:\d{2} \d{4})\] \[(?<error_type>\w+)\] \[client (?<error_client>\d+\.\d+\.\d+\.\d+)\] \[\w{3} \w{3} \d+ \d{2}:\d{2}:\d{2} \d{4}\] \[(?<error_class>IPS_PHP:\S+)\] \"(?<error_message>.*)\" URI: (?<error_uri>\S+) APACHE: (?<error_apache>\S+)
DEST_KEY = MetaData:Sourcetype
FORMAT = sourcetype::IPS_PHP

[assignZkError]
REGEX = \[(?<error_time>\w{3} \w{3} \d+ \d{2}:\d{2}:\d{2} \d{4})\] \[(?<error_type>\w+)\] \[client (?<error_client>\d+\.\d+\.\d+\.\d+)\] \[\w{3} \w{3} \d+ \d{2}:\d{2}:\d{2} \d{4}\] \[(?<error_class>ZkError:\S+)\] \"(?<error_message>.*)\" URI: (?<error_uri>\S+) APACHE: (?<error_apache>\S+), referer: (?<error_referrer>.*)
DEST_KEY = MetaData:Sourcetype
FORMAT = sourcetype::ZkError

What am I doing wrong?

EDIT: At present, I'm testing my configurations by importing a flat file once that contains examples of each of the types of errors that these transforms are being applied to, so this props.conf and transforms.conf exist on the Splunk server, not on the Universal Forwarder that will ultimately be sending the data.

1 Solution

Builder

After several more hours of additional playing around...here's what I ended up with....

To change the sourcetype, you need to use TRANSFORMS in your props.conf stanza. To extract fields when the data is coming from a non-heavy forwarder (e.g., a universal forwarder), you need to use REPORT in your props.conf stanza (they have to be search-time extractions, not index-time extractions). In this scenario, at least, you can't do both in the same props.conf stanza: when I tried it, whichever one came second in the stanza was the only one that got results, so I got either my altered sourcetype OR my extracted fields, never both.

So I left transforms.conf exactly the way it was (see original post above), set the generic sourcetype originally specified in my props.conf to use TRANSFORMS (which reassigned the sourcetype for the events), then added props.conf stanzas for each of the new sourcetypes that use REPORT and point to the corresponding transforms.conf stanzas (to minimize how many stanzas I needed to put in transforms.conf).

End result of props.conf:

[error-test]
BREAK_ONLY_BEFORE=^
MAX_TIMESTAMP_LOOKAHEAD=150
NO_BINARY_CHECK=1
SHOULD_LINEMERGE=true
TIME_FORMAT=[%A %B %d %T %Y]
TRANSFORMS-error_logs = assignDud, assignRestartStatus, assignRestartDigest, assignErrorOnly, assignSTD, assignBadScript, assignIPSPHP, assignZkError

[dud]
REPORT-error_logs = assignDud

[restart]
REPORT-error_logs = assignRestartDigest, assignRestartStatus

[error_only]
REPORT-error_logs = assignErrorOnly

[bad_script]
REPORT-error_logs = assignBadScript

[standard]
REPORT-error_logs = assignSTD

[IPS_PHP]
REPORT-error_logs = assignIPSPHP

[ZkError]
REPORT-error_logs = assignZkError

Splunk Employee

TRANSFORMS-blah = changesourcetype
TRANSFORMS-foo = get-fields
To do two things, you use two.

With Splunk... the answer is always "YES!". It just might require more regex than you're prepared for!

Builder

As you reminded me, TRANSFORMS doesn't work for extracting the fields except when you can do index-time extractions. After some additional research, I determined that we need to do search-time extractions (using REPORT).

As I stated above, I tried doing both REPORT and TRANSFORMS in the same props.conf stanza. Whichever one came second in the stanza was the only one that was actually applied to the data.

If you're supposed to be able to do that, then either there's something missing in what I did or there's a bug.

Splunk Employee

In the spec for transforms.conf it discusses the precedence of the processing of stanzas... you may be running into something like that...

Builder

Precedence states that the last line of a given type will take precedence over any previous lines of the same type, and that's exactly what happened when I tried to put both a TRANSFORMS line and a REPORT line in the "error-test" stanza.

Splunk Employee

TRANSFORMS-class=stanza is index time, which is required for sourcetype assignment
REPORT-class = stanza is for search time.

The transforms stanzas you have are assigning a value to sourcetype when the pattern matches... it's not also extracting the fields TO somewhere...

At index time you add WRITE_META; if you only want the fields at search time, omit WRITE_META.
Note the "FORMAT=":

[extract-index-field]
REGEX = device_id=\[\w+\](?<err_code>[^:]+)
FORMAT = err_code::$1
WRITE_META = true

[extract-search-field]
REGEX = device_id=\[\w+\](?<err_code>[^:]+)
FORMAT = err_code::$1

So you need to separate out the operations...

Builder

I'm not having a problem extracting the fields I want when I use the REPORT-error_logs line in props.conf. That is working exactly the way I want it to.

The problem is with the assignment of the sourcetype.

Are you saying I need to add WRITE_META = true to the transforms.conf stanzas in order to force it to write those values?

Splunk Employee

Think of it this way: assigning the sourcetype is an index-time-only process, so when you use REPORT it can't assign the sourcetype; it just does what it can with your fields. When you call the same stanza with TRANSFORMS, the regex becomes something to match, and it assigns the sourcetype. You can't really do both at once... WRITE_META is for when you are creating fields out of thin air... but you want them to now be metadata fields like sourcetype, index, host etc...

Do two calls
REPORT
TRANSFORMS

call the extractions in REPORT and the sourcetype assignment with TRANSFORMS (which can have a much simpler regex because you are only matching the 'name' )
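
A minimal sketch of that split, not a tested configuration. The stanza names set_st_ips_php and set_st_zkerror and the shortened match-only patterns are assumptions, based on the error_class values in the original transforms.conf. The REPORT lines reuse the extraction stanzas that already exist there.

```ini
# transforms.conf: index-time sourcetype assignment only.
# Stanza names and the shortened patterns are illustrative assumptions.
[set_st_ips_php]
REGEX = \[IPS_PHP:\S+\]
DEST_KEY = MetaData:Sourcetype
FORMAT = sourcetype::IPS_PHP

[set_st_zkerror]
REGEX = \[ZkError:\S+\]
DEST_KEY = MetaData:Sourcetype
FORMAT = sourcetype::ZkError

# props.conf: TRANSFORMS on the incoming sourcetype, REPORT on the new ones.
[error-test]
TRANSFORMS-set_sourcetype = set_st_ips_php, set_st_zkerror

[IPS_PHP]
REPORT-error_logs = assignIPSPHP

[ZkError]
REPORT-error_logs = assignZkError
```

Keeping the match-only stanzas separate from the extraction stanzas also makes it clear, when reading props.conf later, which call happens at index time and which at search time.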

Splunk Employee

this is all in the spec examples:
http://docs.splunk.com/Documentation/Splunk/6.2.2/Admin/Transformsconf#transforms.conf.example

Builder

Okay, I'd forgotten that difference between TRANSFORMS and REPORT (I started working on this project about 2 months ago and had to back-burner it for most of the last 2 months, so I'd forgotten why I'd written it that way). What I really want is for both things to come from TRANSFORMS rather than REPORT: I want the fields extracted at index time (not search time), and I want to set the sourcetype based on which transforms stanza matches for the extraction.

Originally, I had written my props.conf stanza to use "TRANSFORMS-error_logs" because of this; when coming back to the project this morning (after 2 months of not looking at it), I switched to "REPORT-error_logs" because the fields were not being extracted. Possibly what I was missing in this case is the "FORMAT" line in the transforms stanza, but I can see where there could be some conflict there because I already have a "FORMAT" line for specifying the sourcetype.

So, if I change each of the existing transforms.conf stanzas to drop the DEST_KEY and current FORMAT lines, and add a new FORMAT line like this (for the assignZkError stanza):

FORMAT = error_time::$1, error_type::$2, error_client::$3, error_class::$4, error_message::$5, error_uri::$6, error_apache::$7, error_referrer::$8

That should cause the fields to be extracted at index time instead of search time, right?

But then, how do I get that to assign a sourcetype of "ZkError", given that each of those fields appear in multiple transforms?

Splunk Employee

Well, if you want to be really clear... you can attach the sourcetype assignments to the source in props.conf, i.e.:

[source::...blah]
TRANSFORMS-force = assignZkError, assignIPSPHP... etc

[ZkError]
TRANSFORMS-getfieldsZk = getZkErrorFields

[IPS_PHP]
TRANSFORMS-getfieldsIP = getIPSPHPFields

'blah' being the name of your source.

That would kind of annotate it for you so when you went back you could see what you meant to be doing. 🙂

Be sure you really want those fields to be indexed field extractions... not sure what your use case is, but you want to consider that it is gonna take up more space.

Builder

For the record, I just tried that FORMAT thing I outlined in my comment above...big ba-da-boom. Definitely didn't work right, so I'm missing something in how that's supposed to work.

What I got is a bunch of duplicated data inserted into each field. The fields were extracted at index time like I wanted, but instead of getting a single IP address in the error_client field, for example, I got a multi-value field with the IP address repeated 3 times... Same on most of the other fields.

So probably my "FORMAT" line is in the wrong format to work properly, but I don't see any examples in the documentation of how to specify multiple fields, so I was just guessing that a comma-delimited string would work.
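
For reference, the index-time examples in the Splunk documentation write several fields from one transform as space-separated name::value pairs rather than a comma-delimited list. The sketch below follows that form and is an assumption, not a verified fix: it reuses the getZkErrorFields stanza name suggested earlier in the thread, and the shortened regex is illustrative rather than the full pattern. Whether this alone removes the repeated values was not tested here.

```ini
# transforms.conf: index-time extraction with unnamed capture groups.
# The regex is a shortened, illustrative pattern, not the full one from the question.
[getZkErrorFields]
REGEX = \[client (\d+\.\d+\.\d+\.\d+)\] .*? URI: (\S+) APACHE: (\S+)
FORMAT = error_client::$1 error_uri::$2 error_apache::$3
WRITE_META = true

# fields.conf: declare each new field as indexed (one stanza per field).
[error_client]
INDEXED = true

[error_uri]
INDEXED = true

[error_apache]
INDEXED = true
```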

Builder

Some results from additional playing around:

When I add this to the props.conf stanza, all of the events get the sourcetype of the LAST transforms.conf stanza specified (in this case, "dud"):

TRANSFORMS-error_logs=assignZkError, assignIPSPHP, assignSTD, assignBadScript, assignErrorOnly, assignRestartDigest, assignRestartStatus, assignDud

And all of the regexes in the transforms.conf stanzas get ignored (so no fields get extracted).

This is also what happens if I use "TRANSFORMS-error_logs" instead of "REPORT-error_logs".
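
One likely reading of this result, assuming the transforms in a TRANSFORMS list run left to right and each match overwrites MetaData:Sourcetype: the catch-all REGEX in assignDud matches every event, so when it is listed last it overwrites whatever an earlier, more specific stanza set. No fields appear because each stanza writes only to its DEST_KEY and none sets WRITE_META. Under that assumption, listing the catch-all first and the most specific patterns last lets the most specific match decide:

```ini
# props.conf: generic match first, most specific match last,
# so the last stanza that matches sets the final sourcetype.
[error-test]
TRANSFORMS-error_logs = assignDud, assignRestartStatus, assignRestartDigest, assignErrorOnly, assignSTD, assignBadScript, assignIPSPHP, assignZkError
```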
