Getting Data In

Question about overriding the source value for a single input

jeffwarn
Explorer

I have a UDP input set up to handle syslog from a number of servers. On any one of these servers, there are multiple applications writing to syslog via log4j. We are trying to make each application easily identifiable, so I had them add a value to the log4j settings which appends [instance=x] after the timestamp. One server could therefore have dozens of "instances" associated with it, for example: app-server-01, http-server-01, java-server-01, etc.
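Roughly, the log4j side looks like this (a sketch, not our exact config; the appender name, host, and instance value are placeholders):

log4j.appender.SYSLOG=org.apache.log4j.net.SyslogAppender
log4j.appender.SYSLOG.syslogHost=our-syslog-host
log4j.appender.SYSLOG.layout=org.apache.log4j.PatternLayout
log4j.appender.SYSLOG.layout.ConversionPattern=%d{yyyy-MM-dd HH:mm:ss,SSS} [instance=app-server-01] %-5p %c - %m%n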

While this works for searching, one of the developers makes use of the "Show Source" option and has asked me if it would be possible to show only the raw log lines for the instance being searched for. I tried to explain that since we are using a single source input, all the raw logs are written as they are received, so I don't think you can filter the source logs based on your search queries.

I guess my question to the community is: without making a separate UDP input for each instance grouping, would it be possible to rewrite the source key based on the [instance] value? I assume that if the developer then used the Show Source option on an event, the source would be app-server-01 instead of udp:5514, which would essentially give a raw log source of just the lines tagged with that instance. The reason for this is that we could have hundreds of different instances, so maintaining that many separate source inputs is not practical.

We were thinking about going the file input route, but we are trying to avoid maintaining NFS mounts on the Splunk server.

1 Solution

sowings
Splunk Employee
Splunk Employee

Note: You could syslog to an intermediate system (sometimes called a "syslog aggregator") to write these files to disk. You could key off of the incoming IP address or whatever to write the contents to a separate file. Then you've got a backing store for the log data, instead of the potentially lossy UDP channel.
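For example, a minimal rsyslog sketch of that approach (assuming rsyslog as the aggregator; the port and path are placeholders), writing each sending host to its own file:

$ModLoad imudp
$UDPServerRun 514
$template PerHostFile,"/var/log/remote/%FROMHOST-IP%/syslog.log"
*.* ?PerHostFile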

In any event, what you're asking for can be achieved by an index-time transform. The basic gist is "if the data matches this regex, set this (metadata) field to value 'x'".

If your sourcetype was called "syslog-log4j", you'd have a props.conf entry for the sourcetype, identifying the transform rule.

[syslog-log4j]
TRANSFORMS-1_source = force_source_from_instance

This references a rule that would be defined in transforms.conf:

[force_source_from_instance]
# _raw is the default SOURCE_KEY (look in the whole event), but we list it here explicitly
SOURCE_KEY = _raw
REGEX = (app-server-\d+)
DEST_KEY = MetaData:Source
FORMAT = source::$1

This would set the "source" field to "app-server-01" or whatever matching string was found in the event string. If you've got multiple instances to match in this way, you could tweak the regex to use an "OR" connector (the pipe character) to account for the multiple possible matches.
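For instance (hypothetical instance prefixes; adjust the alternatives to your actual naming scheme):

[force_source_from_instance]
SOURCE_KEY = _raw
REGEX = ((?:app|http|java)-server-\d+)
DEST_KEY = MetaData:Source
FORMAT = source::$1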


Claw
Splunk Employee

To reinforce Sanford's response, we recommend using a syslog aggregator as a best practice, since it provides a reliable way to persist the incoming UDP or TCP data stream, making your collection process more resilient.


jeffwarn
Explorer

Certainly understand this. We do currently write the raw logs to an NFS share, but we're trying to move away from this because when the NFS mounts go stale, the applications tend to crash (since they can't write their logs).

I'm working to get them to put a heavy-duty enterprise syslog server in place. I also wouldn't mind moving to TCP for a bit of extra resilience. But one step at a time, I suppose.

jeffwarn
Explorer

I think it's something the developer is doing on his system. I created a text file with a multiline event and fed it into the UDP input via netcat (nc), and it worked perfectly fine. I'd be curious to understand why it was behaving that way, but once I put this into production I suppose I'll find out for sure whether it works as expected.
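For reference, the test was essentially this (host, port, and file name are placeholders):

nc -u our-splunk-host 55514 < multiline-event.txt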


jeffwarn
Explorer

Still waiting for the devs to get back to me on whether multiline works with just the sourcetype stanza. In the meantime, this resolved the source key update, so thanks to you and Claw for the information!


sowings
Splunk Employee

Interesting. According to my reading of the "Splunk data pipeline" doc, the TRANSFORMS rules that override source should be applied after line breaking is done. I wonder whether line breaking is happening after your source rename, and therefore Splunk doesn't find the right set of rules.

If you're assigning a sourcetype (in your inputs.conf) to the udp:55514 input stream, you could use that [sourcetype] stanza in your props.conf in place of [source::udp:55514].
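For example (a sketch assembled from the snippets in this thread; the sourcetype name and transform name are the ones used above, so adjust to taste):

inputs.conf

[udp://55514]
sourcetype = syslog-log4j

props.conf

[syslog-log4j]
TRANSFORMS-1_source = rewrite_source_instance
SHOULD_LINEMERGE = true
BREAK_ONLY_BEFORE_DATE = true
TIME_PREFIX = ^
TIME_FORMAT = %Y-%m-%d %H:%M:%S,%3N
MAX_TIMESTAMP_LOOKAHEAD = 25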


jeffwarn
Explorer

Also to note: in my props.conf, the TRANSFORMS line used to override the source sits above the source:: entry I have listed above.


jeffwarn
Explorer

I wound up using:

[rewrite_source_instance]
DEST_KEY = MetaData:Source
REGEX = \d+\s\[instance=(.+)\]\s\[
FORMAT = source::$1

But I'm having a slight issue with multiline events now, unfortunately. This might just be a setting in the developers' log4j/syslog, so I'll have to investigate that.

My props.conf has:

[source::udp:55514]
BREAK_ONLY_BEFORE =
BREAK_ONLY_BEFORE_DATE = true
SHOULD_LINEMERGE = true
TIME_FORMAT = %Y-%m-%d %H:%M:%S,%3N
TIME_PREFIX = ^
MAX_TIMESTAMP_LOOKAHEAD = 25

It's writing separate events and not breaking at the date.
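One thing I might try (a sketch, not yet verified against our data): skipping line merging entirely and breaking directly on the timestamp instead:

[source::udp:55514]
SHOULD_LINEMERGE = false
# Break only where a new line starts with the ISO-style timestamp
LINE_BREAKER = ([\r\n]+)(?=\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2},\d{3})
TIME_PREFIX = ^
TIME_FORMAT = %Y-%m-%d %H:%M:%S,%3N
MAX_TIMESTAMP_LOOKAHEAD = 25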


Claw
Splunk Employee

This is a straightforward index-time override. Place the following props.conf and transforms.conf entries on your indexer instances and you will be golden.

Make sure you test the regex with your data, as your exact [instance=my-instance-01] text may not match my regex.


props.conf

[source::udp:5514]
TRANSFORMS-udpsource = sourceoverride

transforms.conf

[sourceoverride]
DEST_KEY = MetaData:Source
# The brackets must be escaped, and [^\]]+ keeps the capture from running past the closing bracket
REGEX = \[instance=([^\]]+)\]
FORMAT = source::$1
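For instance, against a hypothetical event like the one below, the capture group yields app-server-01, so that becomes the event's source:

2014-05-12 10:15:32,123 [instance=app-server-01] [main] INFO com.example.App - started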
