I am trying to filter syslog input from ESX hosts without affecting any other syslog inputs.
The ESX host names are 192.168.105.*
The only events we want to see from them will contain vmhba.
I have tried to follow the model at http://www.splunk.com/base/Documentation/4.1.6/Admin/Routeandfilterdata#Discard_specific_events_and_....
The regex works in a PCRE regex tester, but does not seem to be working in Splunk.
Is the regex wrong, or do I have a different problem?
props.conf:
[source::udp:514]
MAX_TIMESTAMP_LOOKAHEAD = 32
SHOULD_LINEMERGE = False
TRANSFORMS-set=cisco_asa,nullQ=SyslogNullFilter
#TZ = GMT
transforms.conf:
[SyslogNullFilter]
REGEX = ^.{16}192\.168\.105\.(?!.*vmhba)
also tried:
REGEX = (?m)^.{16}192\.168\.105\.(?!.*vmhba)
REGEX = (?:^.{16}192\.168\.105\.(?!.*vmhba))
SOURCE_KEY=_raw
DEST_KEY=queue
FORMAT = nullQueue
Tried MW's suggestion "keep specific events while discarding the rest." It filtered all events from 192.168.105.*.
props.conf:
[source::udp:514]
MAX_TIMESTAMP_LOOKAHEAD = 32
SHOULD_LINEMERGE = False
TRANSFORMS-set=setkeep,setnull,setcisco_asa
#TZ = GMT
transforms.conf:
[setcisco_asa]
DEST_KEY = MetaData:Sourcetype
REGEX = (%ASA)
FORMAT = sourcetype::cisco_asa
[setnull]
REGEX = .
SOURCE_KEY=_raw
DEST_KEY=queue
FORMAT = nullQueue
[setkeep]
REGEX = ^.{16}192\.168\.105\..*vmhba|^(?!.{16}192\.168\.105\.)
SOURCE_KEY=_raw
DEST_KEY=queue
FORMAT = indexQueue
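A rough way to see why the order in TRANSFORMS-set matters is to model it in Python. This is only a sketch. It assumes, as I read the "keep specific events" section of the linked documentation, that transforms in one list run left to right and that a later match writing to the queue key overwrites an earlier one. The sample events and the route helper are made up for illustration. setcisco_asa is left out because it writes a different key.

```python
import re

# (name, regex, destination queue), mirroring the two queue-related stanzas above.
SETNULL = ("setnull", re.compile(r"."), "nullQueue")
SETKEEP = (
    "setkeep",
    re.compile(r"^.{16}192\.168\.105\..*vmhba|^(?!.{16}192\.168\.105\.)"),
    "indexQueue",
)

def route(event, transforms, default="indexQueue"):
    """Apply transforms in order; the last regex that matches sets the queue.

    Assumption: this "last match wins" rule is a simplified model, not Splunk code.
    """
    queue = default
    for _name, regex, dest in transforms:
        if regex.search(event):
            queue = dest
    return queue

# Hypothetical events with a 16-character syslog timestamp header.
samples = {
    "esx_vmhba": "Mar  3 12:00:01 192.168.105.12 vmkernel: vmhba1 path down",
    "esx_other": "Mar  3 12:00:02 192.168.105.12 Hostd: user login",
    "cisco_asa": "Mar  3 12:00:03 10.0.0.1 %ASA-6-302013: Built outbound TCP connection",
}

for label, order in [("setkeep,setnull", [SETKEEP, SETNULL]),
                     ("setnull,setkeep", [SETNULL, SETKEEP])]:
    print(label, {name: route(ev, order) for name, ev in samples.items()})
```

Under that assumption, listing setnull last sends every event to nullQueue, while listing setnull first drops only the ESX events that lack vmhba.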
You might want to use Fluentd to filter logs at the edge. Here's a full configuration that might work for you (it assumes that you are listening to syslog over UDP at port 5140).
# collecting syslog
<source>
@type syslog
port 5140
tag system
</source>
# adding hostname
<filter system.**>
@type record_transformer
<record>
hostname "#{Socket.gethostname}"
</record>
</filter>
# filtering based on the given condition
<filter system.**>
@type grep
<regexp>
key hostname
pattern ^192\.168\.105\.
</regexp>
<regexp>
key message
pattern vmhba
</regexp>
</filter>
This is just one example of the type of "smart filtering/routing" Fluentd can bring to the edge. For example, you can configure Fluentd so that Splunk only sees error/warn messages (to save bandwidth) like this:
<source>
@type syslog
port 5140
tag splunk
</source>
<match splunk.*.{error,warn}>
@type splunk
# other config parameters
</match>
<match splunk.**>
@type s3
# archive the rest in Amazon S3, say, for cheaper storage
</match>
Again, if you are looking to use Fluentd in a production environment, check out Fluentd Enterprise by Treasure Data.
Was this ever solved?
Edited to reflect changes recommended by jrodman.
Regex works in tester, but not in Splunk.
First, I agree with jrodman that there are some syntax issues.
Also, it sounds like you actually want to keep specific events while discarding the rest, rather than discarding specific events and keeping everything else, which is what the link you provided describes. Do you maybe need this link? http://www.splunk.com/base/Documentation/4.1.6/Admin/Routeandfilterdata#Keep_specific_events_and_dis...
Ah, I'm following now.
If I were only concerned with syslog from the VM hosts you would be correct. However, there are a lot of syslog inputs for switches, routers, etc. that I want to leave untouched. Obviously there is some sort of error in the regex. It seems that using the caret to anchor to the beginning of the event as Josh suggested should be the answer, but I can't get it to work outside of the tester.
I'm a bit distrustful of the capital S in Source::udp://514. Also, I thought events should have udp:514 as their source, without slashes. You should check in the search UI.
We may default to using _raw as the source key, but I've always set it explicitly with SOURCE_KEY=_raw in the transform.
I'm pretty baffled about the \n. Do your single syslog events actually have linebreaks in them? I would bet they don't.
TRANSFORMS-foo=a,b,c,d says to use all four of those. Then TRANSFORMS-bar=e says to use that one too.
OK, got it -- again -- there is not a direct relationship between inputs.conf and props.conf. Found a correctly formatted entry for udp:514 already in props, already using a transform. I added my transform to it. Are multiple transforms for the same source allowable? See above edit.
That's the documentation for inputs.conf. Props.conf syntax is source::the_source_string where your source string is udp:514. Maybe we should explicitly document how the source string is formatted for these events, but I prefer simply checking search results.
RE udp://514 -- You are correct. In the search UI the source does not show slashes. However, the syntax is in line with that given in the manual: http://www.splunk.com/base/Documentation/4.1.6/Admin/Inputsconf
That explains why the tester only gets the first occurrence if the partial IP is in the first line of the test block, but not why it doesn't work in Splunk.
btw, case of "source::udp://514" has been fixed
Okay, but Splunk isn't applying the regex to the whole file, it's applying it to each event. In fact, from Splunk's perspective there isn't a file at all, because the data is arriving as UDP packets.
When working in the tester (^.{16}) returns only the first 16 characters of the entire test block -- not the first 16 of each line.
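The tester behavior described here can be reproduced with a short Python sketch. Python's re module follows the same convention as PCRE on this point: without the multiline flag, ^ matches only at the very start of the input. The three events in the block are invented for illustration.

```python
import re

PATTERN = r"^.{16}192\.168\.105\."

# Hypothetical test block: three events pasted into a tester as one string.
block = "\n".join([
    "Mar  3 12:00:01 192.168.105.12 vmkernel: vmhba1 path down",
    "Mar  3 12:00:02 10.0.0.1 %ASA-6-302013: Built outbound TCP connection",
    "Mar  3 12:00:03 192.168.105.13 Hostd: user login",
])

# Default mode: ^ anchors only at the start of the whole block.
single = re.findall(PATTERN, block)

# Multiline mode, the equivalent of a leading (?m): ^ anchors at every line start.
multi = re.findall(PATTERN, block, re.MULTILINE)

print(len(single), len(multi))  # prints "1 2"
```

So a tester fed a multi-line block needs (?m) to behave like a per-line match, whereas a regex that is applied to one event at a time does not.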
^ did not work in the tester, but I tried it in Splunk anyway.
Did not work.
UDP syslog events aren't split by newlines, but by packets, so there's no newline even on the wire. In Splunk, we cut on newlines in the LINE_BREAKER, so they're typically still not available except for multiline events. Use the ^ character for the beginning of your event.
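As a minimal sketch of that per-event view, the following Python applies the discard regex from the question to one string at a time, the way a transform sees one event at a time. The sample events are hypothetical and assume a 16-character timestamp header before the host address. Python's re is used here only as a stand-in for PCRE.

```python
import re

# Discard pattern from the question: 16-character header, the ESX subnet,
# then a negative lookahead that fails if "vmhba" appears later in the event.
DISCARD = re.compile(r"^.{16}192\.168\.105\.(?!.*vmhba)")

# Hypothetical events; each string stands for one UDP packet, i.e. one event.
events = [
    "Mar  3 12:00:01 192.168.105.12 vmkernel: vmhba1 path down",
    "Mar  3 12:00:02 192.168.105.12 Hostd: user login",
    "Mar  3 12:00:03 10.0.0.1 %ASA-6-302013: Built outbound TCP connection",
]

def kept(evts):
    """Return the events the discard pattern does NOT match."""
    return [e for e in evts if not DISCARD.search(e)]

for e in kept(events):
    print(e)
```

With these samples, the ESX vmhba event and the non-ESX event are kept, and only the ESX event without vmhba is discarded. If real events carry a header of a different length, .{16} will not line up and the pattern will never match.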
I'm just using \n to anchor to the beginning of the event -- I only want the IP address if it begins at position 17.
I don't understand the comment about source key.