Getting Data In

Filtering syslog inputs

rgcox1
Communicator

Trying to filter syslog input from ESX hosts, without impacting any other syslog inputs.
The ESX host names are 192.168.105.*
The only events we want to see from them will contain vmhba.
I have tried to follow the model at http://www.splunk.com/base/Documentation/4.1.6/Admin/Routeandfilterdata#Discard_specific_events_and_....
The regex works in a PCRE regex tester, but does not seem to be working in Splunk.
Is the regex wrong, or do I have a different problem?

props.conf:

[source::udp:514]
MAX_TIMESTAMP_LOOKAHEAD = 32
SHOULD_LINEMERGE = False
TRANSFORMS-set=cisco_asa,nullQ=SyslogNullFilter 
#TZ = GMT

transforms.conf:

[SyslogNullFilter]
REGEX = ^.{16}192\.168\.105\.(?!.*vmhba)
# also tried:
# REGEX = (?m)^.{16}192\.168\.105\.(?!.*vmhba)
# REGEX = (?:^.{16}192\.168\.105\.(?!.*vmhba))
SOURCE_KEY = _raw
DEST_KEY = queue
FORMAT = nullQueue
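
For reference, a minimal Python sketch of what the first REGEX above does when it is applied to one event at a time. Python's re module is not PCRE, but the constructs used here (anchor, fixed-width prefix, negative lookahead) behave the same way in both. The sample events are hypothetical, and the 16-character prefix is assumed to be a "Mmm dd hh:mm:ss " syslog timestamp.

```python
import re

# Pattern copied from the [SyslogNullFilter] stanza above.
DISCARD = re.compile(r"^.{16}192\.168\.105\.(?!.*vmhba)")

# Hypothetical sample events (not taken from the original post).
EVENTS = [
    "Mar 14 10:15:02 192.168.105.12 vmkernel: vmhba1:0:0 path state changed",
    "Mar 14 10:15:03 192.168.105.12 sshd: session opened for user root",
    "Mar 14 10:15:04 10.1.1.1 %ASA-6-302013: Built outbound TCP connection",
]


def would_discard(event: str) -> bool:
    """Return True if the pattern matches, i.e. the event would go to nullQueue."""
    return DISCARD.search(event) is not None


results = [would_discard(e) for e in EVENTS]
print(results)
```

Under these assumptions only the ESX event without vmhba matches, which suggests the pattern itself is sound when it sees a single event.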

Tried MW's suggestion "keep specific events while discarding the rest." It filtered all events from 192.168.105.*.

props.conf:

[source::udp:514]
MAX_TIMESTAMP_LOOKAHEAD = 32
SHOULD_LINEMERGE = False
TRANSFORMS-set=setkeep,setnull,setcisco_asa
#TZ = GMT

transforms.conf:

[setcisco_asa]
DEST_KEY = MetaData:Sourcetype
REGEX = (%ASA)
FORMAT = sourcetype::cisco_asa

[setnull]
REGEX = .
SOURCE_KEY=_raw
DEST_KEY=queue 
FORMAT = nullQueue 

[setkeep]
REGEX = ^.{16}192\.168\.105\..*vmhba|^(?!.{16}192\.168\.105\.)
SOURCE_KEY=_raw
DEST_KEY=queue 
FORMAT = indexQueue
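
A rough Python simulation of how the order in TRANSFORMS-set might explain this result. It assumes that transforms are applied in the order listed and that a later match on DEST_KEY = queue overwrites an earlier one, which is how the "Route and filter data" documentation describes the keep/discard pattern. The route function, the default queue, and the sample events are hypothetical.

```python
import re

# Regexes copied from the [setnull] and [setkeep] stanzas above.
TRANSFORMS = {
    "setnull": (re.compile(r"."), "nullQueue"),
    "setkeep": (
        re.compile(r"^.{16}192\.168\.105\..*vmhba|^(?!.{16}192\.168\.105\.)"),
        "indexQueue",
    ),
}

# Hypothetical sample events (not taken from the original post).
EVENTS = [
    "Mar 14 10:15:02 192.168.105.12 vmkernel: vmhba1:0:0 path state changed",
    "Mar 14 10:15:03 192.168.105.12 sshd: session opened for user root",
    "Mar 14 10:15:04 10.1.1.1 %ASA-6-302013: Built outbound TCP connection",
]


def route(event, order):
    """Apply queue transforms in order; the last matching one wins (assumed semantics)."""
    queue = "indexQueue"  # assumed default when nothing matches
    for name in order:
        regex, destination = TRANSFORMS[name]
        if regex.search(event):
            queue = destination
    return queue


as_posted = [route(e, ["setkeep", "setnull"]) for e in EVENTS]
null_first = [route(e, ["setnull", "setkeep"]) for e in EVENTS]
print(as_posted)
print(null_first)
```

If that assumption holds, listing setnull after setkeep sends every event to nullQueue, including the vmhba events, while listing setnull first keeps the vmhba events and the non-ESX events.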
0 Karma

kiyototamura
Explorer

You might want to use Fluentd to filter logs at the edge. Here's a full configuration that might work for you (it assumes that you are listening to syslog over UDP at port 5140).

# collecting syslog
<source>
  @type syslog
  port 5140
  tag system
</source>

# adding hostname
<filter system.**>
  @type record_transformer
  <record>
    hostname "#{Socket.gethostname}"
  </record>
</filter>

# filtering based on the given condition
<filter system.**>
  @type grep
  <regexp>
    key hostname
    pattern ^192\.168\.105\.
  </regexp>
  <regexp>
    key message
    pattern vmhba
  </regexp>
</filter>

This is just one example of the type of "smart filtering/routing" Fluentd can bring to the edge. For example, you can configure Fluentd so that Splunk only sees error/warn messages (to save on the bandwidth) like this:

<source>
  @type syslog
  port 5140
  tag splunk
</source>
<match splunk.*.{error,warn}>
  @type splunk
  # other config parameters
</match>
<match splunk.**>
  @type s3
  # archive the rest in Amazon S3, say, for cheaper storage
</match>

Again, if you are looking to use Fluentd in a production environment, check out Fluentd Enterprise by Treasure Data.

0 Karma

grantsales
Engager

Was this ever solved?

0 Karma

rgcox1
Communicator

Edited to reflect changes recommended by jrodman.

Regex works in tester, but not in Splunk.

0 Karma

mw
Splunk Employee

First, I agree with jrodman that there are some syntax issues.

Also, it sounds like you actually want to keep specific events while discarding the rest, rather than discard specific events and keep everything else, which is what the link you provided describes. Do you maybe need this link instead? http://www.splunk.com/base/Documentation/4.1.6/Admin/Routeandfilterdata#Keep_specific_events_and_dis...

0 Karma

mw
Splunk Employee

Ah, I'm following now.

0 Karma

rgcox1
Communicator

If I were only concerned with syslog from the VM hosts you would be correct. However, there are a lot of syslog inputs for switches, routers, etc. that I want to leave untouched. Obviously there is some sort of error in the regex. It seems that using the caret to anchor to the beginning of the event, as Josh suggested, should be the answer, but I can't get it to work outside of the tester.

0 Karma

jrodman
Splunk Employee

I'm a bit distrustful of the capital S in Source::udp://514. Also, I thought events should have udp:514 as their source, without slashes. You should check in the search UI.

We may default to using _raw as the source key, but I've always set it explicitly. SOURCE_KEY=_raw in the transform.

I'm pretty baffled about the \n. Do your single syslog events actually have linebreaks in them? I would bet they don't.

0 Karma

jrodman
Splunk Employee

TRANSFORMS-foo=a,b,c,d says to use all four of those. Then TRANSFORMS-bar=e says to use that one too.

0 Karma

rgcox1
Communicator

OK, got it -- again -- there is not a direct relationship between inputs.conf and props.conf. Found a correctly formatted entry for udp:514 already in props, already using a transform. I added my transform to it. Are multiple transforms for the same source allowable? See above edit.

0 Karma

jrodman
Splunk Employee

That's the documentation for inputs.conf. Props.conf syntax is source::the_source_string where your source string is udp:514. Maybe we should explicitly document how the source string is formatted for these events, but I prefer simply checking search results.

0 Karma

rgcox1
Communicator

RE udp://514 -- You are correct. In the search UI the source does not show slashes. However, the syntax is in line with that given in the manual: http://www.splunk.com/base/Documentation/4.1.6/Admin/Inputsconf

0 Karma

rgcox1
Communicator

That explains why the tester only gets the first occurrence if the partial IP is in the first line of the test block, but not why it doesn't work in Splunk.

btw, case of "source::udp://514" has been fixed

0 Karma

jrodman
Splunk Employee

Okay, but splunk isn't applying the regex to the whole file, it's applying it to each event. In fact, from splunk's perspective there isn't a file at all, because the data is arriving as udp packets.

0 Karma

rgcox1
Communicator

When working in the tester, (^.{16}) returns only the first 16 characters of the entire test block -- not the first 16 of each line.
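
The tester behavior described here can be reproduced with a short Python sketch. Without a multiline flag, ^ anchors only at the start of the whole test block; with the flag it also anchors after each newline; and when each line is handed to the regex separately, ^ anchors at the start of every line with no flag at all. The two-line test block is hypothetical.

```python
import re

# Hypothetical two-line block, as one might paste into a regex tester.
BLOCK = (
    "Mar 14 10:15:02 192.168.105.12 vmkernel: vmhba1 path down\n"
    "Mar 14 10:15:03 192.168.105.13 sshd: session opened"
)

PATTERN = r"^.{16}"

# ^ matches only at the very start of the string.
whole_block = re.findall(PATTERN, BLOCK)

# With MULTILINE, ^ also matches immediately after each newline.
per_line = re.findall(PATTERN, BLOCK, re.MULTILINE)

# Each line treated as its own string, with no flag needed.
per_event = [re.findall(PATTERN, line) for line in BLOCK.splitlines()]

print(whole_block)
print(per_line)
print(per_event)
```

The third case is the closest analogue to a regex that is applied per event rather than to a pasted block of several events.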

0 Karma

rgcox1
Communicator

^ did not work in the tester, but I tried it in Splunk anyway.

Did not work.

0 Karma

jrodman
Splunk Employee

UDP syslog events aren't split by newlines, but by packets, so there's no newline even on the wire. In splunk, we cut on newlines in the LINE_BREAKER, so they're typically still not available except for multiline events. Use the ^ character for the beginning of your event.

0 Karma

rgcox1
Communicator

I'm just using \n to anchor to the beginning of the event -- I only want the IP address if it begins at position 17.

I don't understand the comment about source key.

0 Karma