Getting Data In

How to filter events from a file BEFORE they get to the Splunk index

itsomana
Path Finder

Hi

I would like to find out how I can "strip out" events from an input file before they reach the Splunk indexer. I do not wish to send the entire file to the indexer, just certain events from the file.

Requirement

I want to drop/ignore/filter events before they get sent to the Splunk indexer.
For example, I have a text file "/tmp/gerry.txt".

I want to send only the lines containing "example1" from this file to the indexer, i.e. I do not want to send the entire file, as the real production file will be quite large.

Information

I am using the universal forwarder, version 4.2.1.

text file

The aim is to extract only the "example1" records from the file and send these on to the indexer.

I read a document detailing how to do this with props.conf and transforms.conf.

[root@server1 dbs]# more /tmp/gerry.txt
example1 text
example1 text2
example1 text3
example1 text4
example1 text5
example2 text
example2 text2
example2 text3
example2 text3
example3 text1
example3 text2
example3 text3
example4 text4
example4 text1
example4 test2
example5 test3
example4 test3
example4 test443
example4 test3444


new line
another new line

inputs.conf

I am using the file /opt/splunk/etc/system/local/inputs.conf.

So here is my inputs.conf file:

[monitor:///tmp/gerry.txt]
sourcetype = testger

props.conf

I am using the file /opt/splunk/etc/system/local/props.conf:

[source::/tmp/gerry.txt]
TRANSFORMS-set = setnull, setparsing

Then I'm using the following in /opt/splunk/etc/system/local/transforms.conf:

[setnull]
REGEX = .
DEST_KEY = queue
FORMAT = nullQueue

[setparsing]
REGEX = example1
DEST_KEY = queue
FORMAT = indexQueue

So this should strip out everything except the lines containing "example1" from the /tmp/gerry.txt file and forward just those entries to the indexer.

However, it doesn't work.

other information

[root@server1 dbs]# netstat -tulp | grep splunk
tcp        0      0 *:8089        *:*           LISTEN      15623/splunkd
getnameinfo failed
[root@ff-osrv-03 dbs]#

I need a documented method of doing this, both for the full client and for the Splunk forwarder.

1 Solution

_d_
Splunk Employee

Itsomana, you'd need a full Splunk instance installed as a heavy forwarder on your server to perform parsing and nullQueue filtering as per your requirements. A universal forwarder won't do any parsing and leaves this job to the indexer.

So, you have two options:

  1. Install a Splunk heavy forwarder at the data source server
  2. Use your props and transforms at the indexer instead

- please upvote if you find this answer useful
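
For reference, option 2 would reuse the same props/transforms pair from the question, just placed on the indexer instead of the forwarder. A minimal sketch, assuming default file locations ($SPLUNK_HOME/etc/system/local on the indexer) and the same source path:

# props.conf on the indexer
[source::/tmp/gerry.txt]
TRANSFORMS-set = setnull, setparsing

# transforms.conf on the indexer
# Order matters: setnull routes every event to the nullQueue first,
# then setparsing overrides the queue for events matching "example1".
[setnull]
REGEX = .
DEST_KEY = queue
FORMAT = nullQueue

[setparsing]
REGEX = example1
DEST_KEY = queue
FORMAT = indexQueue

A restart of splunkd on the indexer is typically needed to pick up the change.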


dwaddle
SplunkTrust

With the universal forwarder this is not possible. The way that Splunk works, events can only be filtered through nullQueue after they have been parsed. The parsing of events from a file is what breaks character sequences into lines, and lines into events, and locates timestamps, etc.

The universal forwarder does not do event parsing - it defers that work until the data gets to the indexer. As far as the universal forwarder is concerned, a file is a sequence of bytes and not a sequence of events. Once the data gets to the indexer, the sequence of bytes is then parsed into events and at that point you can filter out specific events via the nullQueue technique you are using above.

A lightweight forwarder does not parse events either. The two Splunk configurations that do parse events are an indexer and a heavy (full) forwarder. If you must parse events at the forwarder, then you must deploy the heavy forwarder to do so.
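
To flesh out the heavy forwarder route: you'd install full Splunk on the source server, keep the props/transforms from the question there, and add an outputs.conf so the parsed (and filtered) data is forwarded rather than indexed locally. A minimal sketch, with indexer.example.com:9997 as a placeholder for your actual receiving indexer:

# outputs.conf on the heavy forwarder,
# e.g. in $SPLUNK_HOME/etc/system/local
[tcpout]
defaultGroup = my_indexers

[tcpout:my_indexers]
server = indexer.example.com:9997

# don't also write events to the forwarder's own local indexes
[indexAndForward]
index = false

The receiving indexer must be configured to listen on the matching port, either via Manager » Forwarding and receiving or with a [splunktcp://9997] stanza in its inputs.conf.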

dhergert_payx
Engager

Does data that gets to the indexer and is parsed, only to be filtered out, count against your license? I am trying to exclude redundant, verbose noise from the events being processed, because it is useless and repetitive and needlessly eats up license bandwidth.

pwilliams_splun
Splunk Employee

No. You are filtering out the events before indexing by routing them to the nullQueue. The data is hitting your indexer, but it is not being indexed. Just because data got to your indexer for processing does not necessarily mean that it got "indexed". This allows you to trim your ingest. In most cases you would want the full fidelity of your logs, but in some cases it is necessary or prudent to trim some waste. Just be careful that you don't exclude something that you might later find interesting. I did this once with some security-relevant data that did not seem relevant when I filtered it out, and it made a problem for me down the road. Good luck!


Ayn
Legend

Yes, filtering can be done on the indexer.

Dark_Ichigo
Builder

But if you forward the logs using a light forwarder and do the filtering via nullQueue on the Splunk indexer side, that should be possible, right?
