How to filter out events in Splunk 6.3.1 at index-time, except files containing the string "#!" in the first 5 characters of the file?

stanvv
New Member

Hi,

I only want to index files containing the string #! in the first 5 characters of the file.
Therefore, I created the following inputs.conf:

[monitor:pathname] 
blacklist = (?i:archive|develop|data|backup|\.txt$|\.gz$|\.tar$|\.csv$|\.bck$|\.log$|\.old$|\d{6,})
disabled = false 
host = script 
index = abcindex 
sourcetype = abcscript

props.conf:

[abcscript] 
TRANSFORMS-set= setnull,setparsing

transforms.conf:

[setnull] 
REGEX = . 
DEST_KEY = queue
FORMAT = nullQueue

[setparsing] 
REGEX = (.{0,5}(#!))
DEST_KEY = queue
FORMAT = indexQueue

This is based on http://docs.splunk.com/Documentation/Splunk/6.3.1/Forwarding/Routeandfilterdatad
Unfortunately, everything is currently indexed into "abcindex", not only files starting with #!
I also tried it with a dummy string in a dummy file, but again, everything was indexed.
I restarted Splunk after changing the config files.

Any idea what goes wrong here?
Using Splunk 6.3.1 at the moment.

Thanks

tmarlette
Motivator

Out of curiosity, are you trying to do all of this on a universal forwarder?

If you are, these props/transforms will not work on a UF; you have to add those settings to your indexing tier.

stanvv
New Member

I'm testing it on a local Splunk Enterprise instance at the moment.

woodcock
Esteemed Legend

Your RegEx is wrong; try this:

REGEX = ^(.{0,3}(#!))

This needs to be deployed to all your indexers, and the Splunk instances running there need to be restarted. After this is done, incoming events will be properly filtered, but events indexed before the restart will not be affected.
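
To see what the anchor changes, here is a small Python sketch comparing the original and the suggested pattern. Python's re module is only a stand-in for Splunk's regex engine (an assumption that holds for patterns this simple), and the sample strings are made up for illustration.

```python
import re

# Patterns copied from the thread.
UNANCHORED = re.compile(r"(.{0,5}(#!))")   # original: matches "#!" anywhere in the event
ANCHORED = re.compile(r"^(.{0,3}(#!))")    # suggested: "#!" must start at offset 0..3

# Hypothetical sample events, not taken from real data.
samples = [
    "#!/bin/sh",             # "#!" at offset 0
    "   #!/bin/sh",          # "#!" at offset 3, still within the first 5 characters
    "    #!/bin/sh",         # "#!" at offset 4, ends past the first 5 characters
    "echo done  #! marker",  # "#!" far from the start
]

# One (unanchored, anchored) pair of booleans per sample.
results = [
    (bool(UNANCHORED.search(s)), bool(ANCHORED.search(s)))
    for s in samples
]

for sample, (loose, strict) in zip(samples, results):
    print(f"{sample!r:26} unanchored={loose} anchored={strict}")
```

Without the leading ^, the .{0,5} prefix can match zero characters at any position, so the original pattern accepts any event that contains #! anywhere.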

stanvv
New Member

Thanks for your answer. I tried the above (changed the regex, restarted, and tested with dummy files: one starting with #! and one not), but files not starting with #! were still indexed.
I'm testing it on a local Splunk Enterprise instance at the moment.

woodcock
Esteemed Legend

The RegEx applies to each event, not to the entire file.
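
A rough way to picture the difference, sketched in Python rather than Splunk itself: the two file bodies are modeled on the example files in this thread, and splitting on newlines is an assumption standing in for one-event-per-line breaking (real event breaking depends on the sourcetype settings). kept_events is a hypothetical helper, not a Splunk function.

```python
import re

# Anchored pattern from the thread; Python's re stands in for Splunk's regex engine.
SETPARSING = re.compile(r"^(.{0,3}(#!))")

FILE_1 = "#!\n########\n# Intro 123\n########\n#Scriptinfo\nABC = 123\n"
FILE_2 = "########\n# Intro 234\n########\n#Scriptinfo\nDEF = 567\n"


def kept_events(text, whole_file):
    """Return the events that match the pattern (hypothetical helper).

    whole_file=False models one event per line; whole_file=True models
    the entire file arriving as a single event.
    """
    events = [text] if whole_file else text.splitlines()
    return [event for event in events if SETPARSING.search(event)]


per_line_file1 = kept_events(FILE_1, whole_file=False)
per_line_file2 = kept_events(FILE_2, whole_file=False)
whole_file1 = kept_events(FILE_1, whole_file=True)
whole_file2 = kept_events(FILE_2, whole_file=True)

print(per_line_file1)    # only the "#!" line survives; the rest of the script is dropped
print(len(whole_file1))  # 1: the file is kept as one complete event
print(whole_file2)       # []: nothing from the second file is kept
```

Under per-line breaking, the regex can only ever keep the single line that carries #!, never the rest of the script, which is why the check has to run against an event that spans the whole file.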

stanvv
New Member

The files I'm monitoring are scripts (sometimes with an undefined filetype). So if the file content itself starts with #! I want it to be indexed. If it doesn't, it should go to the nullQueue.

Example
File 1 (needs to be indexed)

#!
########
# Intro 123
########
#Scriptinfo
ABC = 123

File 2 (send to nullQueue)

########
# Intro 234
########
#Scriptinfo
DEF = 567

Do props.conf and transforms.conf also work for non log/txt files? Any ideas what's the best solution?

woodcock
Esteemed Legend

Make sure you set this for your sourcetype in props.conf:

[YourSourcetypeHere]
LINE_BREAKER=(\Z)
TRUNCATE=500000
SHOULD_LINEMERGE = 1

This will treat the entire file as a single event, and then it should work as you expect. Deploy this to the indexers (or heavy forwarders) and restart all Splunk instances there. This will apply ONLY TO FUTURE EVENTS (the scripts that are already there have already been processed), so you will have to create new files in order to test this.
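
As a sanity check of the intended outcome, the setnull/setparsing pair can be modeled in a few lines of Python. This is only a sketch under assumptions: route is a hypothetical function, Python's re stands in for Splunk's regex engine, transforms are assumed to run in the order listed in TRANSFORMS-set with each match overwriting the queue, and an event matched by neither is assumed to stay on the index queue.

```python
import re

# (name, regex, destination queue), in the order given in TRANSFORMS-set.
TRANSFORMS = [
    ("setnull", re.compile(r"."), "nullQueue"),
    ("setparsing", re.compile(r"^(.{0,3}(#!))"), "indexQueue"),
]

# Each string models a whole file delivered as a single event.
FILE_1 = "#!\n########\n# Intro 123\n########\n#Scriptinfo\nABC = 123\n"
FILE_2 = "########\n# Intro 234\n########\n#Scriptinfo\nDEF = 567\n"


def route(event, default="indexQueue"):
    """Return the queue an event ends up in (hypothetical model, not Splunk code)."""
    queue = default
    for _name, regex, destination in TRANSFORMS:
        if regex.search(event):
            queue = destination  # a later match overwrites an earlier one
    return queue


print(route(FILE_1))  # indexQueue: setnull matches first, then setparsing overrides it
print(route(FILE_2))  # nullQueue: only setnull matches
```

In this model, swapping the two entries in TRANSFORMS would send everything to nullQueue, which is why setnull is listed first.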
