Getting Data In

Including specific incoming data from monitored log files

geoffmoraes
Path Finder

I am attempting to index just a few interesting events from an application's log files. These are unstructured text files. I do not want to index the entire log files, as those are at least 400MB per file. The events that I want to extract may not even add up to 4MB per day.

If I run a search with regex on the complete logs that were already indexed in a test run, I get just the required events.

So this works:

index=someindex sourcetype=somesourcetype 
| regex _raw="my_regex_to_look_for_specific_text"

 

But when I add the same regex as a whitelist for future events, it does not index any new logs at all. If I take off the whitelist, the logs come in.

[monitor://E:\Program Files\some app\Logs\...\servername_LOGTYPE_*.txt]
disabled=0
index=someindex
sourcetype=somesourcetype
renderXml=false
whitelist1 = _raw = "my_regex_to_look_for_specific_text"

 

The documentation seems to cover a lot on whitelisting file names, but not content within the files. https://docs.splunk.com/Documentation/Splunk/8.0.4/Data/Whitelistorblacklistspecificincomingdata

The only piece relevant to what I'm attempting to do is an example that blacklists the EventCode field with the value 4662.

[WinEventLog:Security]
blacklist1 = EventCode = "4662" Message = "Account Name:\s+(example account)"

 

The only difference I can see is that my logs are unstructured and do not have fields parsed by splunk. So that leaves me with _raw as a field for my whitelist.

Is there a way to do the whitelisting of specific content in the _raw field? Or any other way?

boz_8058
Explorer

You can only exclude files and directories within the monitor stanza on the UF. The WinEventLog example is for a pre-configured format that Splunk understands. This is why you are able to be more granular in the filtering.

Filtering can be done on the indexer.

gcusello
SplunkTrust

Hi @geoffmoraes ,

The whitelist parameter applies to the names of the files to ingest, not to the events inside them (https://docs.splunk.com/Documentation/Splunk/8.0.4/Admin/Inputsconf).

It isn't possible to filter events at the Forwarder level, with the only exception of wineventlogs.

So if you want to filter data, you have to do it on the Indexers or (when present) on Heavy Forwarders.

To do this, follow the instructions at https://docs.splunk.com/Documentation/Splunk/8.0.4/Forwarding/Routeandfilterdatad#Filter_event_data_... .

In short, you have to find the correct regex (which you have already done),

then put the following on the Indexers (or, when present, on Heavy Forwarders) in props.conf:

[your_sourcetype]
TRANSFORMS-null= setnull

in transforms.conf:

[setnull]
REGEX = my_regex_to_look_for_specific_text
DEST_KEY = queue
FORMAT = nullQueue

Then restart Splunk

Ciao.

Giuseppe

geoffmoraes
Path Finder

This filtering is being done on a heavy forwarder. I haven't tried your solution out yet, but I have used transforms.conf to send events to null.

I would like to whitelist specific keywords so only those events are indexed. If I'm not mistaken, sending to null would be blacklisting that event. Wouldn't this do the opposite of what I want?

gcusello
SplunkTrust

Hi @geoffmoraes ,

If you look at the link above, there are two use cases:

  • Discard specific events and keep the rest,
  • Keep specific events and discard the rest.

Yours is probably the second one (https://docs.splunk.com/Documentation/Splunk/8.0.4/Forwarding/Routeandfilterdatad#Keep_specific_even...):

In props.conf

[your_sourcetype]
TRANSFORMS-set= setnull,setparsing

(Be careful with the order of the transforms in TRANSFORMS-set: setnull must come before setparsing!)

In transforms.conf

[setnull]
REGEX = .
DEST_KEY = queue
FORMAT = nullQueue

[setparsing]
REGEX = keyword1|keyword2|keyword3
DEST_KEY = queue
FORMAT = indexQueue

(The order of the stanzas in transforms.conf isn't important!)
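To see why the order in TRANSFORMS-set matters, here is a minimal Python sketch of the routing logic. It is only a model, not Splunk code: it assumes each transform whose REGEX matches overwrites the queue, so the last match wins. The function name route and the sample events are made up for illustration.

```python
import re

# Hypothetical model of the two transforms above, in the order they are
# listed in TRANSFORMS-set. Each entry is (name, regex, destination queue).
TRANSFORMS = [
    ("setnull", re.compile(r"."), "nullQueue"),
    ("setparsing", re.compile(r"keyword1|keyword2|keyword3"), "indexQueue"),
]


def route(event, transforms=TRANSFORMS, default="indexQueue"):
    """Return the queue an event ends up in; the last matching transform wins."""
    queue = default
    for _name, regex, dest in transforms:
        if regex.search(event):
            queue = dest
    return queue


if __name__ == "__main__":
    # Made-up events, one with a keyword and one without.
    for line in ["user login keyword2 ok", "routine heartbeat"]:
        print(route(line), "<-", line)
```

With setnull first, every event is sent to nullQueue and only the keyword events are pulled back to indexQueue. If the list is reversed, setnull runs last and everything is discarded, which is the mistake the warning above is about.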

Ciao.

Giuseppe

geoffmoraes
Path Finder

Hi @gcusello,

So, in props I now have..

 

[source::some:sourcetype1]
TRANSFORMS-set= setnull,setparsing

 

 

and in transforms.conf

 

[setnull]
REGEX = .
DEST_KEY = queue
FORMAT = nullQueue

[setparsing]
REGEX = (?i)(\bkeyword1\b).*(\bkeyword2\b.*\])(?i)
DEST_KEY = queue
FORMAT = indexQueue

 

 

After saving these files on the HF, I uninstalled the app and redeployed it via Forwarder Management in the GUI.

So far it's not working, as I get all logs with no filtering.

This same regex on previously indexed events works on a search query, returning just the required events.

 

index=someindex sourcetype=some:sourcetype1 | regex _raw="(?i)(\bkeyword1\b).*(\bkeyword2\b.*\])(?i)"

 

 

Am I missing something?

gcusello
SplunkTrust

Hi @geoffmoraes ,

what's "source::some:sourcetype1" in the props.conf stanza?

in this stanza name, you have to put the sourcetype of the logs to filter (e.g.: [wineventlog]).

Ciao.

Giuseppe

geoffmoraes
Path Finder

Hi @gcusello 

I took that from the example in the docs link, which had source:: in it. My mistake.

The actual sourcetype name has a : in it. I changed the props.conf to have [some:sourcetype1] and still no luck. Not sure what's wrong this time.

Assuming that I eventually get this to work, can two sourcetypes be used in the props.conf like this? 

props.conf

[some:sourcetype1]
TRANSFORMS-set= setnull,setparsing1

[some:sourcetype2]
TRANSFORMS-set= setnull,setparsing2

 

transforms.conf

[setnull]
REGEX = .
DEST_KEY = queue
FORMAT = nullQueue

[setparsing1]
REGEX = REGEX1
DEST_KEY = queue
FORMAT = indexQueue

[setparsing2]
REGEX = REGEX2
DEST_KEY = queue
FORMAT = indexQueue

 

gcusello
SplunkTrust

Hi @geoffmoraes ,

Let me understand: is your sourcetype called "some:sourcetype1", or is it called just "sourcetype1" and you also added "some:" to the stanza name?

If the first, try renaming the sourcetype so that it does not contain ":" (use "_" instead, for example).

If the second, put only "sourcetype1" in the stanza name:

[sourcetype1]

Ciao.

Giuseppe

geoffmoraes
Path Finder

Hi @gcusello 

I've renamed the sourcetype, replacing the ":" with "_"

That too had no effect.

I'm not sure what now.

Without the props and transforms, all the logs come in. The regex works when run in the search query.

With the props and transforms, I get no logs.

Does the regex in the transforms.conf look right?

[setnull]
REGEX = .
DEST_KEY = queue
FORMAT = nullQueue

[setparsing]
REGEX = (?i)(\bkeyword1\b).*(\bkeyword2\b.*\])(?i)
DEST_KEY = queue
FORMAT = indexQueue

 

 

gcusello
SplunkTrust

Hi @geoffmoraes ,

There are only three things that could be wrong:

  • the sourcetype of these logs isn't correct, or differs from the one in props.conf;
  • the regex isn't correct;
  • the location of the props and transforms files isn't correct.

You can easily check the first by looking at the sourcetype in the search results.

For the second, you can use the regex command in a search.

For the third, these files must be on the Indexers and/or (when present) on the Heavy Forwarder; to be sure, put them on both, and after updating them, restart Splunk on each system you changed.
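For the second check, the regex can also be tried outside Splunk against a few lines copied from the log. The sketch below uses Python's re module, which is not the same regex engine Splunk uses, so treat a match here as a hint rather than proof. The trailing "(?i)" from the thread's pattern is dropped because Python only accepts a global inline flag at the start of a pattern. The function name kept and the sample lines are made up.

```python
import re

# The pattern from transforms.conf, minus the trailing "(?i)": Python's re
# module only accepts a global inline flag at the start of the pattern.
PATTERN = re.compile(r"(?i)(\bkeyword1\b).*(\bkeyword2\b.*\])")


def kept(lines):
    """Return the lines the pattern matches, i.e. the ones meant to be indexed."""
    return [line for line in lines if PATTERN.search(line)]


if __name__ == "__main__":
    # Made-up sample events; replace them with lines copied from the real log.
    sample = [
        "2020-06-01 10:00:00 KEYWORD1 started [keyword2 id=7]",
        "2020-06-01 10:00:01 heartbeat",
        "2020-06-01 10:00:02 keyword1 only, no bracket",
    ]
    for line in kept(sample):
        print(line)
```

If the expected lines are printed here but the filtering still fails in Splunk, the sourcetype name or the location of the files is the more likely culprit.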

Ciao.

Giuseppe

geoffmoraes
Path Finder

Thanks @gcusello!  I finally got it to work by taking off renderXml=false from the stanza. The logs then came in filtered as expected!

But it isn't over yet. I need to add another sourcetype (which contains XML) to this index with the same kind of filtering. All I could find that was relevant was the link below, but there isn't a clear solution.

Can setnull and setparsing be used for two different sourcetypes?

https://community.splunk.com/t5/Getting-Data-In/Using-setnull-and-setparsing-for-two-different-sourc...

 

gcusello
SplunkTrust

Hi @geoffmoraes ,

If the regexes are the same, you can reuse the same stanzas in transforms.conf; in props.conf, however, you need two stanzas, one for each sourcetype.

If the regexes are different, create another stanza for the second setparsing (e.g. setparsing_xml) and reuse the same setnull.
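As a rough illustration of the second case, the Python sketch below models one shared setnull and one setparsing stanza per sourcetype. It is a simplified model, not Splunk's implementation: it assumes the last matching transform decides the queue. The sourcetype names, the keyword regexes and the "<Error>" pattern are placeholders invented for the example.

```python
import re

# Hypothetical transforms.conf: one shared setnull, one setparsing per regex.
TRANSFORMS = {
    "setnull": (re.compile(r"."), "nullQueue"),
    "setparsing": (re.compile(r"keyword1|keyword2"), "indexQueue"),
    "setparsing_xml": (re.compile(r"<Error>"), "indexQueue"),
}

# Hypothetical props.conf: one stanza per sourcetype, each with its own
# ordered TRANSFORMS-set list.
PROPS = {
    "sourcetype1": ["setnull", "setparsing"],
    "sourcetype2": ["setnull", "setparsing_xml"],
}


def route(sourcetype, event):
    """Return the queue for an event; sourcetypes without a stanza are untouched."""
    queue = "indexQueue"
    for name in PROPS.get(sourcetype, []):
        regex, dest = TRANSFORMS[name]
        if regex.search(event):
            queue = dest
    return queue


if __name__ == "__main__":
    print(route("sourcetype1", "x keyword1 y"))
    print(route("sourcetype2", "<Error>boom</Error>"))
```

Each sourcetype is filtered only by the transforms listed in its own stanza, so the XML pattern never lets events from the first sourcetype through, and the shared setnull does not need to be duplicated.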

Ciao.

Giuseppe

geoffmoraes
Path Finder

Thanks @gcusello !
