I am attempting to index just a few interesting events from an application's log files. These are unstructured text files. I do not want to index the entire log files, as those are at least 400MB per file. The events that I want to extract may not even add up to 4MB per day.
If I run a search with regex on the complete logs that were already indexed in a test run, I get just the required events.
So this works:
index=someindex sourcetype=somesourcetype
| regex _raw="my_regex_to_look_for_specific_text"
But when I add the same regex as a whitelist for future events, it does not index any new logs at all. If I take off the whitelist, the logs come in.
[monitor://E:\Program Files\some app\Logs\...\servername_LOGTYPE_*.txt]
disabled=0
index=someindex
sourcetype=somesourcetype
renderXml=false
whitelist1 = _raw = "my_regex_to_look_for_specific_text"
The documentation seems to cover whitelisting file names at length, but not content within the files. https://docs.splunk.com/Documentation/Splunk/8.0.4/Data/Whitelistorblacklistspecificincomingdata
The only piece relevant to what I'm attempting to do is an example that blacklists the EventCode field with the value 4662.
[WinEventLog:Security]
blacklist1 = EventCode = "4662" Message = "Account Name:\s+(example account)"
The only difference I can see is that my logs are unstructured and do not have fields parsed by Splunk. So that leaves me with _raw as the only field for my whitelist.
Is there a way to do the whitelisting of specific content in the _raw field? Or any other way?
You can only exclude files and directories within the monitor stanza on the UF. The WinEventLog example is for a pre-configured format that Splunk understands. This is why you are able to be more granular in the filtering.
Filtering can be done on the indexer.
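To illustrate why a content regex in a monitor whitelist lets nothing through, here is a minimal Python sketch. It assumes the whitelist is tested against the file path only, never against event text. The function name `monitor_accepts`, the sample path, and both patterns are hypothetical, made up for this illustration.

```python
import re

def monitor_accepts(path: str, whitelist: str) -> bool:
    """Rough model of a monitor whitelist: the regex sees the file path only."""
    return re.search(whitelist, path) is not None

# Hypothetical file picked up by the monitor stanza.
path = r"E:\Program Files\some app\Logs\2020\servername_LOGTYPE_01.txt"

# Hypothetical patterns: one written for event content, one for file names.
content_regex = r"ERROR\s+payment failed"
name_regex = r"servername_LOGTYPE_\d+\.txt$"

print(monitor_accepts(path, content_regex))  # content pattern never matches a path
print(monitor_accepts(path, name_regex))     # file-name pattern does
```

With a content pattern as the whitelist, no path matches, so no file is monitored at all, which is consistent with the symptom described in the question.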
Hi @geoffmoraes ,
the whitelist parameter applies to the names of the files to ingest, not to the events (https://docs.splunk.com/Documentation/Splunk/8.0.4/Admin/Inputsconf).
Events cannot be filtered at the Forwarder level; the only exception is wineventlogs.
So if you want to filter data, you have to do it on the Indexers or (when present) on Heavy Forwarders.
To do this, follow the instructions at https://docs.splunk.com/Documentation/Splunk/8.0.4/Forwarding/Routeandfilterdatad#Filter_event_data_... .
In short, you have to find the correct regex (which you already did),
then put the following on the Indexers (or, when present, on Heavy Forwarders) in props.conf:
[your_sourcetype]
TRANSFORMS-null= setnull
in transforms.conf:
[setnull]
REGEX = my_regex_to_look_for_specific_text
DEST_KEY = queue
FORMAT = nullQueue
Then restart Splunk
Ciao.
Giuseppe
This filtering is being done on a heavy forwarder. I haven't tried your solution out yet, but have used the transforms.conf to send events to null.
I would like to whitelist specific keywords so only those events are indexed. If I'm not mistaken, sending to null would be blacklisting that event. Wouldn't this do the opposite of what I want?
Hi @geoffmoraes ,
if you look at the link above, there are two use cases;
yours is probably the second one (https://docs.splunk.com/Documentation/Splunk/8.0.4/Forwarding/Routeandfilterdatad#Keep_specific_even...):
In props.conf
[your_sourcetype]
TRANSFORMS-set= setnull,setparsing
(mind the order of the transforms in TRANSFORMS-set!)
In transforms.conf
[setnull]
REGEX = .
DEST_KEY = queue
FORMAT = nullQueue
[setparsing]
REGEX = keyword1|keyword2|keyword3
DEST_KEY = queue
FORMAT = indexQueue
(here the order of the stanzas isn't important!)
Ciao.
Giuseppe
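The ordering rule in the answer above can be sketched in Python. This is only a model of the behavior, under the assumption that the transforms listed in TRANSFORMS-set run left to right and that each matching transform overwrites the queue chosen by the previous one. The `route` function and the sample events are hypothetical.

```python
import re

# Modeled on the setnull/setparsing stanzas: name -> (REGEX, FORMAT).
TRANSFORMS = {
    "setnull": (r".", "nullQueue"),
    "setparsing": (r"keyword1|keyword2|keyword3", "indexQueue"),
}

def route(event: str, order: list[str]) -> str:
    """Apply the transforms in the given order; the last match decides the queue."""
    queue = "indexQueue"  # assumed default when no transform matches
    for name in order:
        pattern, dest = TRANSFORMS[name]
        if re.search(pattern, event):
            queue = dest
    return queue

events = ["keyword2 happened", "routine noise"]  # hypothetical events
for order in (["setnull", "setparsing"], ["setparsing", "setnull"]):
    print(order, [route(e, order) for e in events])
```

With setnull first, every event is sent to the null queue and the keyword matches are then pulled back to the index queue. With the order reversed, setnull runs last and discards everything.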
Hi @gcusello,
So, in props I now have..
[source::some:sourcetype1]
TRANSFORMS-set= setnull,setparsing
and in transforms.conf
[setnull]
REGEX = .
DEST_KEY = queue
FORMAT = nullQueue
[setparsing]
REGEX = (?i)(\bkeyword1\b).*(\bkeyword2\b.*\])(?i)
DEST_KEY = queue
FORMAT = indexQueue
After saving these files on the HF, I uninstalled and redeployed the app via Forwarder Management in the GUI.
So far it's not working, as I get all logs with no filtering.
This same regex on previously indexed events works on a search query, returning just the required events.
index=someindex sourcetype=some:sourcetype1 | regex _raw="(?i)(\bkeyword1\b).*(\bkeyword2\b.*\])(?i)"
Am I missing something?
Hi @geoffmoraes ,
what's "source::some:sourcetype1" in the props.conf stanza?
In this stanza name, you have to put the sourcetype of the logs to filter (e.g. [wineventlog]).
Ciao.
Giuseppe
Hi @gcusello
I took that from the example in the docs link, which had source:: in it. My mistake.
The actual sourcetype name has a : in it. I changed the props.conf stanza to [some:sourcetype1] and still no luck. Not sure what's wrong this time.
Assuming that I eventually get this to work, can two sourcetypes be used in the props.conf like this?
props.conf
[some:sourcetype1]
TRANSFORMS-set= setnull,setparsing1
[some:sourcetype2]
TRANSFORMS-set= setnull,setparsing2
transforms.conf
[setnull]
REGEX = .
DEST_KEY = queue
FORMAT = nullQueue
[setparsing1]
REGEX = REGEX1
DEST_KEY = queue
FORMAT = indexQueue
[setparsing2]
REGEX = REGEX2
DEST_KEY = queue
FORMAT = indexQueue
Hi @geoffmoraes ,
let me understand: is your sourcetype called "some:sourcetype1", or is it only called "sourcetype1" and you also added "some:" to the stanza name?
If the first, try renaming the sourcetype to avoid the ":" (use "_" instead).
If the second, put only "sourcetype1" in the stanza name:
[sourcetype1]
Ciao.
Giuseppe
Hi @gcusello
I've renamed the sourcetype, replacing the ":" with "_".
That had no effect either.
I'm not sure what to try now.
Without the props and transforms, all the logs come in. The regex works when run in the search query.
With the props and transforms, I get no logs.
Does the regex in the transforms.conf look right?
[setnull]
REGEX = .
DEST_KEY = queue
FORMAT = nullQueue
[setparsing]
REGEX = (?i)(\bkeyword1\b).*(\bkeyword2\b.*\])(?i)
DEST_KEY = queue
FORMAT = indexQueue
Hi @geoffmoraes ,
there are only three possible problems: the sourcetype in the props.conf stanza is wrong, the regex is wrong, or the conf files are not in the right place.
You can easily check the first by looking at the sourcetype in the search results.
For the second, you can use the regex command in a search.
For the third, these files must be on the Indexers and/or (when present) on the Heavy Forwarder; to be sure, put them on both, and after updating, restart Splunk on each updated system.
Ciao.
Giuseppe
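Besides the regex command in a search, the pattern can be checked offline against a few sample lines. The sketch below uses Python's re module, which is not the same engine as Splunk's, so treat it as a rough sanity check only. The trailing (?i) from the thread's pattern is dropped because recent Python versions reject inline global flags that are not at the start of the expression; the leading (?i) already makes the whole pattern case-insensitive. The sample lines and the `kept` helper are hypothetical.

```python
import re

# Pattern from the thread, without the trailing (?i).
PATTERN = re.compile(r"(?i)(\bkeyword1\b).*(\bkeyword2\b.*\])")

# Hypothetical sample events; replace with real lines copied from the log file.
SAMPLES = [
    "2020-06-01 10:00:01 KEYWORD1 started [keyword2 id=7]",
    "2020-06-01 10:00:02 keyword1 only, nothing else",
    "2020-06-01 10:00:03 unrelated line",
]

def kept(events):
    """Return the events the pattern matches, i.e. the ones meant to be indexed."""
    return [e for e in events if PATTERN.search(e)]

for line in kept(SAMPLES):
    print(line)
```

If the expected lines are printed here and in the search, the regex is unlikely to be the cause, and the stanza name or the location of the conf files is the next thing to check.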
Thanks @gcusello! I finally got it to work by taking off renderXml=false from the stanza. The logs then came in filtered as expected!
But it isn't over yet. I need to add another sourcetype (which contains XML) to this index with the same kind of filtering. All I could find relevant was this link below, but there isn't a clear solution.
Can setnull and setparsing be used for two different sourcetypes?
Hi @geoffmoraes ,
if the regexes are the same, you can reuse the same stanzas in transforms.conf; in props.conf, however, you need two stanzas, one for each sourcetype.
If the regexes are different, create another stanza for the second setparsing (e.g. setparsing_xml) and reuse the same setnull.
Ciao.
Giuseppe
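The layout described above, with one shared setnull and a separate keep-transform per sourcetype, can be modeled as follows. This is a sketch, assuming each sourcetype's transforms run in the listed order and the last match decides the queue. The sourcetype names, both keep-patterns, and the `route` function are hypothetical.

```python
import re

# props.conf modeled as: sourcetype -> ordered list of transforms.
PROPS = {
    "sourcetype1": ["setnull", "setparsing"],
    "sourcetype2": ["setnull", "setparsing_xml"],
}

# transforms.conf modeled as: name -> (REGEX, FORMAT). Keep-patterns are made up.
TRANSFORMS = {
    "setnull": (r".", "nullQueue"),
    "setparsing": (r"keyword1", "indexQueue"),
    "setparsing_xml": (r"<Level>Error</Level>", "indexQueue"),
}

def route(sourcetype: str, event: str) -> str:
    """Run the sourcetype's transforms in order; the last match decides the queue."""
    queue = "indexQueue"  # assumed default when nothing matches
    for name in PROPS.get(sourcetype, []):
        pattern, dest = TRANSFORMS[name]
        if re.search(pattern, event):
            queue = dest
    return queue

print(route("sourcetype1", "keyword1 seen"))
print(route("sourcetype2", "<Event><Level>Error</Level></Event>"))
print(route("sourcetype2", "keyword1 seen"))
```

In this model each sourcetype keeps only what its own pattern matches, so a keyword1 event arriving under sourcetype2 is still discarded.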
Thanks @gcusello !