UF inputs.conf regex-based blacklist not working

SplunkExplorer
Contributor

Hi Splunkers, I have a problem with a blacklist filter.

On a customer's UF, we filter out some events by changing the inputs.conf file.
The filters based on a comma-separated list, such as Windows EventID, work fine, while the one based on a regex does not.

Of course, the first thing I did was check the regex syntax, and I can confirm it works: tested on regex101, it matches exactly what I want. I ran the tests against different source logs to be sure it works properly.
This is how we placed the regex on the UF:

[<stanza name>]
...other parameters...
blacklist = \]\sA\s+(.*)(microsoft|office|azure|o365|onenote|outlook|windowsupdate)(\(\d+\))(com|net|us)(\(\d+\))\s

This filter must be applied to logs coming from Windows DNS; its purpose is to avoid ingesting legitimate domains, in all their combinations, but only if they have a "normal" form. In the regex you can see I put a filter on (<number>), because in the raw log the domains appear in the format main_domain(<number>)root_domain, like microsoft(3)net.
For example, microsoft(2)com and microsoft(3)net match the regex and should be filtered out, while microsoft(9)123(5)com does not match and should be sent to Splunk.

My assumption is that I missed some delimiter after the equals sign; I mean, should I enclose the regex in some kind of symbols? Something like

regex = '<regex code>'

Or

regex = "<regex code>"

and so on.

1 Solution

SplunkExplorer
Contributor

Hi Splunkers, in the end I worked with Support, who figured out why the regex was not working: when applied to a monitor input on a UF, as in this case, the blacklist parameter is matched against the file name and/or path. Our purpose is to filter on the file content and, to achieve this, we must work on an HF, changing inputs.conf or both props.conf and transforms.conf.
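
The approach described above can be sketched as follows. This is a minimal example, assuming the DNS log is read through a monitor input and parsed on a heavy forwarder; the file path, the sourcetype name my_dns_sourcetype, and the transform name drop_legit_dns are hypothetical placeholders, not values from the original setup. It shows that blacklist on a monitor stanza is only ever compared with the file path, while REGEX in transforms.conf is compared with the event text, and matching events are routed to nullQueue (discarded).

```ini
# inputs.conf (UF): on a monitor input, blacklist is matched against the
# file path, not the event text. Path and sourcetype are hypothetical.
[monitor://C:\dnslogs]
sourcetype = my_dns_sourcetype
# Skips files whose path ends in .bak; it never looks inside the files.
blacklist = \.bak$

# props.conf (HF): attach a transform to the sourcetype.
[my_dns_sourcetype]
TRANSFORMS-drop_legit_dns = drop_legit_dns

# transforms.conf (HF): events whose raw text matches REGEX are sent to
# nullQueue and therefore never indexed.
[drop_legit_dns]
REGEX = \]\sA\s+(.*)(microsoft|office|azure|o365|onenote|outlook|windowsupdate)(\(\d+\))(com|net|us)(\(\d+\))\s
DEST_KEY = queue
FORMAT = nullQueue
```

Whether the trailing \s still matches depends on how the events are broken into lines, so the regex is worth re-testing against a few real events before relying on it.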

PickleRick
SplunkTrust

Wait a minute. Are you ingesting those logs from files? Then it does make sense. I suppose everyone involved assumed that, since we were talking about ingesting Windows events, this was an event log input, not a monitor one.

gcusello
SplunkTrust

Hi @SplunkExplorer ,

good for you, see you next time!

Ciao and happy splunking

Giuseppe

P.S.: Karma Points are appreciated by all the contributors 😉

PickleRick
SplunkTrust

Do you ingest the events in the "old" format or as XML?

With XML events you have to do it differently.

https://docs.splunk.com/Documentation/Splunk/latest/admin/inputsconf

  * $XmlRegex: Use this key for filtering when you render Windows Event
    log events in XML by setting the 'renderXml' setting to "true". Search
    the online documentation for "Filter data in XML format with the
    XmlRegex key" for details.

SplunkExplorer
Contributor

Old format, no XML

PickleRick
SplunkTrust

Then you need to read this:

* key=regex format:
  * A whitespace-separated list of Event Log components to match, and
    regular expressions to match against them.
  * There can be one match expression or multiple expressions per line.
  * The key must belong to the set of valid keys provided in the "Valid
    keys for the key=regex format" section.
  * The regex consists of a leading delimiter, the regex expression, and a
    trailing delimiter. Examples: %regex%, *regex*, "regex"
  * When multiple match expressions are present, they are treated as a
    logical AND.  In other words, all expressions must match for the line to
    apply to the event.
  * If the value represented by the key does not exist, it is not considered
    a match, regardless of the regex.
  * Example:
    whitelist = EventCode=%^200$% User=%jrodman%
    Include events only if they have EventCode 200 and relate to User jrodman

# Valid keys for the key=regex format:

* The following keys are equivalent to the fields that appear in the text of
  the acquired events:
  * Category, CategoryString, ComputerName, EventCode, EventType, Keywords,
    LogName, Message, OpCode, RecordNumber, Sid, SidType, SourceName,
    TaskCategory, Type, User
* There are three special keys that do not appear literally in the event.
  * $TimeGenerated: The time that the computer generated the event
  * $Timestamp: The time that the event was received and recorded by the
                Event Log service.

What's important is that you specify which field the regex is to be applied to and that it needs to be enclosed in delimiters.
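
As an illustration of the key=regex format quoted above, a blacklist on an event log input might look like the following sketch. The stanza name and the EventCode and Message patterns are assumptions chosen for the example, not taken from the original configuration; the point is only that each regex names the key it applies to and is wrapped in delimiters.

```ini
# inputs.conf (UF), Windows Event Log input with non-XML events.
# Stanza name and patterns are hypothetical.
[WinEventLog://Security]
# Drop events that have EventCode 4662 AND whose Message contains
# "Object Type" (multiple expressions on one line are ANDed).
blacklist = EventCode=%^4662$% Message=%Object\sType%
```

This form applies to event log inputs only; on a monitor input the same setting is interpreted as a regex on the file path.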

gcusello
SplunkTrust

Hi @SplunkExplorer,

could you share a sample of your logs (some to filter and some to not filter)?

Anyway, after the equals sign you don't need quotes or any other delimiter.

Ciao.

Giuseppe

 

SplunkExplorer
Contributor

Hi Giuseppe,

below is the regex101 link with the regex I used and a log that matches it:

Matching regex

Here is the same thing, but with a small change to the log so that it no longer matches the regex, as expected:

Not matching regex

Another thought concerns my use of capturing groups; should I use them differently?

gcusello
SplunkTrust

Hi @SplunkExplorer,

there's a difference between the two logs that you have to handle:

in the non-matching log there's "123(3)" between microsoft and com.

Please try this regex:

\]\sA\s+(.*)(microsoft|office|azure|o365|onenote|outlook|windowsupdate)(\(\d+\))(\d+\(\d+\))*(com|net|us)(\(\d+\))\s

that you can test at https://regex101.com/r/9mZoCU/3

Ciao.

Giuseppe

SplunkExplorer
Contributor

I fear I explained myself badly, Giuseppe, sorry.

Our purpose is to filter out only the cases where the domain, written in the parenthesis notation, has a "proper" form.
For example:

microsoft.com
azure.net
office.us

Those domains are permitted for us, so we don't need to see them in the SIEM, and we want to prevent the UF from sending logs containing them to Splunk Cloud.

On the other hand, if the domain is "strange", like:

microsoft.123.com
azure-pirate.com
office.tryhackme.us

we want to be alerted, so in this scenario the logs must be sent to Splunk.

Now, based on this, with the regex I shared with you, the expected behavior is:

Case 1: regex matched -> logs NOT sent to Splunk
Case 2: regex NOT matched -> logs MUST be sent to Splunk.

So, what is the problem?

The logs for case 1 are sent to Splunk even though they match the regex and should be discarded.
In other words: the log in the "Matching regex" link contains "microsoft.com" and matches the regex, so it should be discarded, but it was sent to Splunk anyway.

gcusello
SplunkTrust

Hi @SplunkExplorer ,

in this case, for the intermediate part that I added, you should try "+" instead of "*", which means that if this part isn't present the URL is not matched:

\]\sA\s+(.*)microsoft(\(\d+\))(\d+\(\d+\))+(com|net|us)(\(\d+\))\s

as you can test at https://regex101.com/r/9mZoCU/4

Ciao.

Giuseppe
