Hi Splunkers, I have a problem with a blacklist filter.
On a customer's UF, we filter out some events by changing the inputs.conf file.
The filters based on a comma-separated list, like Windows EventID, work fine, while the one based on a regex does not.
Of course, as a first step I checked the regex syntax, and I can confirm it is fine: tested on regex101, it matches exactly what I want. I ran the tests with different source logs to be sure it works properly.
This is how we placed regex on UF:
[<stanza name>]
...other parameter...
blacklist = \]\sA\s+(.*)(microsoft|office|azure|o365|onenote|outlook|windowsupdate)(\(\d+\))(com|net|us)(\(\d+\))\s
This filter must be applied to logs coming from Windows DNS; its purpose is to avoid ingesting legitimate domains, in all their combinations, but only if they have a "normal" form. In the regex you can see I put a filter on (<number>), because in the raw log domains appear in the format main_domain(<number>)root_domain, like microsoft(3)net.
For example, microsoft(2)com and microsoft(3)net match the regex and should be filtered out, while microsoft(9)123(5)com does not match and should be sent to Splunk.
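As a quick sanity check of the expectation above, here is a small Python sketch that runs the same regex against two made-up lines. The sample lines are assumptions that only imitate the general shape of a Windows DNS log (the real layout may differ), and Python's `re` engine stands in for the regex engine Splunk uses.

```python
import re

# Same pattern as in the blacklist setting above.
BLACKLIST = re.compile(
    r"\]\sA\s+(.*)(microsoft|office|azure|o365|onenote|outlook|windowsupdate)"
    r"(\(\d+\))(com|net|us)(\(\d+\))\s"
)

# Hypothetical sample lines, invented for illustration only.
SAMPLES = {
    "normal": "Q [0001   D   NOERROR] A      (9)microsoft(3)com(0) ",
    "strange": "Q [0001   D   NOERROR] A      (9)microsoft(3)123(3)com(0) ",
}


def is_blacklisted(line: str) -> bool:
    """Return True when the blacklist pattern is found anywhere in the line."""
    return BLACKLIST.search(line) is not None


for name, line in SAMPLES.items():
    print(name, is_blacklisted(line))
```

With these sample lines, the "normal" domain matches and the "strange" one does not, which is the behavior described above.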
My assumption is that I missed some delimiter after the equals sign; I mean, should I wrap the regex in some kind of symbols? Something like
regex = '<regex code>'
Or
regex = "<regex code>"
etcetera.
Hi Splunkers, in the end I worked with Support and they figured out why the regex was not working: when applied to a monitor input on a UF, as in this case, the blacklist parameter is matched against the file name and/or path. Our purpose is to filter based on the file content and, to achieve this, we must work on an HF, changing inputs.conf or both props.conf and transforms.conf.
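To illustrate the point about monitor inputs, the sketch below tests the blacklist regex against a file path instead of a log line. The path and the log line are hypothetical, and the function is only a simplified model of the behavior Support described, not Splunk's implementation.

```python
import re

BLACKLIST = re.compile(
    r"\]\sA\s+(.*)(microsoft|office|azure|o365|onenote|outlook|windowsupdate)"
    r"(\(\d+\))(com|net|us)(\(\d+\))\s"
)

# Hypothetical monitored file and one hypothetical line read from it.
MONITORED_PATH = r"C:\dnslogs\dns.log"
LOG_LINE = "Q [0001   D   NOERROR] A      (9)microsoft(3)com(0) "


def monitor_skips_file(path: str) -> bool:
    """Model of a monitor blacklist: the regex is tested against the path only."""
    return BLACKLIST.search(path) is not None


print("file skipped:", monitor_skips_file(MONITORED_PATH))
print("line would match:", BLACKLIST.search(LOG_LINE) is not None)
```

In this model the line matches but the path never does, so the whole file is still read and no event is dropped.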
Wait a minute. Are you ingesting those logs from files??? Because then it would make sense indeed. I suppose everyone involved assumed that, since we're talking about ingesting Windows events, we were talking about an event log input, not a monitor one.
Hi @SplunkExplorer ,
good for you, see you next time!
Ciao and happy splunking
Giuseppe
P.S.: Karma Points are appreciated by all the contributors 😉
Do you ingest events as "old format" or XML?
With XML events you have to do it differently.
https://docs.splunk.com/Documentation/Splunk/latest/admin/inputsconf
* $XmlRegex: Use this key for filtering when you render Windows Event log events in XML by setting the 'renderXml' setting to "true". Search the online documentation for "Filter data in XML format with the XmlRegex key" for details.
Old format, no XML
Then you need to read this:
* key=regex format:
  * A whitespace-separated list of Event Log components to match, and regular expressions to match against them.
  * There can be one match expression or multiple expressions per line.
  * The key must belong to the set of valid keys provided in the "Valid keys for the key=regex format" section.
  * The regex consists of a leading delimiter, the regex expression, and a trailing delimiter. Examples: %regex%, *regex*, "regex"
  * When multiple match expressions are present, they are treated as a logical AND. In other words, all expressions must match for the line to apply to the event.
  * If the value represented by the key does not exist, it is not considered a match, regardless of the regex.
  * Example: whitelist = EventCode=%^200$% User=%jrodman%
    Include events only if they have EventCode 200 and relate to User jrodman

# Valid keys for the key=regex format:
* The following keys are equivalent to the fields that appear in the text of the acquired events:
  * Category, CategoryString, ComputerName, EventCode, EventType, Keywords, LogName, Message, OpCode, RecordNumber, Sid, SidType, SourceName, TaskCategory, Type, User
* There are three special keys that do not appear literally in the event.
  * $TimeGenerated: The time that the computer generated the event
  * $Timestamp: The time that the event was received and recorded by the Event Log service.
What's important is that you specify which field the regex is to be applied to and that it needs to be enclosed in delimiters.
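The key=regex rules quoted above can be modelled in a few lines of Python. This is only a rough sketch of the documented semantics (delimited regexes, logical AND, a missing key never matches), not Splunk's parser. The helper names are hypothetical, and treating any single non-whitespace character as the delimiter is an assumption.

```python
import re

# key, one delimiter character, the regex, then the same delimiter again.
EXPR = re.compile(r"(\$?\w+)=(\S)(.*?)\2(?=\s|$)")


def parse_filter(value: str) -> dict:
    """Split 'Key=%regex% Key2=%regex2%' into {key: compiled regex}."""
    return {key: re.compile(rx) for key, _delim, rx in EXPR.findall(value)}


def filter_applies(value: str, event: dict) -> bool:
    """All expressions must match (logical AND); a missing key never matches."""
    rules = parse_filter(value)
    return bool(rules) and all(
        key in event and rx.search(event[key]) for key, rx in rules.items()
    )


rule = "EventCode=%^200$% User=%jrodman%"
print(filter_applies(rule, {"EventCode": "200", "User": "jrodman"}))
print(filter_applies(rule, {"EventCode": "200"}))
```

The first event satisfies both expressions; the second has no User field, so the rule does not apply to it.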
Hi @SplunkExplorer,
could you share a sample of your logs (some to filter and some to not filter)?
Anyway, after the equals sign you don't need quotes or anything else.
Ciao.
Giuseppe
Hi Giuseppe,
below is the link to regex101 with the regex I used and a log that matches it:
Matching regex
Here is the same thing, but with a small change to the log so that it no longer matches the regex, as expected:
Not matching regex
Another thought concerns my use of capturing groups; should I use them in a different way?
Hi @SplunkExplorer,
there's a difference between the two logs that you have to manage:
in the non-matching log there's "123(3)" between microsoft and com.
Please try this regex:
\]\sA\s+(.*)(microsoft|office|azure|o365|onenote|outlook|windowsupdate)(\(\d+\))(\d+\(\d+\))*(com|net|us)(\(\d+\))\s
that you can test at https://regex101.com/r/9mZoCU/3
Ciao.
Giuseppe
I fear I explained myself badly, Giuseppe, sorry.
Our purpose is to filter out only the cases where the domain in the log has a "proper" form.
For example:
microsoft.com
azure.net
office.us
Those domains are allowed for us, so we don't need to see them in the SIEM, and we want to prevent the UF from sending logs that contain them to Splunk Cloud.
On the other hand, if the domain is "strange", like:
microsoft.123.com
azure-pirate.com
office.tryhackme.us
we want to be alerted and so, in this scenario, logs must be sent to Splunk.
Now, based on this, with the regex I shared with you, the expected behavior is:
Case 1: regex matched -> logs NOT sent to Splunk
Case 2: regex NOT matched -> logs MUST be sent to Splunk.
So, what is the problem?
The logs for case 1 have been sent to Splunk, even though they match the regex and so should be discarded.
In other words: the log in the "Matching regex" link contains "microsoft.com", matches the regex, and should be discarded, but it has been sent to Splunk anyway.
Hi @SplunkExplorer ,
in this case, for the intermediate part that I added, you should try "+" instead of "*", which means that if this part isn't present the URL must not be matched:
\]\sA\s+(.*)microsoft(\(\d+\))(\d+\(\d+\))+(com|net|us)(\(\d+\))\s
as you can test at https://regex101.com/r/9mZoCU/4
Ciao.
Giuseppe
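For anyone comparing the three regex variants posted in this thread, here is a small Python sketch that shows which of two sample lines each variant matches. The sample lines are made up to imitate the log shape, and Python's `re` is used only as a stand-in for Splunk's regex engine.

```python
import re

# The three patterns as posted in the thread.
VARIANTS = {
    "original": r"\]\sA\s+(.*)(microsoft|office|azure|o365|onenote|outlook|windowsupdate)(\(\d+\))(com|net|us)(\(\d+\))\s",
    "star": r"\]\sA\s+(.*)(microsoft|office|azure|o365|onenote|outlook|windowsupdate)(\(\d+\))(\d+\(\d+\))*(com|net|us)(\(\d+\))\s",
    "plus": r"\]\sA\s+(.*)microsoft(\(\d+\))(\d+\(\d+\))+(com|net|us)(\(\d+\))\s",
}

# Hypothetical sample lines, invented for illustration only.
NORMAL = "Q [0001   D   NOERROR] A      (9)microsoft(3)com(0) "
STRANGE = "Q [0001   D   NOERROR] A      (9)microsoft(3)123(3)com(0) "


def matches(variant: str, line: str) -> bool:
    """Return True when the named variant is found in the line."""
    return re.search(VARIANTS[variant], line) is not None


for name in VARIANTS:
    print(name, "normal:", matches(name, NORMAL), "strange:", matches(name, STRANGE))
```

Because a blacklist drops whatever matches, under these assumptions only the original pattern drops the normal line while letting the strange one through. The "*" variant matches both lines, and the "+" variant matches only the strange one.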