Re: regex help

a212830 · ‎08-25-2013

Hi,

I'm setting up some null parsing via transforms.conf, and I want to include only a certain set of devices. I have it working with a generic regex, but now I want to get more specific. My feed looks like this:

1377442800000|522334|NormalizedCPUInfo|Utilization|2|CPU|WCMK2DC01|CPU1

1377442800000|522334|NormalizedCPUInfo|cpuIdleUtilization|98|CPU|WCMK2DC01|CPU1

1377442800000|522700|NormalizedCPUInfo|Utilization|2|CPU|WCNGMMK01|CPU5

1377442800000|522700|NormalizedCPUInfo|cpuIdleUtilization|98|CPU|WCNGMMK01|CPU5

I want to include only data from field7 that has certain data. My regex is as follows:

REGEX = .\|.\|.\|.\|.\|.\|.\|[Ww][Cc].

This isn't working. Any suggestions? Is there a good tool/website where I can test this stuff out?

lcrielaa · ‎08-26-2013

http://gskinner.com/RegExr/ is an online regex tester where you can copy sample data and test your regex on them to see what they match.

Also, if you are absolutely sure that "WC" can only occur after a pipe, then you can use a lookbehind regex.

(?<=\|)WC
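A quick way to sanity-check the lookbehind outside Splunk is a few lines of Python. This is only a sketch: Splunk uses PCRE rather than Python's `re` module, but the two agree on this syntax. The second sample string (`other`) is made up for illustration. Note that the pipe has to be escaped inside the lookbehind, since a bare `|` is alternation.

```python
import re

line = "1377442800000|522334|NormalizedCPUInfo|Utilization|2|CPU|WCMK2DC01|CPU1"

# Escaped pipe: "WC" must be immediately preceded by a literal "|".
pattern = re.compile(r"(?<=\|)WC")

match = pattern.search(line)
print(match.group(0))               # the lookbehind is zero-width, so only "WC" is captured
print(line[match.start() - 1])      # the character just before the match is the pipe

# Hypothetical event with WC in field 2: the lookbehind alone does not
# pin the match to field 7, it only requires a pipe right before "WC".
other = "1377442800000|WC99|SomethingElse"
print(bool(pattern.search(other)))
```

So the lookbehind is only safe if "WC" can never start any other field.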

kristian_kolb · ‎08-25-2013

Do you want to extract the 7th field? Or do you want to send some events to the nullQueue based on the value of field 7?

The basic flaw with your regex is that there are no quantifiers for your wildcards. A single dot will only match one character.

For a nullQueue setup I'd recommend the following transform stanza where field7 starts with 'WC' or 'Wc' or 'wC' or 'wc'
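To see the difference, here is a small Python sketch that runs the regex as posted and a quantified version against one of the sample events. Python's `re` is used as a stand-in for Splunk's PCRE engine, which is an assumption for testing purposes only. As a side observation, the posted pattern also skips seven pipes, which would land on the eighth field rather than the seventh.

```python
import re

line = "1377442800000|522334|NormalizedCPUInfo|Utilization|2|CPU|WCMK2DC01|CPU1"

# As posted: each "." consumes exactly one character between pipes.
as_posted = re.compile(r".\|.\|.\|.\|.\|.\|.\|[Ww][Cc].")

# With a quantifier: one or more non-pipe characters, then a pipe, six times,
# which leaves the match position at the start of field 7.
quantified = re.compile(r"(?:[^|]+\|){6}[Ww][Cc]")

print(bool(as_posted.search(line)))    # False
print(bool(quantified.search(line)))   # True
```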

[discard_wc]
REGEX = (?:[^\|]+\|){6}[Ww][Cc]
DEST_KEY = queue
FORMAT = nullQueue

UPDATE:

Further explanation of the process, following the multitude of comments below.

\d{4}\s(bob|alice)

will match four digits, followed by a space, followed by either bob or alice.
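A minimal Python check of that pattern, with made-up test strings, shows the parenthesised group acting as an OR:

```python
import re

pattern = re.compile(r"\d{4}\s(bob|alice)")

# Hypothetical inputs: two that should match, two that should not.
samples = ["2013 bob", "2013 alice", "2013 carol", "201 bob"]

results = {}
for text in samples:
    m = pattern.search(text)
    # group(1) holds whichever alternative matched, or None if no match.
    results[text] = m.group(1) if m else None
    print(text, "->", results[text])
```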

If that is what you mean. If you are talking about the TRANSFORMS stanzas, they are all applied to each event matching the props.conf stanza (e.g. [your_sourcetype])

Any event of that sourcetype will then pass through the four transforms (discard, keepField3, keepField4 and keepField7), in that order (important), before the event is returned for further processing.

When it enters this part of the processing pipeline, the queue is set to indexQueue by default.

The first transform (discard) will set queue to nullQueue, because the regex (.) will match any event.
The second transform (keep3) will change the queue to indexQueue if the regex matches, otherwise leave the queue value unchanged.
The same goes for the third and the fourth.
When the event has passed through the last transform, it will be processed according to the value of queue (store it in the index, or throw it away)
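The steps above can be mimicked in a few lines of Python. This is a rough simulation of the logic as described in this thread, not of Splunk's actual pipeline; the `route` function, the `TRANSFORMS` list and the two test events at the bottom are invented for illustration, and the regexes are the ones from lukejadamec's transforms.conf.

```python
import re

# (name, regex, queue to set when the regex matches), in props.conf order.
TRANSFORMS = [
    ("discard",    r".", "nullQueue"),
    ("keepField3", r"(?:[^|]+\|){2}(NormalizedCPUInfo|NormalizedMemoryInfo)", "indexQueue"),
    ("keepField4", r"(?:[^|]+\|){3}(Utilization|BitsIn)", "indexQueue"),
    ("keepField7", r"(?:[^|]+\|){6}WC", "indexQueue"),
]

def route(event: str) -> str:
    """Return the queue an event ends up in after all transforms have run."""
    queue = "indexQueue"              # default when the event enters this stage
    for _name, regex, dest in TRANSFORMS:
        if re.search(regex, event):   # unanchored match, like REGEX in transforms.conf
            queue = dest              # a later match overrides an earlier one
    return queue

# Hypothetical events: one matching none of the keep rules, one matching only field 7.
no_match = "1377442800000|1|OtherInfo|Foo|2|CPU|XYZ|CPU1"
only_wc  = "1377442800000|1|OtherInfo|Foo|2|CPU|WCMK2DC01|CPU1"

print(route(no_match))   # nullQueue
print(route(only_wc))    # indexQueue
```

One thing the simulation makes visible: because each keep transform resets the queue on its own, an event is kept if any one of them matches. If all three fields must match at once, the conditions would need to go into a single regex.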

Hope this makes it a bit clearer.

/K

kristian_kolb · ‎08-26-2013

see update in my original answer. /K

a212830 · ‎08-26-2013

Are these OR statements?

kristian_kolb · ‎08-26-2013

which of them? pattern matching or code highlighting 🙂

a212830 · ‎08-26-2013

Oh, cool. Didn't know that I could do something like that. Thanks!

kristian_kolb · ‎08-26-2013

As you can see from lukejadamec's post, the pattern (or template if you like) is based on the number of non-pipe-characters-followed-by-a-pipe sequences.

E.g. for [keepField3] there should be two {2} sequences of non-pipe-characters [^\|]+ before you match on either NormalizedCPUInfo or NormalizedMemoryInfo
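As a template, the pattern can be generated from the field position and the values to accept. The helper below is hypothetical (`field_pattern` is not a Splunk function), and the leading `^` is an addition of this sketch: it pins the pipe count to the start of the event, which the unanchored versions in this thread do not do.

```python
import re

def field_pattern(position: int, values: list[str]) -> str:
    """Build a regex matching events whose Nth pipe-delimited field starts with one of `values`."""
    skipped = position - 1                                   # fields to skip before the target
    alternatives = "|".join(re.escape(v) for v in values)    # OR of the accepted values
    return rf"^(?:[^|]+\|){{{skipped}}}(?:{alternatives})"

cpu_line  = "1377442800000|522334|NormalizedCPUInfo|Utilization|2|CPU|WCMK2DC01|CPU1"
idle_line = "1377442800000|522334|NormalizedCPUInfo|cpuIdleUtilization|98|CPU|WCMK2DC01|CPU1"

print(field_pattern(3, ["NormalizedCPUInfo", "NormalizedMemoryInfo"]))
print(bool(re.search(field_pattern(4, ["Utilization", "BitsIn"]), cpu_line)))
print(bool(re.search(field_pattern(4, ["Utilization", "BitsIn"]), idle_line)))
```

The generated string can be pasted after `REGEX =` in a transforms.conf stanza.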

/K

Took the liberty of fixing some parentheses and setting the markup to code, which shows special characters.

lukejadamec · ‎08-26-2013

Perhaps something like this then:

props.conf

[your source or sourcetype]
TRANSFORMS-filter = discard, keepField3, keepField4, keepField7

transforms.conf

[discard]
REGEX = .
DEST_KEY = queue
FORMAT = nullQueue

[keepField3]
REGEX = (?:[^\|]+\|){2}(NormalizedCPUInfo|NormalizedMemoryInfo)
DEST_KEY = queue
FORMAT = indexQueue

[keepField4]
REGEX = (?:[^\|]+\|){3}(Utilization|BitsIn)
DEST_KEY = queue
FORMAT = indexQueue

[keepField7]
REGEX = (?:[^\|]+\|){6}WC
DEST_KEY = queue
FORMAT = indexQueue

a212830 · ‎08-26-2013

Thanks. I was more looking for a template, rather than the values. For example, I want to filter on NormalizedCPUInfo and NormalizedMemoryInfo in field3, Utilization and BitsIn in field 4. I already have field 7 working.

lukejadamec · ‎08-26-2013

You need to be more specific.

What exactly are the values of field 3 that you want to keep?
What exactly are the values of field 4 that you want to keep?
What exactly are the values of field 7 that you want to keep?

a212830 · ‎08-26-2013

This was a big help. Is there a way to filter out on fields 3, 4 and 7? The existing one filters on 7. Sorry, I'm working on my regex skills, but this is beyond me.

lukejadamec · ‎08-25-2013

For example: if you tell Splunk 6 different specific ways you can drop an event, then Splunk will examine each of those 6 on each event, whereas if you tell Splunk 1 specific way to keep an event and drop all others, then Splunk searches for only that one specific thing for each event.

In your example, I would not include both upper and lowercase WwCc if you will only ever see uppercase.

lukejadamec · ‎08-25-2013

I’m not sure that filtering in or out at index time makes much of a difference from an in-or-out standpoint alone. What you really want to do is be as precise as possible when implementing a configuration (at index or search time).

a212830 · ‎08-25-2013

Thanks. Given that this is going to be a relatively high-volume feed (60 GB an hour), and I have about a dozen wildcards to filter on, which would be more efficient: filtering in, or out?

yannK · ‎08-25-2013

You can use the Splunk search bar to try it on a sample of indexed data (see the regex command).
For regex resources: http://www.regular-expressions.info/
