regex help

Champion

Hi,

I'm setting up some null parsing via transforms.conf, and I want to include only a certain set of devices. I have it working with a generic regex, but now I want to get more specific. My feed looks like this:

1377442800000|522334|NormalizedCPUInfo|Utilization|2|CPU|WCMK2DC01|CPU1

1377442800000|522334|NormalizedCPUInfo|cpuIdleUtilization|98|CPU|WCMK2DC01|CPU1

1377442800000|522682|NormalizedMemoryInfo|Total|42948878336|Memory|WCNGDCC02|Bluecoat Memory

1377442800000|522682|NormalizedMemoryInfo|Utilization|12|Memory|WCNGDCC02|Bluecoat Memory

1377442800000|522700|NormalizedCPUInfo|Utilization|2|CPU|WCNGMMK01|CPU5

1377442800000|522700|NormalizedCPUInfo|cpuIdleUtilization|98|CPU|WCNGMMK01|CPU5

I want to include only events whose field 7 contains certain values. My regex is as follows:

REGEX = .\|.\|.\|.\|.\|.\|.\|[Ww][Cc].

This isn't working. Any suggestions? Is there a good tool/website where I can test this stuff out?


Communicator

http://gskinner.com/RegExr/ is an online regex tester where you can paste sample data and test your regex against it to see what it matches.

Also, if you are absolutely sure that "WC" can only occur after a pipe, then you can use a lookbehind regex.

(?<=\|)WC
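A quick way to see what the lookbehind does is to try it outside Splunk. This is only a sketch using Python's re module (Splunk itself uses PCRE, but this pattern is written the same way in both), with the pipe escaped. The first event is copied from the question; the second is a made-up event added to show the caveat.

```python
import re

# Lookbehind: match "WC" only when the preceding character is a literal pipe.
lookbehind = re.compile(r"(?<=\|)WC")

# Sample event from the question: "WC" starts field 7.
sample = "1377442800000|522334|NormalizedCPUInfo|Utilization|2|CPU|WCMK2DC01|CPU1"

# Hypothetical event (not from the feed): "WC" starts field 8 instead.
other_field = "1377442800000|522334|NormalizedCPUInfo|Utilization|2|CPU|ABMK2DC01|WCPU1"

for event in (sample, other_field):
    print(bool(lookbehind.search(event)), event)
```

Both events match, because the lookbehind only checks for a pipe immediately before "WC" and says nothing about which field it is in. That is why it is only safe if "WC" can never start any other field.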

Ultra Champion

Do you want to extract the 7th field? Or do you want to send some events to the nullQueue based on the value of field 7?

The basic flaw with your regex is that there are no quantifiers for your wildcards. A single dot will only match one character.

For a nullQueue setup I'd recommend the following transform stanza, which matches when field 7 starts with 'WC', 'Wc', 'wC' or 'wc':

[discard_wc]
REGEX = (?:[^\|]+\|){6}[Ww][Cc]
DEST_KEY = queue
FORMAT = nullQueue
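To compare the two patterns outside Splunk, here is a minimal sketch using Python's re module (an assumption for illustration; Splunk uses PCRE, but both patterns are written the same way there). The event is the first sample line from the question.

```python
import re

# The pattern from the question: each dot matches exactly one character.
original = re.compile(r".\|.\|.\|.\|.\|.\|.\|[Ww][Cc].")

# The suggested pattern: six "non-pipe characters followed by a pipe"
# sequences, then W/w and C/c at the start of field 7.
suggested = re.compile(r"(?:[^\|]+\|){6}[Ww][Cc]")

event = "1377442800000|522334|NormalizedCPUInfo|Utilization|2|CPU|WCMK2DC01|CPU1"

print("original matches: ", bool(original.search(event)))
print("suggested matches:", bool(suggested.search(event)))
```

The original does not match this event and the suggested pattern does. Note also that the original contains seven pipe separators before the WC test, while field 7 is preceded by only six, so adding quantifiers alone would test field 8 rather than field 7.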

UPDATE:

Further explanation of the process, following the multitude of comments below.

\d{4}\s(bob|alice)

will match four digits, followed by a space, followed by either bob or alice.

If that is what you mean. If you are talking about the TRANSFORMS stanzas, they are all applied to each event matching the props.conf stanza (e.g. [your_sourcetype]).

Any event of that sourcetype will then pass through the four transforms, discard, keep3, keep4 and keep7, in that order (important), before the event is returned for further processing.

When it enters this part of the processing pipeline, the queue is set to indexQueue by default.

  • The first transform (discard) will set queue to nullQueue, because the regex (.) will match any event.

  • The second transform (keep3) will change the queue to indexQueue if the regex matches, otherwise leave the queue value unchanged.

  • The same goes for the third and the fourth.

  • When the event has passed through the last transform, it will be processed according to the final value of queue (stored in the index, or thrown away).
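The steps above can be mimicked in a few lines of Python. This is only a simulation of the described behavior, not how Splunk is implemented: the route function and the two made-up events are hypothetical, and the regexes are the ones from the discard/keepField3/keepField4/keepField7 stanzas in this thread.

```python
import re

# Transforms in the order given in TRANSFORMS-filter.
# Each entry: (name, regex, queue value to set when the regex matches).
transforms = [
    ("discard",    re.compile(r"."), "nullQueue"),
    ("keepField3", re.compile(r"(?:[^\|]+\|){2}(NormalizedCPUInfo|NormalizedMemoryInfo)"), "indexQueue"),
    ("keepField4", re.compile(r"(?:[^\|]+\|){3}(Utilization|BitsIn)"), "indexQueue"),
    ("keepField7", re.compile(r"(?:[^\|]+\|){6}WC"), "indexQueue"),
]

def route(event):
    """Return the final queue value after all transforms have been applied."""
    queue = "indexQueue"  # default when the event enters this stage
    for _name, regex, destination in transforms:
        if regex.search(event):
            queue = destination  # a later match overrides an earlier one
    return queue

# Sample event from the question.
kept = "1377442800000|522334|NormalizedCPUInfo|Utilization|2|CPU|WCMK2DC01|CPU1"
# Hypothetical events, made up for illustration.
dropped = "1377442800000|1|OtherInfo|Total|5|Disk|XYZ01|Disk1"
kept_by_field7_only = "1377442800000|1|OtherInfo|Total|5|Disk|WCABC01|Disk1"

for event in (kept, dropped, kept_by_field7_only):
    print(route(event), event)
```

One thing the simulation makes visible: an event is kept if any one of the keep transforms matches, so the three keep stanzas combine as an OR, not an AND.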

Hope this makes it a bit clearer.

/K

Ultra Champion

see update in my original answer. /K

Champion

Are these OR statements?

Ultra Champion

which of them? pattern matching or code highlighting 🙂

Champion

Oh, cool. Didn't know that I could do something like that. Thanks!

Ultra Champion

As you can see from lukejadamec's post, the pattern (or template if you like) is based on the number of non-pipe-characters-followed-by-a-pipe sequences.

E.g. for [keepField3] there should be two {2} sequences of non-pipe-characters [^\|]+ before you match on either NormalizedCPUInfo or NormalizedMemoryInfo
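As a template, the rule is: to test field N, skip N-1 of those sequences, then list the allowed values separated by | (which acts as an OR). A small sketch in Python, where field_pattern is a hypothetical helper written for this post and not anything Splunk provides:

```python
import re

def field_pattern(field_number, values):
    """Build a regex that matches when the 1-based pipe-delimited field
    `field_number` starts with one of `values`."""
    skipped = field_number - 1  # sequences of "non-pipe characters + pipe" to skip
    alternatives = "|".join(re.escape(v) for v in values)  # a|b means "a or b"
    return rf"(?:[^\|]+\|){{{skipped}}}({alternatives})"

field3 = field_pattern(3, ["NormalizedCPUInfo", "NormalizedMemoryInfo"])
field4 = field_pattern(4, ["Utilization", "BitsIn"])
field7 = field_pattern(7, ["WC"])

for pattern in (field3, field4, field7):
    print("REGEX =", pattern)
```

The printed patterns can then be pasted into the REGEX lines of the corresponding transforms.conf stanzas.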

/K

I took the liberty of fixing some parentheses and setting the markup to code, which shows special characters.

Super Champion

Perhaps something like this then:

props.conf

[your source or sourcetype]
TRANSFORMS-filter = discard, keepField3, keepField4, keepField7

transforms.conf

[discard]
REGEX = .
DEST_KEY = queue
FORMAT = nullQueue

[keepField3]
REGEX = (?:[^\|]+\|){2}(NormalizedCPUInfo|NormalizedMemoryInfo)
DEST_KEY = queue
FORMAT = indexQueue

[keepField4]
REGEX = (?:[^\|]+\|){3}(Utilization|BitsIn)
DEST_KEY = queue
FORMAT = indexQueue

[keepField7]
REGEX = (?:[^\|]+\|){6}WC
DEST_KEY = queue
FORMAT = indexQueue

Champion

Thanks. I was more looking for a template, rather than the values. For example, I want to filter on NormalizedCPUInfo and NormalizedMemoryInfo in field3, Utilization and BitsIn in field 4. I already have field 7 working.

Super Champion

You need to be more specific.

What exactly are the values of field 3 that you want to keep?
What exactly are the values of field 4 that you want to keep?
What exactly are the values of field 7 that you want to keep?

Champion

This was a big help. Is there a way to filter on fields 3, 4 and 7? The existing one filters on 7. Sorry, I'm working on my regex skills, but this is beyond me.

Super Champion

For example: if you tell Splunk 6 different specific ways you can drop an event, then Splunk will examine each of those 6 on each event; whereas, if you tell Splunk 1 specific way to keep an event and drop all others, then Splunk searches for only that one specific thing for each event.

In your example, I would not include both upper and lowercase WwCc if you will only ever see uppercase.

Super Champion

I’m not sure that filtering in or out at index time makes much of a difference from an in-or-out standpoint alone. What you really want to do is be as precise as possible when implementing a configuration (at index or search time).

Champion

Thanks. Given that this is going to be a relatively high-volume feed (60 GB an hour), and I have about a dozen wildcards to filter on, which would be more efficient: filtering in, or out?

Splunk Employee

You can use the Splunk search bar to try your regex against a sample of indexed data (see the regex command).
For regex resources: http://www.regular-expressions.info/