Hi,
I'm setting up some null parsing via transforms.conf, and I want to include only a certain set of devices. I have it working with a generic regex, but now I want to get more specific. My feed looks like this:
1377442800000|522334|NormalizedCPUInfo|Utilization|2|CPU|WCMK2DC01|CPU1
1377442800000|522334|NormalizedCPUInfo|cpuIdleUtilization|98|CPU|WCMK2DC01|CPU1
1377442800000|522682|NormalizedMemoryInfo|Total|42948878336|Memory|WCNGDCC02|Bluecoat Memory
1377442800000|522682|NormalizedMemoryInfo|Utilization|12|Memory|WCNGDCC02|Bluecoat Memory
1377442800000|522700|NormalizedCPUInfo|Utilization|2|CPU|WCNGMMK01|CPU5
1377442800000|522700|NormalizedCPUInfo|cpuIdleUtilization|98|CPU|WCNGMMK01|CPU5
I want to include only data from field7 that has certain data. My regex is as follows:
REGEX = .\|.\|.\|.\|.\|.\|.\|[Ww][Cc].
This isn't working. Any suggestions? Is there a good tool/website where I can test this stuff out?
http://gskinner.com/RegExr/ is an online regex tester where you can copy sample data and test your regex on them to see what they match.
Also, if you are absolutely sure that "WC" can only occur after a pipe, then you can use a lookbehind regex.
(?<=\|)WC
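A quick way to see what a lookbehind like this does is to run it against one of the sample lines. The following is only a sketch in Python, whose `re` module treats this particular pattern the same way; the variable names are made up for illustration.

```python
import re

line = "1377442800000|522334|NormalizedCPUInfo|Utilization|2|CPU|WCMK2DC01|CPU1"

# The pipe has to be escaped. An unescaped | inside the lookbehind is an
# empty alternation, which always succeeds and lets WC match anywhere.
after_pipe = re.compile(r"(?<=\|)WC")

match = after_pipe.search(line)
matched_at = match.start() if match else None

# Count the pipes before the match to get the 1-based field number.
field_index = line[:matched_at].count("|") + 1
print(field_index)
```

Note that the lookbehind only says "WC directly after a pipe". It would accept WC at the start of any field after the first, not only field 7, which is why the "absolutely sure" caveat matters.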
Do you want to extract the 7th field? Or do you want to send some events to the nullQueue based on the value of field 7?
The basic flaw with your regex is that there are no quantifiers for your wildcards. A single dot will only match one character.
For a nullQueue setup I'd recommend the following transform stanza, which matches when field7 starts with 'WC', 'Wc', 'wC' or 'wc':
[discard_wc]
REGEX = (?:[^\|]+\|){6}[Ww][Cc]
DEST_KEY = queue
FORMAT = nullQueue
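If you want to check the difference between the two patterns outside Splunk, something like the following Python sketch should do. It assumes Python's `re` behaves like Splunk's PCRE for these simple patterns (it should), and the third sample line, with XYZ01 in field 7, is invented here as a non-matching case.

```python
import re

sample = [
    "1377442800000|522334|NormalizedCPUInfo|Utilization|2|CPU|WCMK2DC01|CPU1",
    "1377442800000|522682|NormalizedMemoryInfo|Total|42948878336|Memory|WCNGDCC02|Bluecoat Memory",
    # Hypothetical line, not from the feed: field 7 does not start with WC.
    "1377442800000|522700|NormalizedCPUInfo|Utilization|2|CPU|XYZ01|CPU5",
]

# Pattern from the question: each dot matches exactly one character.
original = re.compile(r".\|.\|.\|.\|.\|.\|.\|[Ww][Cc].")

# Pattern from the stanza: six "non-pipe characters plus pipe" groups, then WC.
suggested = re.compile(r"(?:[^\|]+\|){6}[Ww][Cc]")

original_hits = [bool(original.search(line)) for line in sample]
suggested_hits = [bool(suggested.search(line)) for line in sample]

for line, old, new in zip(sample, original_hits, suggested_hits):
    print(old, new, line)
```

The original pattern matches none of the lines, because it needs a run of one-character fields. The quantified pattern matches the two WC lines and skips the invented XYZ01 line.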
UPDATE:
Further explanation of the process, following the multitude of comments below.
\d{4}\s(bob|alice)
will match four digits, followed by a space, followed by either bob or alice.
If that is what you mean. If you are talking about the TRANSFORMS stanzas, they are all applied to each event matching the props.conf stanza (e.g. [your_sourcetype]).
Any event of that sourcetype will then pass through the four transforms, discard, keepField3, keepField4 and keepField7, in that order (important), before the event is returned for further processing.
When it enters this part of the processing pipeline, the queue is set to indexQueue by default.
The first transform (discard) will set queue to nullQueue, because the regex (.) will match any event.
The second transform (keepField3) will change the queue to indexQueue if the regex matches, otherwise leave the queue value unchanged.
The same goes for the third and the fourth.
When the event has passed through the last transform, it will be processed according to the value of queue (store it in the index, or throw it away).
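The chain described above can be imitated in a few lines of Python if you want to play with it. This is only a rough model of the behaviour as explained in this thread, not Splunk's actual implementation; `final_queue` and the two non-feed events are made up for the illustration.

```python
import re

# (stanza name, REGEX, value written to queue when the regex matches)
TRANSFORMS = [
    ("discard", r".", "nullQueue"),
    ("keepField3", r"(?:[^\|]+\|){2}(NormalizedCPUInfo|NormalizedMemoryInfo)", "indexQueue"),
    ("keepField4", r"(?:[^\|]+\|){3}(Utilization|BitsIn)", "indexQueue"),
    ("keepField7", r"(?:[^\|]+\|){6}WC", "indexQueue"),
]


def final_queue(event: str) -> str:
    """Apply the transforms in order; the last matching one decides the queue."""
    queue = "indexQueue"  # default value when the event enters this stage
    for _name, regex, destination in TRANSFORMS:
        if re.search(regex, event):
            queue = destination
    return queue


# Line taken from the feed in the question.
cpu_event = "1377442800000|522334|NormalizedCPUInfo|Utilization|2|CPU|WCMK2DC01|CPU1"
# Hypothetical events, invented for this sketch.
other_event = "1377442800000|999999|OtherInfo|Total|5|Disk|XYZ01|Disk1"
field3_only = "1377442800000|999999|NormalizedCPUInfo|Total|5|Disk|XYZ01|Disk1"

for event in (cpu_event, other_event, field3_only):
    print(final_queue(event), event)
```

The last event ends up in indexQueue although only field 3 matches, so in this model the keep transforms behave like an OR: an event is kept if any one of them matches.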
Hope this makes it a bit clearer.
/K
See the update in my original answer. /K
Are these OR statements?
Which of them? Pattern matching or code highlighting 🙂
Oh, cool. Didn't know that I could do something like that. Thanks!
As you can see from lukejadamec's post, the pattern (or template if you like) is based on the number of non-pipe-characters-followed-by-a-pipe sequences.
E.g. for [keepField3] there should be two {2} sequences of non-pipe-characters [^\|]+ before you match on either NormalizedCPUInfo or NormalizedMemoryInfo.
/K
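To make the template idea concrete, here is a small Python sketch that builds such a pattern from a field number and a list of values. The helper name `field_regex` is invented for this example, and it uses a non-capturing group for the alternatives, which should not matter for a queue transform.

```python
import re


def field_regex(field_number: int, values: list[str]) -> str:
    """Build a pattern that matches when the 1-based, pipe-delimited field
    starts with one of the given values."""
    skip = field_number - 1  # number of "non-pipe characters plus pipe" groups
    alternatives = "|".join(re.escape(value) for value in values)
    return rf"(?:[^\|]+\|){{{skip}}}(?:{alternatives})"


field3 = field_regex(3, ["NormalizedCPUInfo", "NormalizedMemoryInfo"])
field4 = field_regex(4, ["Utilization", "BitsIn"])
field7 = field_regex(7, ["WC"])

for pattern in (field3, field4, field7):
    print("REGEX =", pattern)
```

Because nothing follows the alternatives, the pattern matches on the start of the field, so WC also accepts WCMK2DC01.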
Took the liberty to fix some parentheses and set the markup to code, which shows special characters.
Perhaps something like this then:
props.conf
[your source or sourcetype]
TRANSFORMS-filter = discard, keepField3, keepField4, keepField7
transforms.conf
[discard]
REGEX = .
DEST_KEY = queue
FORMAT = nullQueue
[keepField3]
REGEX = (?:[^\|]+\|){2}(NormalizedCPUInfo|NormalizedMemoryInfo)
DEST_KEY = queue
FORMAT = indexQueue
[keepField4]
REGEX = (?:[^\|]+\|){3}(Utilization|BitsIn)
DEST_KEY = queue
FORMAT = indexQueue
[keepField7]
REGEX = (?:[^\|]+\|){6}WC
DEST_KEY = queue
FORMAT = indexQueue
Thanks. I was more looking for a template, rather than the values. For example, I want to filter on NormalizedCPUInfo and NormalizedMemoryInfo in field3, Utilization and BitsIn in field 4. I already have field 7 working.
You need to be more specific.
What exactly are the values of field 3 that you want to keep?
What exactly are the values of field 4 that you want to keep?
What exactly are the values of field 7 that you want to keep?
This was a big help. Is there a way to filter out on fields 3, 4 and 7? The existing one filters on 7. Sorry, I'm working on my regex skills, but this is beyond me.
For example: if you tell Splunk 6 different specific ways you can drop an event, then Splunk will examine each of those 6 on each event; whereas, if you tell Splunk 1 specific way to keep an event and drop all others, then Splunk searches for only that one specific thing for each event.
In your example, I would not include both upper and lowercase WwCc if you will only ever see uppercase.
I’m not sure that filtering in or out at index time makes much of a difference from an in or out standpoint alone. What you really want to do is be as precise as possible when implementing a configuration (at index or search time).
Thanks. Given that this is going to be a relatively high-volume feed (60 GB an hour), and I have about a dozen wildcards to filter on, which would be more efficient: filtering in, or out?
You can use the Splunk search bar to try your regex on a sample of indexed data (see the regex command).
For regex resources: http://www.regular-expressions.info/