Splunk Search

Regex AND Operator

Contributor

I thought ?= (a positive lookahead) acts like an AND operator.

The condition is to capture words with 5 or more upper-case letters AND 4 or more lower-case letters, plus any other non-whitespace characters in the word.

Here's what I came up with; it doesn't seem to work:
"s/(\S*[A-Z]{5,})(?=[a-z]{4,})\S*//g"

I want it to capture strings like these:

  1. adsdkdkDKDKDdkd:djkDKDK
  2. ASaFaAdfkK-asdfoiA
  3. asdfASDFF
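A minimal non-regex reference check for the condition, useful for confirming which sample strings should match before tuning the regex. It assumes "word" means a run of non-whitespace characters and that the thresholds are at least 5 capitals (A-Z) and at least 4 lower-case letters (a-z), which is what {5,} and {4,} express and what asdfASDFF (exactly five capitals) requires. This is Python, not SPL, and the helper names meets_condition and qualifying_words are hypothetical.

```python
import re
from typing import List

# Hypothetical reference check (not SPL): count letters directly instead of
# using a regex. Thresholds assume "at least 5 capitals, at least 4 lower-case".


def meets_condition(word: str, min_upper: int = 5, min_lower: int = 4) -> bool:
    """True if word has >= min_upper letters A-Z and >= min_lower letters a-z."""
    upper = sum(1 for ch in word if "A" <= ch <= "Z")  # ASCII only, like [A-Z]
    lower = sum(1 for ch in word if "a" <= ch <= "z")  # ASCII only, like [a-z]
    return upper >= min_upper and lower >= min_lower


def qualifying_words(text: str) -> List[str]:
    """Split text on whitespace and keep the words that meet the condition."""
    return [w for w in re.findall(r"\S+", text) if meets_condition(w)]


samples = ["adsdkdkDKDKDdkd:djkDKDK", "ASaFaAdfkK-asdfoiA", "asdfASDFF"]
for s in samples:
    print(s, meets_condition(s))
```

All three samples pass, and the position of the capitals does not matter to this check, which is exactly the behavior the regex needs to reproduce.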

I realized this actually works, but not on words where the upper-case letters (A-Z) are non-consecutive. Any idea how to make it work with non-consecutive A-Z?

TEST1 doesn't work (non-consecutive):
| makeresults | eval TEST="AAAaAAaaaassdjkd" | rex field=TEST max_match=0 "(?<TEST1>\S*([A-Z]{5,})(?=[a-z]{4,})\S*)"

TEST1 does work (consecutive AAAAAaaaa):
| makeresults | eval TEST="AAAAAaaaassdjkd" | rex field=TEST max_match=0 "(?<TEST1>\S*([A-Z]{5,})(?=[a-z]{4,})\S*)"
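A sketch in Python (not Splunk) of why the original pattern only handles consecutive capitals: [A-Z]{5,} needs five capitals in a row, and the lookahead (?=[a-z]{4,}) then requires four lower-case letters immediately after that run. It assumes Python's re treats greedy \S*, counted quantifiers and lookahead the same way as the PCRE engine behind rex; the named group is dropped because Python spells it (?P<name>...), and the helper whole_matches is hypothetical, mimicking max_match=0 by collecting every full match.

```python
import re

# The original attempt, without the named group: 5+ capitals in a row,
# which must be followed immediately by 4+ lower-case letters, with any
# non-space characters before and after.
ORIGINAL = re.compile(r"\S*([A-Z]{5,})(?=[a-z]{4,})\S*")


def whole_matches(pattern, text):
    """Hypothetical helper: full text of every match, similar to max_match=0."""
    return [m.group(0) for m in pattern.finditer(text)]


print(whole_matches(ORIGINAL, "AAAAAaaaassdjkd"))   # capitals consecutive
print(whole_matches(ORIGINAL, "AAAaAAaaaassdjkd"))  # capitals split by an "a"
```

The first call returns the whole string; the second returns an empty list, because nowhere in AAAaAAaaaassdjkd do five capitals appear in a row.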


SplunkTrust

Using the positive lookahead approach, ((?=(?:\S*[A-Z]){5,})(?:\S*[a-z]){4,}\S*) will match both cases. The match itself says "any non-space characters followed by a lower-case letter, at least four times, followed by any remaining non-space characters", while the positive lookahead first asserts "any non-space characters followed by an upper-case letter, at least five times". There's no need to assert the trailing part inside the lookahead, because the lookahead only checks and doesn't consume anything.
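The same pattern checked in Python against the sample strings, as a sketch assuming Python's re and Splunk's PCRE agree on these constructs. Because every part of the pattern is built from \S, neither the lookahead nor the match can cross a space, so each word is tested on its own. The words hello, abcdEFGH (only four capitals) and AAAAAbcd (only three lower-case letters) are made-up negatives added for illustration.

```python
import re

# Accepted answer's pattern. The lookahead requires 5+ capitals anywhere in
# the current non-space run; the consuming part then requires 4+ lower-case
# letters and swallows whatever non-space characters remain.
SOLUTION = re.compile(r"((?=(?:\S*[A-Z]){5,})(?:\S*[a-z]){4,}\S*)")

text = (
    "adsdkdkDKDKDdkd:djkDKDK ASaFaAdfkK-asdfoiA asdfASDFF "
    "AAAaAAaaaassdjkd AAAAAaaaassdjkd hello abcdEFGH AAAAAbcd"
)

# One capturing group (the outer parentheses), so findall returns the
# text of each whole match.
print(SOLUTION.findall(text))
```

Only the first five words are returned; hello, abcdEFGH and AAAAAbcd are skipped, and the non-consecutive AAAaAAaaaassdjkd now matches.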


Contributor

Thanks, Martin.

I came up with this solution as well, but yours looks cleaner.

| makeresults | eval TEST="AAAaaAaaaa" | rex field=TEST max_match=0 "(?<TEST1>\S*(?=([a-z]*[A-Z]){4})(?=([A-Z]*[a-z]){6})[a-zA-Z]*\S*)"
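A Python sketch of this alternative, assuming Python's re matches PCRE here apart from named-group spelling: Python requires (?P<TEST1>...) where rex accepts (?<TEST1>...). The counts are kept as in this reply (4 capitals, 6 lower-case letters); for the 5/4 condition in the question they would become {5} and {4}. Inside a lookahead, {4} already means "at least 4", since the assertion only needs to find that many.

```python
import re

# The alternative from this reply. Each lookahead counts letters reachable
# through letter-only runs: ([a-z]*[A-Z]){4} finds 4 capitals and
# ([A-Z]*[a-z]){6} finds 6 lower-case letters.
ALTERNATIVE = re.compile(
    r"(?P<TEST1>\S*(?=([a-z]*[A-Z]){4})(?=([A-Z]*[a-z]){6})[a-zA-Z]*\S*)"
)

match = ALTERNATIVE.search("AAAaaAaaaa")
print(match.group("TEST1") if match else None)
```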

Contributor

One question regarding your non-capturing groups: are those for efficiency? I realized I can take out the ?: and it still works.
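A non-capturing group (?:...) groups tokens without storing the sub-match, so the overall match is the same with or without ?:; what changes is how many groups the engine reports. A Python sketch comparing the two forms, assuming Python's re behaves like PCRE for these constructs:

```python
import re

# Same pattern twice: with non-capturing groups (?:...) and with plain
# capturing groups (...).
NON_CAPTURING = re.compile(r"(?=(?:\S*[A-Z]){5,})(?:\S*[a-z]){4,}\S*")
CAPTURING = re.compile(r"(?=(\S*[A-Z]){5,})(\S*[a-z]){4,}\S*")

word = "AAAaAAaaaassdjkd"
m1 = NON_CAPTURING.search(word)
m2 = CAPTURING.search(word)

print(m1.group(0) == m2.group(0))          # same overall match text
print(len(m1.groups()), len(m2.groups()))  # 0 groups vs 2 groups

# findall returns whole matches when there are no groups, but tuples of
# group contents when there are two or more.
print(NON_CAPTURING.findall(word))
print(type(CAPTURING.findall(word)[0]).__name__)
```

In Python the group count matters mostly for re.findall, as shown. Any speed difference is engine-dependent and likely small, so the main benefit of ?: is not creating groups you don't need.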


SplunkTrust

Can you provide sample events and tell us what you want to extract from them?
Also, please put the code in the 10101 sample code format.


Contributor

"s/(\S*[A-Z]{5,})(?=[a-z]{4,})\S*//g"

I realized this actually works, but not on words where the upper-case letters (A-Z) are non-consecutive. Any idea how to make it work with non-consecutive A-Z?

This works: AAAAAaaaassdjkd
This doesn't work: AAaaAAaaaaasfd



Communicator

Can you please provide the sample output too? I didn't understand the >5 part.
