Regex AND Operator

Contributor

I thought ?= acts like an AND operator.

The condition is to capture words that contain at least 5 upper-case letters AND at least 4 lower-case letters, plus any other non-whitespace characters in the word.

Here's what I came up with, but it doesn't seem to work:
"s/(\S*[A-Z]{5,})(?=[a-z]{4,})\S*//g"

Would want it to capture strings like these:

  1. adsdkdkDKDKDdkd:djkDKDK
  2. ASaFaAdfkK-asdfoiA
  3. asdfASDFF

I realized this actually works, but only when the upper-case letters are consecutive. Any idea how to make it work with non-consecutive A-Z?

TEST1 doesn't work (non-consecutive):
| makeresults | eval TEST="AAAaAAaaaassdjkd" | rex field=TEST max_match=0 "(?<TEST1>\S*([A-Z]{5,})(?=[a-z]{4,})\S*)"

TEST1 does work (consecutive AAAAAaaaa):
| makeresults | eval TEST="AAAAAaaaassdjkd" | rex field=TEST max_match=0 "(?<TEST1>\S*([A-Z]{5,})(?=[a-z]{4,})\S*)"
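
For illustration, the difference between the two tests can be reproduced outside Splunk. The sketch below uses Python's re module as a stand-in for the PCRE engine behind rex (an assumption; the two should agree on these constructs), and the names OLD_PATTERN and old_match are made up for this example. It shows that [A-Z]{5,} demands five upper-case letters in a row, and that the lookahead is only checked right after that run:

```python
import re

# The original attempt, without the Splunk field name.
# [A-Z]{5,} needs five or more CONSECUTIVE upper-case letters, and the
# lookahead (?=[a-z]{4,}) is tested only at the position right after that run.
OLD_PATTERN = re.compile(r"\S*[A-Z]{5,}(?=[a-z]{4,})\S*")

def old_match(word):
    """Return the matched text, or None when the pattern does not match."""
    m = OLD_PATTERN.search(word)
    return m.group(0) if m else None

print(old_match("AAAAAaaaassdjkd"))   # prints the whole word
print(old_match("AAAaAAaaaassdjkd"))  # prints None: no run of five upper-case letters
```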


SplunkTrust

Using the positive lookahead approach, ((?=(?:\S*[A-Z]){5,})(?:\S*[a-z]){4,}\S*) will match both cases. The consuming part of the pattern says "any run of non-space characters followed by a lower-case letter, at least four times, followed by any remaining non-space characters", while the positive lookahead first asserts "any run of non-space characters followed by an upper-case letter, at least five times". The lookahead needs no trailing \S*, since it only checks that the upper-case letters exist and consumes nothing.
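
As a rough check of the pattern, here is a minimal Python sketch. It assumes Python's re module behaves like Splunk's PCRE engine for lookaheads and \S (true for these constructs as far as I know), and the names PATTERN and matching_words are hypothetical, not anything Splunk provides:

```python
import re

# The lookahead asserts at least five upper-case letters somewhere in the word
# without consuming anything; the rest of the pattern then consumes the word
# while requiring at least four lower-case letters.
PATTERN = re.compile(r"(?=(?:\S*[A-Z]){5,})(?:\S*[a-z]){4,}\S*")

def matching_words(text):
    """Return every whitespace-delimited word that satisfies both conditions."""
    return [m.group(0) for m in PATTERN.finditer(text)]

# Sample strings from the question, plus two words that should be skipped:
# "abc" has no upper-case letters, "ABCDE" has no lower-case letters.
sample = "adsdkdkDKDKDdkd:djkDKDK ASaFaAdfkK-asdfoiA asdfASDFF abc ABCDE"
for word in matching_words(sample):
    print(word)
```

Because both conditions are checked from the same starting position, the order in which upper-case and lower-case letters appear in the word does not matter.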


Contributor

Thanks Martin -

Came up with this solution as well but yours looks cleaner.

| makeresults | eval TEST="AAAaaAaaaa" | rex field=TEST max_match=0 "(?<TEST1>\S*(?=([a-z]*[A-Z]){4})(?=([A-Z]*[a-z]){6})[a-zA-Z]*\S*)"

Contributor

One question regarding your non-capturing groups: are those for efficiency? I realized I can take out the ?: and it still works.
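
For what it's worth, a short Python sketch (Python's re standing in for Splunk's PCRE engine, which is an assumption; the variable names are made up for the example) suggests the ?: is mostly about tidiness. Both variants match the same text, and the visible difference is how many groups get captured. Any speed gain from skipping the extra captures is probably small:

```python
import re

WORD = "AAAaAAaaaassdjkd"

# Variant from the answer: the inner groups are non-capturing.
non_capturing = re.compile(r"((?=(?:\S*[A-Z]){5,})(?:\S*[a-z]){4,}\S*)")
# Same pattern with the ?: removed, so the inner groups capture as well.
capturing = re.compile(r"((?=(\S*[A-Z]){5,})(\S*[a-z]){4,}\S*)")

same_match = non_capturing.search(WORD).group(0) == capturing.search(WORD).group(0)

print(same_match)            # True: both match the whole word
print(non_capturing.groups)  # 1 capturing group
print(capturing.groups)      # 3 capturing groups
```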


SplunkTrust

Can you provide sample events and tell us what you want to extract from them?
Also, please put the code in the 10101 sample code format.


Contributor

"s/(\S*[A-Z]{5,})(?=[a-z]{4,})\S*//g"

I realized this actually works, but only when the upper-case letters are consecutive. Any idea how to make it work with non-consecutive A-Z?

This would work: AAAAAaaaassdjkd
This would not work: AAaaAAaaaaasfd


Communicator

Can you please provide the sample output too? I couldn't understand the >5 part.
