Splunk Search

How to extract 4 different strings with rex, count the number of different strings, and create a timechart of the weekly count?

skoelpin
SplunkTrust
SplunkTrust

I currently have a 4 different phrases which are between the fixed words "a:OrderMessage and a/:OrderMessage" . I have 1 phrase " Missed Delivery cut-off, Redated to 02/04/15 " but this has many different dates. I need a regex which counts the number of different phrases and creates a timechart showing a weekly count of them.

My current search.. This counts each phrase with a different date as an independent event rather than the same one

index="uvtrans" "<a:OrderMessage>*</a:OrderMessage>" NOT "<a:OrderMessage>OK</a:OrderMessage>" | rex "\<a:OrderMessage\>(?P<Phrase>.*?)\<\/a:OrderMessage\>" | timechart span=1week count by Phrase

Example

" Missed Delivery cut-off, Redated to 01/18/15 "
" Missed Delivery cut-off, Redated to 02/04/15 "
" Existing account, Changed phone from 1111111111 to 2222222222 "

(2) Missed Delivery cut-off, Redated to
(1) Existing account, Changed phone from

0 Karma
1 Solution

aljohnson_splun
Splunk Employee
Splunk Employee

Regex: (?P.+)(?:(?:\d{2}\/\d{2}\/\d{2})|(?:\s\d+\sto\s\d+))

Search:

 index="uvtrans" "<a:OrderMessage>*</a:OrderMessage>" NOT "<a:OrderMessage>OK</a:OrderMessage>" 
| rex "(?P<phrase>.+)(?:(?:\d{2}\/\d{2}\/\d{2})|(?:\s\d+\sto\s\d+))"
| timechart span=1w count by phrase

note: you need to do span=1w not span=1week for the span to work.

Regex explanation:

(?P<phrase>.+) # grab everything
(?:            # non capturing group
    (?:        # non capturing group
         \d{2}\/\d{2}\/\d{2})  # digits in date format
    |                          # OR
    (?:                        # non capturing group
         \s\d+\sto\s\d+)       # digits in second format
)                              # end

Try testing your regular expressions at regex101.com, here is an example of your question there.


Edit, to build off your second question:

What about this?

index="uvtrans" "<a:OrderMessage>*</a:OrderMessage>" NOT "<a:OrderMessage>OK</a:OrderMessage>" 
| rex "\<a:OrderMessage\>(?P<Phrase>.*?)\<\/a:OrderMessage\>" 
| rex field=Phrase "(?P<phrase>.+)(?:(?:\sto\s)|(?:\sfrom\s))"

View solution in original post

0 Karma

richgalloway
SplunkTrust
SplunkTrust

You asked the same question in another thread to which I gave this answer:

Perhaps you can use sed to replace numbers with another character.

... | rex "\<a:OrderMessage\>(?P<Phrase>.*?)\<\/a:OrderMessage\>" | rex field=Phrase mode=sed "s/\d/x/g" | stats count by Phrase
---
If this reply helps you, Karma would be appreciated.
0 Karma

ramdaspr
Contributor
.. | rex "(?<text>.[a-z A-Z,-]+)" | timechart span=1w count by text

should work.

0 Karma

aljohnson_splun
Splunk Employee
Splunk Employee

Regex: (?P.+)(?:(?:\d{2}\/\d{2}\/\d{2})|(?:\s\d+\sto\s\d+))

Search:

 index="uvtrans" "<a:OrderMessage>*</a:OrderMessage>" NOT "<a:OrderMessage>OK</a:OrderMessage>" 
| rex "(?P<phrase>.+)(?:(?:\d{2}\/\d{2}\/\d{2})|(?:\s\d+\sto\s\d+))"
| timechart span=1w count by phrase

note: you need to do span=1w not span=1week for the span to work.

Regex explanation:

(?P<phrase>.+) # grab everything
(?:            # non capturing group
    (?:        # non capturing group
         \d{2}\/\d{2}\/\d{2})  # digits in date format
    |                          # OR
    (?:                        # non capturing group
         \s\d+\sto\s\d+)       # digits in second format
)                              # end

Try testing your regular expressions at regex101.com, here is an example of your question there.


Edit, to build off your second question:

What about this?

index="uvtrans" "<a:OrderMessage>*</a:OrderMessage>" NOT "<a:OrderMessage>OK</a:OrderMessage>" 
| rex "\<a:OrderMessage\>(?P<Phrase>.*?)\<\/a:OrderMessage\>" 
| rex field=Phrase "(?P<phrase>.+)(?:(?:\sto\s)|(?:\sfrom\s))"
0 Karma

skoelpin
SplunkTrust
SplunkTrust

Thanks for this. Unfortunately it's still not working right as the dates are still showing up and it's not counting the strings 'Missed Delivery cut-off, Redated to' but rather its counting each string with a different date as an independent event.

I'll try out the regex101 site you gave me and see if I can get it working. Any other help is appreciated

0 Karma

aljohnson_splun
Splunk Employee
Splunk Employee

The regex matches/correctly for the data you provided though - as you can see in the regex101 link - so then it sounds like an issue with your search not the extraction. When you look at the field phrase in the fields sidebar, do you see a count of each phrase?

0 Karma

skoelpin
SplunkTrust
SplunkTrust

When looking at the 'phase' field on the left, I see 3 values.. 1 is 'Missed Delivery cut-off' and the other 2 are 'Existing account, changed phone from <>' So were getting dups for existing account and 2 of the other fields are not showing up

0 Karma

aljohnson_splun
Splunk Employee
Splunk Employee

Did you try the edit ? Can you update this question to include more sample data?

0 Karma

somesoni2
Revered Legend

You said there are 4 different type of phrases by you're listed just 2, so I will give solution to identify these 2. You can extend the search to include other two (add more condition in case statement).

index="uvtrans" "<a:OrderMessage>*</a:OrderMessage>" NOT "<a:OrderMessage>OK</a:OrderMessage>" | rex "\<a:OrderMessage\>(?P<Phrase>.*?)\<\/a:OrderMessage\>" | eval Phrase=case(match(Phrase,"Missed Delivery cut-off, Redated to"),"Missed Delivery cut-off, Redated to <<Date>>",match(Phrase,"Existing account, Changed phone from "),"Existing account, Changed phone from <<PhoneNumber>> to <<PhoneNumber>>")   | timechart span=1week count by Phrase
0 Karma

skoelpin
SplunkTrust
SplunkTrust

Thanks for the response. Some of the phrases/strings will be new so I'm unable to write them in the search. Is there anyway to modify the regex I posted so it doesn't include the date or numbers after the text?

I basically want to ignore all numbers and only include text

0 Karma

skoelpin
SplunkTrust
SplunkTrust

I looked closer and saw that there has been only 4 cases in the past 2 years so I'm confident in your query. Can you help add in the other 2 cases?

Customer Master flagged as HLD

Customer Master flagged as FRD

Here's what I have, it's returning back only 3 cases and not returning the 4th

index="uvtrans" "<a:OrderMessage>*</a:OrderMessage>" NOT "<a:OrderMessage>OK</a:OrderMessage>" | rex "\<a:OrderMessage\>(?P<Phrase>.*?)\<\/a:OrderMessage\>" | eval Phrase=case(match(Phrase,"Missed Delivery cut-off, Redated to"),"Missed Delivery cut-off, Redated to <<Date>>",match(Phrase,"Existing account, Changed phone from "),"Existing account, Changed phone from <<PhoneNumber>> to <<PhoneNumber>>",match(Phrase, "Customer Master flagged as HLD."), "Flagged as HLD",match(Phrase, "Customer Master flagged as FRD."), "Flagged as FRD")  | timechart span=1week count by Phrase
0 Karma
Get Updates on the Splunk Community!

Built-in Service Level Objectives Management to Bridge the Gap Between Service & ...

Wednesday, May 29, 2024  |  11AM PST / 2PM ESTRegister now and join us to learn more about how you can ...

Get Your Exclusive Splunk Certified Cybersecurity Defense Engineer at Splunk .conf24 ...

We’re excited to announce a new Splunk certification exam being released at .conf24! If you’re headed to Vegas ...

Share Your Ideas & Meet the Lantern team at .Conf! Plus All of This Month’s New ...

Splunk Lantern is Splunk’s customer success center that provides advice from Splunk experts on valuable data ...