Splunk Search

How to extract 4 different strings with rex, count the number of different strings, and create a timechart of the weekly count?

skoelpin
SplunkTrust
SplunkTrust

I currently have a 4 different phrases which are between the fixed words "a:OrderMessage and a/:OrderMessage" . I have 1 phrase " Missed Delivery cut-off, Redated to 02/04/15 " but this has many different dates. I need a regex which counts the number of different phrases and creates a timechart showing a weekly count of them.

My current search.. This counts each phrase with a different date as an independent event rather than the same one

index="uvtrans" "<a:OrderMessage>*</a:OrderMessage>" NOT "<a:OrderMessage>OK</a:OrderMessage>" | rex "\<a:OrderMessage\>(?P<Phrase>.*?)\<\/a:OrderMessage\>" | timechart span=1week count by Phrase

Example

" Missed Delivery cut-off, Redated to 01/18/15 "
" Missed Delivery cut-off, Redated to 02/04/15 "
" Existing account, Changed phone from 1111111111 to 2222222222 "

(2) Missed Delivery cut-off, Redated to
(1) Existing account, Changed phone from

0 Karma
1 Solution

aljohnson_splun
Splunk Employee
Splunk Employee

Regex: (?P.+)(?:(?:\d{2}\/\d{2}\/\d{2})|(?:\s\d+\sto\s\d+))

Search:

 index="uvtrans" "<a:OrderMessage>*</a:OrderMessage>" NOT "<a:OrderMessage>OK</a:OrderMessage>" 
| rex "(?P<phrase>.+)(?:(?:\d{2}\/\d{2}\/\d{2})|(?:\s\d+\sto\s\d+))"
| timechart span=1w count by phrase

note: you need to do span=1w not span=1week for the span to work.

Regex explanation:

(?P<phrase>.+) # grab everything
(?:            # non capturing group
    (?:        # non capturing group
         \d{2}\/\d{2}\/\d{2})  # digits in date format
    |                          # OR
    (?:                        # non capturing group
         \s\d+\sto\s\d+)       # digits in second format
)                              # end

Try testing your regular expressions at regex101.com, here is an example of your question there.


Edit, to build off your second question:

What about this?

index="uvtrans" "<a:OrderMessage>*</a:OrderMessage>" NOT "<a:OrderMessage>OK</a:OrderMessage>" 
| rex "\<a:OrderMessage\>(?P<Phrase>.*?)\<\/a:OrderMessage\>" 
| rex field=Phrase "(?P<phrase>.+)(?:(?:\sto\s)|(?:\sfrom\s))"

View solution in original post

0 Karma

richgalloway
SplunkTrust
SplunkTrust

You asked the same question in another thread to which I gave this answer:

Perhaps you can use sed to replace numbers with another character.

... | rex "\<a:OrderMessage\>(?P<Phrase>.*?)\<\/a:OrderMessage\>" | rex field=Phrase mode=sed "s/\d/x/g" | stats count by Phrase
---
If this reply helps you, Karma would be appreciated.
0 Karma

ramdaspr
Contributor
.. | rex "(?<text>.[a-z A-Z,-]+)" | timechart span=1w count by text

should work.

0 Karma

aljohnson_splun
Splunk Employee
Splunk Employee

Regex: (?P.+)(?:(?:\d{2}\/\d{2}\/\d{2})|(?:\s\d+\sto\s\d+))

Search:

 index="uvtrans" "<a:OrderMessage>*</a:OrderMessage>" NOT "<a:OrderMessage>OK</a:OrderMessage>" 
| rex "(?P<phrase>.+)(?:(?:\d{2}\/\d{2}\/\d{2})|(?:\s\d+\sto\s\d+))"
| timechart span=1w count by phrase

note: you need to do span=1w not span=1week for the span to work.

Regex explanation:

(?P<phrase>.+) # grab everything
(?:            # non capturing group
    (?:        # non capturing group
         \d{2}\/\d{2}\/\d{2})  # digits in date format
    |                          # OR
    (?:                        # non capturing group
         \s\d+\sto\s\d+)       # digits in second format
)                              # end

Try testing your regular expressions at regex101.com, here is an example of your question there.


Edit, to build off your second question:

What about this?

index="uvtrans" "<a:OrderMessage>*</a:OrderMessage>" NOT "<a:OrderMessage>OK</a:OrderMessage>" 
| rex "\<a:OrderMessage\>(?P<Phrase>.*?)\<\/a:OrderMessage\>" 
| rex field=Phrase "(?P<phrase>.+)(?:(?:\sto\s)|(?:\sfrom\s))"
0 Karma

skoelpin
SplunkTrust
SplunkTrust

Thanks for this. Unfortunately it's still not working right as the dates are still showing up and it's not counting the strings 'Missed Delivery cut-off, Redated to' but rather its counting each string with a different date as an independent event.

I'll try out the regex101 site you gave me and see if I can get it working. Any other help is appreciated

0 Karma

aljohnson_splun
Splunk Employee
Splunk Employee

The regex matches/correctly for the data you provided though - as you can see in the regex101 link - so then it sounds like an issue with your search not the extraction. When you look at the field phrase in the fields sidebar, do you see a count of each phrase?

0 Karma

skoelpin
SplunkTrust
SplunkTrust

When looking at the 'phase' field on the left, I see 3 values.. 1 is 'Missed Delivery cut-off' and the other 2 are 'Existing account, changed phone from <>' So were getting dups for existing account and 2 of the other fields are not showing up

0 Karma

aljohnson_splun
Splunk Employee
Splunk Employee

Did you try the edit ? Can you update this question to include more sample data?

0 Karma

somesoni2
Revered Legend

You said there are 4 different type of phrases by you're listed just 2, so I will give solution to identify these 2. You can extend the search to include other two (add more condition in case statement).

index="uvtrans" "<a:OrderMessage>*</a:OrderMessage>" NOT "<a:OrderMessage>OK</a:OrderMessage>" | rex "\<a:OrderMessage\>(?P<Phrase>.*?)\<\/a:OrderMessage\>" | eval Phrase=case(match(Phrase,"Missed Delivery cut-off, Redated to"),"Missed Delivery cut-off, Redated to <<Date>>",match(Phrase,"Existing account, Changed phone from "),"Existing account, Changed phone from <<PhoneNumber>> to <<PhoneNumber>>")   | timechart span=1week count by Phrase
0 Karma

skoelpin
SplunkTrust
SplunkTrust

Thanks for the response. Some of the phrases/strings will be new so I'm unable to write them in the search. Is there anyway to modify the regex I posted so it doesn't include the date or numbers after the text?

I basically want to ignore all numbers and only include text

0 Karma

skoelpin
SplunkTrust
SplunkTrust

I looked closer and saw that there has been only 4 cases in the past 2 years so I'm confident in your query. Can you help add in the other 2 cases?

Customer Master flagged as HLD

Customer Master flagged as FRD

Here's what I have, it's returning back only 3 cases and not returning the 4th

index="uvtrans" "<a:OrderMessage>*</a:OrderMessage>" NOT "<a:OrderMessage>OK</a:OrderMessage>" | rex "\<a:OrderMessage\>(?P<Phrase>.*?)\<\/a:OrderMessage\>" | eval Phrase=case(match(Phrase,"Missed Delivery cut-off, Redated to"),"Missed Delivery cut-off, Redated to <<Date>>",match(Phrase,"Existing account, Changed phone from "),"Existing account, Changed phone from <<PhoneNumber>> to <<PhoneNumber>>",match(Phrase, "Customer Master flagged as HLD."), "Flagged as HLD",match(Phrase, "Customer Master flagged as FRD."), "Flagged as FRD")  | timechart span=1week count by Phrase
0 Karma
Get Updates on the Splunk Community!

Developer Spotlight with Paul Stout

Welcome to our very first developer spotlight release series where we'll feature some awesome Splunk ...

State of Splunk Careers 2024: Maximizing Career Outcomes and the Continued Value of ...

For the past four years, Splunk has partnered with Enterprise Strategy Group to conduct a survey that gauges ...

Data-Driven Success: Splunk & Financial Services

Splunk streamlines the process of extracting insights from large volumes of data. In this fast-paced world, ...