Solved: How to extract 4 different strings with rex, count...

skoelpin · ‎04-08-2015

I currently have a 4 different phrases which are between the fixed words "a:OrderMessage and a/:OrderMessage" . I have 1 phrase " Missed Delivery cut-off, Redated to 02/04/15 " but this has many different dates. I need a regex which counts the number of different phrases and creates a timechart showing a weekly count of them.

My current search.. This counts each phrase with a different date as an independent event rather than the same one

index="uvtrans" "<a:OrderMessage>*</a:OrderMessage>" NOT "<a:OrderMessage>OK</a:OrderMessage>" | rex "\<a:OrderMessage\>(?P<Phrase>.*?)\<\/a:OrderMessage\>" | timechart span=1week count by Phrase

Example

" Missed Delivery cut-off, Redated to 01/18/15 "
" Missed Delivery cut-off, Redated to 02/04/15 "
" Existing account, Changed phone from 1111111111 to 2222222222 "

(2) Missed Delivery cut-off, Redated to
(1) Existing account, Changed phone from

aljohnson_splun · ‎04-08-2015

Regex: `(?P.+)(?:(?:\d{2}\/\d{2}\/\d{2})|(?:\s\d+\sto\s\d+))`

Search:

 index="uvtrans" "<a:OrderMessage>*</a:OrderMessage>" NOT "<a:OrderMessage>OK</a:OrderMessage>" 
| rex "(?P<phrase>.+)(?:(?:\d{2}\/\d{2}\/\d{2})|(?:\s\d+\sto\s\d+))"
| timechart span=1w count by phrase

note: you need to do span=1w not span=1week for the span to work.

Regex explanation:

(?P<phrase>.+) # grab everything
(?:            # non capturing group
    (?:        # non capturing group
         \d{2}\/\d{2}\/\d{2})  # digits in date format
    |                          # OR
    (?:                        # non capturing group
         \s\d+\sto\s\d+)       # digits in second format
)                              # end

Try testing your regular expressions at regex101.com, here is an example of your question there.

Edit, to build off your second question:

What about this?

index="uvtrans" "<a:OrderMessage>*</a:OrderMessage>" NOT "<a:OrderMessage>OK</a:OrderMessage>" 
| rex "\<a:OrderMessage\>(?P<Phrase>.*?)\<\/a:OrderMessage\>" 
| rex field=Phrase "(?P<phrase>.+)(?:(?:\sto\s)|(?:\sfrom\s))"

View solution in original post

richgalloway · ‎04-09-2015

You asked the same question in another thread to which I gave this answer:

Perhaps you can use sed to replace numbers with another character.

... | rex "\<a:OrderMessage\>(?P<Phrase>.*?)\<\/a:OrderMessage\>" | rex field=Phrase mode=sed "s/\d/x/g" | stats count by Phrase

---
If this reply helps you, Karma would be appreciated.

ramdaspr · ‎04-08-2015

.. | rex "(?<text>.[a-z A-Z,-]+)" | timechart span=1w count by text

should work.

aljohnson_splun · ‎04-08-2015

Regex: `(?P.+)(?:(?:\d{2}\/\d{2}\/\d{2})|(?:\s\d+\sto\s\d+))`

Search:

 index="uvtrans" "<a:OrderMessage>*</a:OrderMessage>" NOT "<a:OrderMessage>OK</a:OrderMessage>" 
| rex "(?P<phrase>.+)(?:(?:\d{2}\/\d{2}\/\d{2})|(?:\s\d+\sto\s\d+))"
| timechart span=1w count by phrase

note: you need to do span=1w not span=1week for the span to work.

Regex explanation:

(?P<phrase>.+) # grab everything
(?:            # non capturing group
    (?:        # non capturing group
         \d{2}\/\d{2}\/\d{2})  # digits in date format
    |                          # OR
    (?:                        # non capturing group
         \s\d+\sto\s\d+)       # digits in second format
)                              # end

Try testing your regular expressions at regex101.com, here is an example of your question there.

Edit, to build off your second question:

What about this?

index="uvtrans" "<a:OrderMessage>*</a:OrderMessage>" NOT "<a:OrderMessage>OK</a:OrderMessage>" 
| rex "\<a:OrderMessage\>(?P<Phrase>.*?)\<\/a:OrderMessage\>" 
| rex field=Phrase "(?P<phrase>.+)(?:(?:\sto\s)|(?:\sfrom\s))"

skoelpin · ‎04-08-2015

Thanks for this. Unfortunately it's still not working right as the dates are still showing up and it's not counting the strings 'Missed Delivery cut-off, Redated to' but rather its counting each string with a different date as an independent event.

I'll try out the regex101 site you gave me and see if I can get it working. Any other help is appreciated

aljohnson_splun · ‎04-09-2015

The regex matches/correctly for the data you provided though - as you can see in the regex101 link - so then it sounds like an issue with your search not the extraction. When you look at the field phrase in the fields sidebar, do you see a count of each phrase?

skoelpin · ‎04-09-2015

When looking at the 'phase' field on the left, I see 3 values.. 1 is 'Missed Delivery cut-off' and the other 2 are 'Existing account, changed phone from <>' So were getting dups for existing account and 2 of the other fields are not showing up

aljohnson_splun · ‎04-09-2015

Did you try the edit ? Can you update this question to include more sample data?

somesoni2 · ‎04-08-2015

You said there are 4 different type of phrases by you're listed just 2, so I will give solution to identify these 2. You can extend the search to include other two (add more condition in case statement).

index="uvtrans" "<a:OrderMessage>*</a:OrderMessage>" NOT "<a:OrderMessage>OK</a:OrderMessage>" | rex "\<a:OrderMessage\>(?P<Phrase>.*?)\<\/a:OrderMessage\>" | eval Phrase=case(match(Phrase,"Missed Delivery cut-off, Redated to"),"Missed Delivery cut-off, Redated to <<Date>>",match(Phrase,"Existing account, Changed phone from "),"Existing account, Changed phone from <<PhoneNumber>> to <<PhoneNumber>>")   | timechart span=1week count by Phrase

skoelpin · ‎04-08-2015

Thanks for the response. Some of the phrases/strings will be new so I'm unable to write them in the search. Is there anyway to modify the regex I posted so it doesn't include the date or numbers after the text?

I basically want to ignore all numbers and only include text

skoelpin · ‎04-08-2015

I looked closer and saw that there has been only 4 cases in the past 2 years so I'm confident in your query. Can you help add in the other 2 cases?

Customer Master flagged as HLD

Customer Master flagged as FRD

Here's what I have, it's returning back only 3 cases and not returning the 4th

index="uvtrans" "<a:OrderMessage>*</a:OrderMessage>" NOT "<a:OrderMessage>OK</a:OrderMessage>" | rex "\<a:OrderMessage\>(?P<Phrase>.*?)\<\/a:OrderMessage\>" | eval Phrase=case(match(Phrase,"Missed Delivery cut-off, Redated to"),"Missed Delivery cut-off, Redated to <<Date>>",match(Phrase,"Existing account, Changed phone from "),"Existing account, Changed phone from <<PhoneNumber>> to <<PhoneNumber>>",match(Phrase, "Customer Master flagged as HLD."), "Flagged as HLD",match(Phrase, "Customer Master flagged as FRD."), "Flagged as FRD")  | timechart span=1week count by Phrase

How to extract 4 different strings with rex, count the number of different strings, and create a timechart of the weekly count?

Regex: `(?P.+)(?:(?:\d{2}\/\d{2}\/\d{2})|(?:\s\d+\sto\s\d+))`

Search:

Regex explanation:

Regex: `(?P.+)(?:(?:\d{2}\/\d{2}\/\d{2})|(?:\s\d+\sto\s\d+))`

Search:

Regex explanation:

New This Month in Splunk Observability Cloud - Metrics Usage Analytics, Enhanced K8s ...

Alerting Best Practices: How to Create Good Detectors

Discover Powerful New Features in Splunk Cloud Platform: Enhanced Analytics, ...

How to extract 4 different strings with rex, count the number of different strings, and create a timechart of the weekly count?

Regex: (?P.+)(?:(?:\d{2}\/\d{2}\/\d{2})|(?:\s\d+\sto\s\d+))

Search:

Regex explanation:

Regex: (?P.+)(?:(?:\d{2}\/\d{2}\/\d{2})|(?:\s\d+\sto\s\d+))

Search:

Regex explanation:

New This Month in Splunk Observability Cloud - Metrics Usage Analytics, Enhanced K8s ...

Alerting Best Practices: How to Create Good Detectors

Discover Powerful New Features in Splunk Cloud Platform: Enhanced Analytics, ...

Regex: `(?P.+)(?:(?:\d{2}\/\d{2}\/\d{2})|(?:\s\d+\sto\s\d+))`

Regex: `(?P.+)(?:(?:\d{2}\/\d{2}\/\d{2})|(?:\s\d+\sto\s\d+))`