I currently have a 4 different phrases which are between the fixed words "a:OrderMessage and a/:OrderMessage" . I have 1 phrase " Missed Delivery cut-off, Redated to 02/04/15 "
but this has many different dates. I need a regex which counts the number of different phrases and creates a timechart showing a weekly count of them.
My current search.. This counts each phrase with a different date as an independent event rather than the same one
index="uvtrans" "<a:OrderMessage>*</a:OrderMessage>" NOT "<a:OrderMessage>OK</a:OrderMessage>" | rex "\<a:OrderMessage\>(?P<Phrase>.*?)\<\/a:OrderMessage\>" | timechart span=1week count by Phrase
Example
" Missed Delivery cut-off, Redated to 01/18/15 "
" Missed Delivery cut-off, Redated to 02/04/15 "
" Existing account, Changed phone from 1111111111 to 2222222222 "
(2) Missed Delivery cut-off, Redated to
(1) Existing account, Changed phone from
(?P.+)(?:(?:\d{2}\/\d{2}\/\d{2})|(?:\s\d+\sto\s\d+))
index="uvtrans" "<a:OrderMessage>*</a:OrderMessage>" NOT "<a:OrderMessage>OK</a:OrderMessage>"
| rex "(?P<phrase>.+)(?:(?:\d{2}\/\d{2}\/\d{2})|(?:\s\d+\sto\s\d+))"
| timechart span=1w count by phrase
note: you need to do span=1w not span=1week for the span to work.
(?P<phrase>.+) # grab everything
(?: # non capturing group
(?: # non capturing group
\d{2}\/\d{2}\/\d{2}) # digits in date format
| # OR
(?: # non capturing group
\s\d+\sto\s\d+) # digits in second format
) # end
Try testing your regular expressions at regex101.com, here is an example of your question there.
Edit, to build off your second question:
What about this?
index="uvtrans" "<a:OrderMessage>*</a:OrderMessage>" NOT "<a:OrderMessage>OK</a:OrderMessage>"
| rex "\<a:OrderMessage\>(?P<Phrase>.*?)\<\/a:OrderMessage\>"
| rex field=Phrase "(?P<phrase>.+)(?:(?:\sto\s)|(?:\sfrom\s))"
You asked the same question in another thread to which I gave this answer:
Perhaps you can use sed to replace numbers with another character.
... | rex "\<a:OrderMessage\>(?P<Phrase>.*?)\<\/a:OrderMessage\>" | rex field=Phrase mode=sed "s/\d/x/g" | stats count by Phrase
.. | rex "(?<text>.[a-z A-Z,-]+)" | timechart span=1w count by text
should work.
(?P.+)(?:(?:\d{2}\/\d{2}\/\d{2})|(?:\s\d+\sto\s\d+))
index="uvtrans" "<a:OrderMessage>*</a:OrderMessage>" NOT "<a:OrderMessage>OK</a:OrderMessage>"
| rex "(?P<phrase>.+)(?:(?:\d{2}\/\d{2}\/\d{2})|(?:\s\d+\sto\s\d+))"
| timechart span=1w count by phrase
note: you need to do span=1w not span=1week for the span to work.
(?P<phrase>.+) # grab everything
(?: # non capturing group
(?: # non capturing group
\d{2}\/\d{2}\/\d{2}) # digits in date format
| # OR
(?: # non capturing group
\s\d+\sto\s\d+) # digits in second format
) # end
Try testing your regular expressions at regex101.com, here is an example of your question there.
Edit, to build off your second question:
What about this?
index="uvtrans" "<a:OrderMessage>*</a:OrderMessage>" NOT "<a:OrderMessage>OK</a:OrderMessage>"
| rex "\<a:OrderMessage\>(?P<Phrase>.*?)\<\/a:OrderMessage\>"
| rex field=Phrase "(?P<phrase>.+)(?:(?:\sto\s)|(?:\sfrom\s))"
Thanks for this. Unfortunately it's still not working right as the dates are still showing up and it's not counting the strings 'Missed Delivery cut-off, Redated to' but rather its counting each string with a different date as an independent event.
I'll try out the regex101 site you gave me and see if I can get it working. Any other help is appreciated
The regex matches/correctly for the data you provided though - as you can see in the regex101 link - so then it sounds like an issue with your search not the extraction. When you look at the field phrase
in the fields sidebar, do you see a count of each phrase?
When looking at the 'phase' field on the left, I see 3 values.. 1 is 'Missed Delivery cut-off' and the other 2 are 'Existing account, changed phone from <>' So were getting dups for existing account and 2 of the other fields are not showing up
Did you try the edit ? Can you update this question to include more sample data?
You said there are 4 different type of phrases by you're listed just 2, so I will give solution to identify these 2. You can extend the search to include other two (add more condition in case statement).
index="uvtrans" "<a:OrderMessage>*</a:OrderMessage>" NOT "<a:OrderMessage>OK</a:OrderMessage>" | rex "\<a:OrderMessage\>(?P<Phrase>.*?)\<\/a:OrderMessage\>" | eval Phrase=case(match(Phrase,"Missed Delivery cut-off, Redated to"),"Missed Delivery cut-off, Redated to <<Date>>",match(Phrase,"Existing account, Changed phone from "),"Existing account, Changed phone from <<PhoneNumber>> to <<PhoneNumber>>") | timechart span=1week count by Phrase
Thanks for the response. Some of the phrases/strings will be new so I'm unable to write them in the search. Is there anyway to modify the regex I posted so it doesn't include the date or numbers after the text?
I basically want to ignore all numbers and only include text
I looked closer and saw that there has been only 4 cases in the past 2 years so I'm confident in your query. Can you help add in the other 2 cases?
Customer Master flagged as HLD
Customer Master flagged as FRD
Here's what I have, it's returning back only 3 cases and not returning the 4th
index="uvtrans" "<a:OrderMessage>*</a:OrderMessage>" NOT "<a:OrderMessage>OK</a:OrderMessage>" | rex "\<a:OrderMessage\>(?P<Phrase>.*?)\<\/a:OrderMessage\>" | eval Phrase=case(match(Phrase,"Missed Delivery cut-off, Redated to"),"Missed Delivery cut-off, Redated to <<Date>>",match(Phrase,"Existing account, Changed phone from "),"Existing account, Changed phone from <<PhoneNumber>> to <<PhoneNumber>>",match(Phrase, "Customer Master flagged as HLD."), "Flagged as HLD",match(Phrase, "Customer Master flagged as FRD."), "Flagged as FRD") | timechart span=1week count by Phrase