Hi,
Can anyone please help me create a regex to extract the first 12 words (words made up of letters only) from the beginning of a field? Here are a few samples with the required output:
1) 00012243asdsfgh - No recommendations from System A. Message - ERROR: System A | No Matching Recommendations
Required Output - No recommendations from System A. Message - ERROR: System A | No Matching Recommendations
2) 001b135c-5348-4arf-b3vbv344v - Validation Exception reason - Empty/Invalid Page_Placement Value ::: Input received - Channel1; ::: Other details - 001sss-445-4f45-b3ad-gsdfg34 - Incorrect page and placement found: Channel1;
Required Output - Validation Exception reason - Empty/Invalid Page_Placement Value ::: Input received - Channel1;
3) 00assew-34df-34de-d34k-sf34546d :: Invalid requestTimestamp : 2025-01-21T21:36:21.224Z
Required Output - Invalid requestTimestamp
4) 01hg34hgh44hghg4 - Exception while calling System A - null
Required Output - Exception while calling System A - null
Again, your description doesn't quite match your expected output. However, does this work for you?
| makeresults format=csv data="raw
00012243asdsfgh - No recommendations from System A. Message - ERROR: System A | No Matching Recommendations
001b135c-5348-4arf-b3vbv344v - Validation Exception reason - Empty/Invalid Page_Placement Value ::: Input received - Channel1; ::: Other details - 001sss-445-4f45-b3ad-gsdfg34 - Incorrect page and placement found: Channel1;
00assew-34df-34de-d34k-sf34546d :: Invalid requestTimestamp : 2025-01-21T21:36:21.224Z
01hg34hgh44hghg4 - Exception while calling System A - null
Exception message - CSR-a4cd725c-3d73-426c-b254-5e4f4adc4b26 - Generating exception because of multiple stage failure - abc_ELIGIBILITY
0013c5fb1737577541466 - Exception message - 0013c5fb1737577541466 - Generating exception because of multiple stage failure - abc_ELIGIBILITY
b187c4411737535464656 - Exception message - b187c4411737535464656 - Exception in abc module. Creating error response - b187c4411737535464656 - Response creation couldn't happen for all the placements. Creating error response."
| rex field=raw max_match=0 "(\b)(?<words>[A-Za-z'_]+)(\b|$)"
| eval words = mvjoin(words, " ")
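For anyone who wants to check what this pattern keeps without a Splunk instance, here is a minimal Python sketch of the same regex. It assumes Python's `re` module behaves like Splunk's PCRE for this pattern, apart from needing `(?P<words>...)` for the named group. The helper name `letter_words` is made up for illustration.

```python
import re

# Same pattern as the rex above; Python spells the named group (?P<words>...).
WORDS = re.compile(r"(\b)(?P<words>[A-Za-z'_]+)(\b|$)")

def letter_words(raw: str) -> str:
    """Collect every match (like max_match=0) and join with spaces (like mvjoin)."""
    return " ".join(m.group("words") for m in WORDS.finditer(raw))

sample = (
    "0013c5fb1737577541466 - Exception message - 0013c5fb1737577541466 - "
    "Generating exception because of multiple stage failure - abc_ELIGIBILITY"
)
print(letter_words(sample))
# Exception message Generating exception because of multiple stage failure abc_ELIGIBILITY
```

The word boundaries on both sides are what reject tokens such as `0013c5fb1737577541466`: a letter run in the middle of an alphanumeric token has no `\b` next to it, so it never matches.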
One more question, @ITWhisperer: how can we ignore the letter-only segments that appear inside hyphen-delimited alphanumeric IDs?
Example:
CSR-345sc453-a2da-4850-aacb-7f35d5127b21 - Sending error response back in 2136 msecs.
Expected output - CSR Sending error response back in msecs OR Sending error response back in msecs
The regex you shared also includes "aacb", but we want to ignore it.
The requirement is to extract the statement without any correlation/context ID, so that each error statement can be identified uniquely.
| makeresults format=csv data="raw
CSR-345sc453-a2da-4850-aacb-7f35d5127b21 - Sending error response back in 2136 msecs.
00012243asdsfgh - No recommendations from System A. Message - ERROR: System A | No Matching Recommendations
001b135c-5348-4arf-b3vbv344v - Validation Exception reason - Empty/Invalid Page_Placement Value ::: Input received - Channel1; ::: Other details - 001sss-445-4f45-b3ad-gsdfg34 - Incorrect page and placement found: Channel1;
00assew-34df-34de-d34k-sf34546d :: Invalid requestTimestamp : 2025-01-21T21:36:21.224Z
01hg34hgh44hghg4 - Exception while calling System A - null
Exception message - CSR-a4cd725c-3d73-426c-b254-5e4f4adc4b26 - Generating exception because of multiple stage failure - abc_ELIGIBILITY
0013c5fb1737577541466 - Exception message - 0013c5fb1737577541466 - Generating exception because of multiple stage failure - abc_ELIGIBILITY
b187c4411737535464656 - Exception message - b187c4411737535464656 - Exception in abc module. Creating error response - b187c4411737535464656 - Response creation couldn't happen for all the placements. Creating error response."
| rex field=raw max_match=0 "(\b)(?<words>[A-Za-z'_]+)(\b|$)(?!\-)"
| eval words = mvjoin(words, " ")
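The effect of the added `(?!\-)` lookahead can be checked outside Splunk with a small Python sketch. It assumes Python's `re` matches PCRE behavior for these patterns. The `extract` helper, the third pattern with a lookbehind, and the `ab12-cdef` input are hypothetical additions for illustration and are not from the thread.

```python
import re

# Pattern from the earlier answer (named group written in Python syntax).
ORIGINAL = re.compile(r"(\b)(?P<words>[A-Za-z'_]+)(\b|$)")
# Pattern from this answer: reject a word that is immediately followed by "-".
NO_HYPHEN_AFTER = re.compile(r"(\b)(?P<words>[A-Za-z'_]+)(\b|$)(?!\-)")
# Hypothetical variant (an assumption, untested in Splunk): also reject a word
# that is immediately preceded by "-".
NO_HYPHEN_EITHER_SIDE = re.compile(r"(?<!-)(\b)(?P<words>[A-Za-z'_]+)(\b|$)(?!\-)")

def extract(pattern: re.Pattern, raw: str) -> str:
    """Join all matches with spaces, like rex max_match=0 plus mvjoin."""
    return " ".join(m.group("words") for m in pattern.finditer(raw))

raw = "CSR-345sc453-a2da-4850-aacb-7f35d5127b21 - Sending error response back in 2136 msecs."
print(extract(ORIGINAL, raw))         # CSR aacb Sending error response back in msecs
print(extract(NO_HYPHEN_AFTER, raw))  # Sending error response back in msecs

# Made-up input: an ID whose last segment is letters only.
print(extract(NO_HYPHEN_AFTER, "ab12-cdef done"))        # cdef done
print(extract(NO_HYPHEN_EITHER_SIDE, "ab12-cdef done"))  # done
```

Note that the lookahead only checks the character after the word, so a letters-only final segment of an ID would still be captured. If that shows up in real data, a lookbehind such as the one sketched above might be worth trying.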
Thanks @ITWhisperer & @bowesmana for all your help!
Your stated requirement does not match completely with your examples. For example, some expected outputs have fewer "words" than are available in the "field". Also, is there an unwritten requirement that your "words" begin with a letter but could contain numbers?
Making some assumptions derived from your written requirement and expected outputs, you could try something like this:
| makeresults format=csv data="raw
00012243asdsfgh - No recommendations from System A. Message - ERROR: System A | No Matching Recommendations
001b135c-5348-4arf-b3vbv344v - Validation Exception reason - Empty/Invalid Page_Placement Value ::: Input received - Channel1; ::: Other details - 001sss-445-4f45-b3ad-gsdfg34 - Incorrect page and placement found: Channel1;
00assew-34df-34de-d34k-sf34546d :: Invalid requestTimestamp : 2025-01-21T21:36:21.224Z
01hg34hgh44hghg4 - Exception while calling System A - null"
| rex field=raw " (?<dozenwords>([A-Za-z][A-Za-z0-9]*[^A-Za-z0-9]+){0,11}[A-Za-z][A-Za-z0-9]*)"
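To see how this pattern behaves, here is a rough Python sketch of the same regex. It assumes Python's `re` behaves like PCRE here; `dozen_words` is a made-up helper name. The match starts at the first space that is followed by a letter, takes up to 12 tokens that each begin with a letter, and stops as soon as the next token begins with a digit.

```python
import re

# Same pattern as the rex above, with the named group in Python syntax.
DOZEN = re.compile(
    r" (?P<dozenwords>([A-Za-z][A-Za-z0-9]*[^A-Za-z0-9]+){0,11}[A-Za-z][A-Za-z0-9]*)"
)

def dozen_words(raw: str):
    """Return the first match only, as rex does without max_match."""
    m = DOZEN.search(raw)
    return m.group("dozenwords") if m else None

print(dozen_words("01hg34hgh44hghg4 - Exception while calling System A - null"))
# Exception while calling System A - null

# A second ID in the middle of the message ends the match early.
print(dozen_words(
    "0013c5fb1737577541466 - Exception message - 0013c5fb1737577541466 - "
    "Generating exception because of multiple stage failure - abc_ELIGIBILITY"
))
# Exception message
```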
@ITWhisperer Thanks for sharing the regex. It works for some of the examples but not for all of them. I think this is because I did not explain the requirement clearly. My requirement is to capture all the words that contain letters only and to completely ignore (reject) alphanumeric/numeric words and special characters. Also, I would like to extract the full text, not limited to 12 words. Could you please share the regex and, if possible, an explanation?
Here are a couple of examples where the regex is not working:
1) Exception message - CSR-a4cd725c-3d73-426c-b254-5e4f4adc4b26 - Generating exception because of multiple stage failure - abc_ELIGIBILITY
Output with regex - "Exception message - CSR", and for some other records it comes out as "Exception message - CSR-a4cd725c"
Required Output - Exception Message CSR Generating exception because of multiple stage failure abc ELIGIBILITY
2) 0013c5fb1737577541466 - Exception message - 0013c5fb1737577541466 - Generating exception because of multiple stage failure - abc_ELIGIBILITY
Output - Exception message
Required Output - Exception message Generating exception because of multiple stage failure abc_ELIGIBILITY
3) b187c4411737535464656 - Exception message - b187c4411737535464656 - Exception in abc module. Creating error response - b187c4411737535464656 - Response creation couldn't happen for all the placements. Creating error response.
Output - Exception message - b187c4411737535464656 - Exception in abc module. Creating error response - b187c4411737535464656 - Response
Required Output - Exception message Exception in abc module. Creating error response Response creation couldn't happen for all the placements. Creating error response.
You can use rex, but your example is not entirely clear: are you expecting the -, | and / characters in your output?
See the rex statement in this example with your data.
| makeresults format=csv data="raw
00012243asdsfgh - No recommendations from System A. Message - ERROR: System A | No Matching Recommendations
001b135c-5348-4arf-b3vbv344v - Validation Exception reason - Empty/Invalid Page_Placement Value ::: Input received - Channel1; ::: Other details - 001sss-445-4f45-b3ad-gsdfg34 - Incorrect page and placement found: Channel1;
00assew-34df-34de-d34k-sf34546d :: Invalid requestTimestamp : 2025-01-21T21:36:21.224Z
01hg34hgh44hghg4 - Exception while calling System A - null"
| rex field=raw max_match=0 " (?<words>[A-Za-z]+)"
| eval words = mvjoin(words, " ")
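As a quick check of what this pattern captures, here is a minimal Python sketch of the same regex. It assumes Python's `re` matches PCRE behavior for this pattern; `space_words` is a made-up helper name. Each match is a space followed by a run of letters, so the leading ID is skipped, but the letters at the start of a token such as `Channel1` are still picked up.

```python
import re

# Same pattern as the rex above, with the named group in Python syntax.
SPACE_WORDS = re.compile(r" (?P<words>[A-Za-z]+)")

def space_words(raw: str) -> str:
    """Join all matches with spaces, like rex max_match=0 plus mvjoin."""
    return " ".join(m.group("words") for m in SPACE_WORDS.finditer(raw))

print(space_words("01hg34hgh44hghg4 - Exception while calling System A - null"))
# Exception while calling System A null

print(space_words(
    "001b135c-5348-4arf-b3vbv344v - Validation Exception reason - "
    "Empty/Invalid Page_Placement Value ::: Input received - Channel1;"
))
# Validation Exception reason Empty Page Value Input received Channel
```

In the second sample, `Invalid` and `Placement` are dropped because they are not preceded by a space, and `Channel1` is reduced to `Channel` because the match simply stops at the digit.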
Thanks for the reply @bowesmana
Yes, I would like to ignore special characters as well, if possible.
Your regex would work if the requirement were to drop only the digits from alphanumeric words, but my requirement is to completely ignore any word that contains digits.