Splunk Search

Regex to extract letters only

poojak2579
Path Finder

Hi,
Can anyone please help me create a regex to extract the first 12 words (words made up of letters only) from the beginning of a field? Here are a few samples with the required output:

 

1) 00012243asdsfgh - No recommendations from System A. Message - ERROR: System A | No Matching Recommendations

Required Output - No recommendations from System A. Message - ERROR: System A | No Matching Recommendations

2) 001b135c-5348-4arf-b3vbv344v - Validation Exception reason - Empty/Invalid Page_Placement Value ::: Input received - Channel1; ::: Other details - 001sss-445-4f45-b3ad-gsdfg34 - Incorrect page and placement found: Channel1;

Required Output - Validation Exception reason - Empty/Invalid Page_Placement Value ::: Input received - Channel1;

3) 00assew-34df-34de-d34k-sf34546d :: Invalid requestTimestamp : 2025-01-21T21:36:21.224Z

Required Output - Invalid requestTimestamp

4) 01hg34hgh44hghg4 - Exception while calling System A - null

Required Output - Exception while calling System A - null


ITWhisperer
SplunkTrust

Again, your words don't quite match your expected output; however, does this work for you?

| makeresults format=csv data="raw
00012243asdsfgh - No recommendations from System A. Message - ERROR: System A | No Matching Recommendations
001b135c-5348-4arf-b3vbv344v - Validation Exception reason - Empty/Invalid Page_Placement Value ::: Input received - Channel1; ::: Other details - 001sss-445-4f45-b3ad-gsdfg34 - Incorrect page and placement found: Channel1;
00assew-34df-34de-d34k-sf34546d :: Invalid requestTimestamp : 2025-01-21T21:36:21.224Z
01hg34hgh44hghg4 - Exception while calling System A - null
Exception message - CSR-a4cd725c-3d73-426c-b254-5e4f4adc4b26 - Generating exception because of multiple stage failure - abc_ELIGIBILITY
0013c5fb1737577541466 - Exception message - 0013c5fb1737577541466 - Generating exception because of multiple stage failure - abc_ELIGIBILITY
b187c4411737535464656 - Exception message - b187c4411737535464656 - Exception in abc module. Creating error response - b187c4411737535464656 - Response creation couldn't happen for all the placements. Creating error response."
| rex field=raw max_match=0 "(\b)(?<words>[A-Za-z'_]+)(\b|$)"
| eval words = mvjoin(words, " ")
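
To see why this pattern skips IDs such as 00012243asdsfgh, it can help to replay it outside Splunk. The following is a minimal Python sketch and not part of the original answer. It assumes Python's re module treats \b and the character class the same way as the PCRE engine behind rex for these ASCII samples, it uses Python's (?P<words>...) spelling for the named group, and extract_words is a hypothetical helper name.

```python
import re

# Python spelling of the rex pattern above (Python needs (?P<name>...)).
# \b only occurs where a word character (letter, digit, underscore) meets a
# non-word character, so letters glued to digits never satisfy both ends.
WORDS = re.compile(r"\b(?P<words>[A-Za-z'_]+)(?:\b|$)")


def extract_words(raw: str) -> str:
    """Collect every match (like max_match=0) and join them (like mvjoin)."""
    return " ".join(m.group("words") for m in WORDS.finditer(raw))


samples = [
    "01hg34hgh44hghg4 - Exception while calling System A - null",
    "00assew-34df-34de-d34k-sf34546d :: Invalid requestTimestamp : 2025-01-21T21:36:21.224Z",
]
for raw in samples:
    print(extract_words(raw))
```

Separators such as "-" and "|" are dropped along with the IDs, so the first sample comes out as "Exception while calling System A null" rather than the "System A - null" form shown in the question.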


poojak2579
Path Finder

One more question, @ITWhisperer: how can we ignore the letters-only segments inside hyphen-delimited alphanumeric IDs?
Example:
CSR-345sc453-a2da-4850-aacb-7f35d5127b21 - Sending error response back in 2136 msecs.

Expected output - CSR Sending error response back in msecs OR Sending error response back in msecs

The regex you shared also includes "aacb", but we want to ignore it.
The requirement is to extract the statement without any correlation/context ID, so that each error statement can be uniquely identified.

ITWhisperer
SplunkTrust
| makeresults format=csv data="raw
CSR-345sc453-a2da-4850-aacb-7f35d5127b21 - Sending error response back in 2136 msecs.
00012243asdsfgh - No recommendations from System A. Message - ERROR: System A | No Matching Recommendations
001b135c-5348-4arf-b3vbv344v - Validation Exception reason - Empty/Invalid Page_Placement Value ::: Input received - Channel1; ::: Other details - 001sss-445-4f45-b3ad-gsdfg34 - Incorrect page and placement found: Channel1;
00assew-34df-34de-d34k-sf34546d :: Invalid requestTimestamp : 2025-01-21T21:36:21.224Z
01hg34hgh44hghg4 - Exception while calling System A - null
Exception message - CSR-a4cd725c-3d73-426c-b254-5e4f4adc4b26 - Generating exception because of multiple stage failure - abc_ELIGIBILITY
0013c5fb1737577541466 - Exception message - 0013c5fb1737577541466 - Generating exception because of multiple stage failure - abc_ELIGIBILITY
b187c4411737535464656 - Exception message - b187c4411737535464656 - Exception in abc module. Creating error response - b187c4411737535464656 - Response creation couldn't happen for all the placements. Creating error response."
| rex field=raw max_match=0 "(\b)(?<words>[A-Za-z'_]+)(\b|$)(?!\-)"
| eval words = mvjoin(words, " ")
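
The effect of the added (?!\-) lookahead, and the goal of grouping events by their ID-free message, can be sketched in Python. This is an illustration under assumptions rather than the answer's own code: it assumes Python's re engine agrees with rex on these ASCII samples, normalize is a hypothetical helper name, and the second event is a made-up variant of the first with a different ID and duration.

```python
import re
from collections import Counter

# Same pattern as before plus the negative lookahead (?!-): a letters-only
# run that is immediately followed by a hyphen (CSR-, aacb-) is rejected.
WORDS_NO_ID = re.compile(r"\b(?P<words>[A-Za-z'_]+)(?:\b|$)(?!-)")


def normalize(raw: str) -> str:
    """Return only the letters-only words that are not followed by a hyphen."""
    return " ".join(m.group("words") for m in WORDS_NO_ID.finditer(raw))


events = [
    "CSR-345sc453-a2da-4850-aacb-7f35d5127b21 - Sending error response back in 2136 msecs.",
    # Hypothetical second event: same message, different ID and duration.
    "CSR-a4cd725c-3d73-426c-b254-5e4f4adc4b26 - Sending error response back in 17 msecs.",
]
counts = Counter(normalize(e) for e in events)
print(counts)  # both events collapse to the same normalized message
```

One caveat of this design: any letters-only word written directly before a hyphen with no space (for example a hyphenated compound) is dropped too, while words followed by " - " are kept. In Splunk, the grouping step would typically be a stats count by words after the eval.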

poojak2579
Path Finder

Thanks @ITWhisperer and @bowesmana for all your help!

ITWhisperer
SplunkTrust

Your stated requirement does not completely match your examples. For example, some expected outputs have fewer "words" than are available in the "field". Also, is there an unwritten requirement that your "words" begin with a letter but could contain numbers?

Making some assumptions derived from your written requirement and expected outputs, you could try something like this:

| makeresults format=csv data="raw
00012243asdsfgh - No recommendations from System A. Message - ERROR: System A | No Matching Recommendations
001b135c-5348-4arf-b3vbv344v - Validation Exception reason - Empty/Invalid Page_Placement Value ::: Input received - Channel1; ::: Other details - 001sss-445-4f45-b3ad-gsdfg34 - Incorrect page and placement found: Channel1;
00assew-34df-34de-d34k-sf34546d :: Invalid requestTimestamp : 2025-01-21T21:36:21.224Z
01hg34hgh44hghg4 - Exception while calling System A - null"
| rex field=raw " (?<dozenwords>([A-Za-z][A-Za-z0-9]*[^A-Za-z0-9]+){0,11}[A-Za-z][A-Za-z0-9]*)"
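
How the {0,11} quantifier caps the capture at twelve words, and why the capture stops at the first token that starts with a digit, can be checked with a small Python sketch. It is an illustration only: it assumes Python's re engine behaves like rex for this pattern, it swaps the inner capturing group for a non-capturing one, and first_dozen is a hypothetical helper name.

```python
import re
from typing import Optional

# Python spelling of the dozenwords pattern: up to 11 "word + separator"
# pairs followed by one final word, where a word starts with a letter and
# may continue with letters or digits.
DOZEN = re.compile(
    r" (?P<dozenwords>(?:[A-Za-z][A-Za-z0-9]*[^A-Za-z0-9]+){0,11}"
    r"[A-Za-z][A-Za-z0-9]*)"
)


def first_dozen(raw: str) -> Optional[str]:
    """Return the captured text, or None when nothing matches."""
    m = DOZEN.search(raw)
    return m.group("dozenwords") if m else None


print(first_dozen("01hg34hgh44hghg4 - Exception while calling System A - null"))
print(first_dozen("00assew-34df-34de-d34k-sf34546d :: Invalid requestTimestamp : 2025-01-21T21:36:21.224Z"))
```

Because every word in the chain must start with a letter, the match ends just before a token such as 2025-01-21T21:36:21.224Z or a repeated numeric ID, even if more letters-only words follow it.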

poojak2579
Path Finder

@ITWhisperer Thanks for sharing the regex. It works for some of the examples but not for all, probably because I have not explained the requirement clearly. My requirement is to capture all the words that contain letters only and to completely ignore (reject) alphanumeric/numeric words and special characters. Also, I would like to extract the full text, not just the first 12 words. Could you please share the regex, with an explanation if possible?

Here are a couple of examples where the regex is not working:

1) Exception message - CSR-a4cd725c-3d73-426c-b254-5e4f4adc4b26 - Generating exception because of multiple stage failure - abc_ELIGIBILITY

Output with regex - "Exception message - CSR"; for some other records it comes out as "Exception message - CSR-a4cd725c"
Required Output - Exception Message CSR Generating exception because of multiple stage failure abc ELIGIBILITY

2) 0013c5fb1737577541466 - Exception message - 0013c5fb1737577541466 - Generating exception because of multiple stage failure - abc_ELIGIBILITY

Output with regex - Exception message
Required Output - Exception message Generating exception because of multiple stage failure abc_ELIGIBILITY

3) b187c4411737535464656 - Exception message - b187c4411737535464656 - Exception in abc module. Creating error response - b187c4411737535464656 - Response creation couldn't happen for all the placements. Creating error response.

Output with regex - Exception message - b187c4411737535464656 - Exception in abc module. Creating error response - b187c4411737535464656 - Response
Required Output - Exception message Exception in abc module. Creating error response Response creation couldn't happen for all the placements. Creating error response.

  


bowesmana
SplunkTrust

You can use rex, but your example is not entirely clear: are you expecting the "-", "|" and "/" characters in your output?

See the rex statement in this example with your data.

| makeresults format=csv data="raw
00012243asdsfgh - No recommendations from System A. Message - ERROR: System A | No Matching Recommendations
001b135c-5348-4arf-b3vbv344v - Validation Exception reason - Empty/Invalid Page_Placement Value ::: Input received - Channel1; ::: Other details - 001sss-445-4f45-b3ad-gsdfg34 - Incorrect page and placement found: Channel1;
00assew-34df-34de-d34k-sf34546d :: Invalid requestTimestamp : 2025-01-21T21:36:21.224Z
01hg34hgh44hghg4 - Exception while calling System A - null"
| rex field=raw max_match=0 " (?<words>[A-Za-z]+)"
| eval words = mvjoin(words, " ")

 

poojak2579
Path Finder

Thanks for the reply, @bowesmana.
Yes, I would like to ignore special characters as well, if possible.
Your regex would work if the requirement were only to drop the digits from alphanumeric words, but my requirement is to completely ignore any word that contains digits.
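
The difference described here can be reproduced with a short Python sketch that compares the space-anchored pattern with the boundary-anchored one from the accepted answer. It assumes Python's re engine matches rex behaviour for these ASCII samples. The names LOOSE, STRICT, loose and strict are hypothetical.

```python
import re

# Space-anchored pattern: it only checks the character before the letters,
# so it captures "Channel" out of "Channel1".
LOOSE = re.compile(r" (?P<words>[A-Za-z]+)")
# Boundary-anchored pattern: the trailing boundary rejects any run of
# letters that is directly followed by a digit.
STRICT = re.compile(r"\b(?P<words>[A-Za-z'_]+)(?:\b|$)")

raw = (
    "001b135c-5348-4arf-b3vbv344v - Validation Exception reason - "
    "Empty/Invalid Page_Placement Value ::: Input received - Channel1; "
    "::: Other details - 001sss-445-4f45-b3ad-gsdfg34 - "
    "Incorrect page and placement found: Channel1;"
)

loose = [m.group("words") for m in LOOSE.finditer(raw)]
strict = [m.group("words") for m in STRICT.finditer(raw)]

print("Channel" in loose)   # True: the digit is cut off, the letters are kept
print("Channel" in strict)  # False: the whole alphanumeric word is skipped
```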
