Splunk Search

Regex Conditional Group Capture with Name

healthtrans
Explorer

I'm trying to build 1 regex to capture multiple sets of data. Below is a sample:

1. 20110221124637|21410|SENT:0.646861|51B11A011801830658
2. 20110221124854|21411|RECVD|00345251B1
3. 20110221124854|362|003452|SENT: 3.198847|51B11A011801830658
4. 20110221124854|362|003452|RECVD|00345251B1 
5. 20110221160534|431|011867|RECVD|01186751B1 
6. 20110221160534|431|011867|SENT: 0.278782|51B11A011801830658

Basically these lines are from different versions of a piece of software, but all contain the same information in different places. These are the fields I'd like to extract: Date, BinNumber, ClaimTrackNum, Direction, RespTime and ClaimType.

Here is the date:

1. 20110221124637         |21410|SENT:0.646861|51B11A011801830658
2. 20110221124854         |21411|RECVD|00345251B1
3. 20110221124854         |362|003452|SENT: 3.198847|51B11A011801830658
4. 20110221124854         |362|003452|RECVD|00345251B1 
5. 20110221160534         |431|011867|RECVD|01186751B1 
6. 20110221160534         |431|011867|SENT: 0.278782|51B11A011801830658    

ClaimTrackNum:

1. 20110221124637        |21410|        SENT:0.646861|51B11A011801830658
2. 20110221124854        |21411|        RECVD|00345251B1
3. 20110221124854        |362|          003452|SENT: 3.198847|51B11A011801830658
4. 20110221124854        |362|          003452|RECVD|00345251B1 
5. 20110221160534        |431|          011867|RECVD|01186751B1 
6. 20110221160534        |431|          011867|SENT: 0.278782|51B11A011801830658

BinNumber (May exist or not, or be in 2 spots):

1. 20110221124637|21410|SENT:0.646861|51B11A011801830658
2. 20110221124854|21411|RECVD|                            003452         51B1
3. 20110221124854|362|       003452        |SENT: 3.198847|51B11A011801830658
4. 20110221124854|362|       003452        |RECVD|        003452         51B1 
5. 20110221160534|431|       011867        |RECVD|        011867         51B1 
6. 20110221160534|431|       011867        |SENT: 0.278782|51B11A011801830658

RespTime (Only exists after a "SENT:" (old style) or "SENT: " (new)):

1. 20110221124637|21410|SENT:        0.646861         |51B11A011801830658
2. 20110221124854|21411|RECVD|00345251B1
3. 20110221124854|362|003452|SENT:   3.198847         |51B11A011801830658
4. 20110221124854|362|003452|RECVD|00345251B1 
5. 20110221160534|431|011867|RECVD|01186751B1 
6. 20110221160534|431|011867|SENT:   0.278782         |51B11A011801830658

ClaimType:

1. 20110221124637|21410|SENT:0.646861|          51B1       1A011801830658
2. 20110221124854|21411|RECVD|003452            51B1
3. 20110221124854|362|003452|SENT: 3.198847|    51B1       1A011801830658
4. 20110221124854|362|003452|RECVD|003452       51B1 
5. 20110221160534|431|011867|RECVD|011867       51B1 
6. 20110221160534|431|011867|SENT: 0.278782|    51B1       1A011801830658

And lastly the direction is either the word "SENT" or "RECVD"

I've come up with this regex, which fully matches everything:

(?'SW_Date'\d{14})\|(?'SW_ClaimTrackNum'\d+)\|(?>(?'SW_BinNumber'[0-9]{6})\||)(?>([A-z: ]+(?'SW_RespTime'[0-9.]+))|([A-z: ]+))\|(?>(?'SW_BinNumber'[0-9]{6})(?'SW_ClaimType'[0-9A-z]{4})|(?'SW_ClaimType'[0-9A-z]{4}))

Which splunk doesn't like... If I simply rename any duplicate field to different names, splunk doesn't have an issue.

(?'SW_Date'\d{14})\|(?'SW_ClaimTrackNum'\d+)\|(?>(?'SW_NewBinNumber'[0-9]{6})\||)(?>([A-z: ]+(?'SW_RespTime'[0-9.]+))|([A-z: ]+))\|(?>(?'SW_OldBinNumber'[0-9]{6})(?'SW_RecvdClaimType'[0-9A-z]{4})|(?'SW_SentClaimType'[0-9A-z]{4}))

but naturally splunk now thinks that "newbinnumber" and "oldbinnumber" are different fields when they are the exact same.

How can I write this regex to always extract those fields as "BinNumber" and "ClaimType"?

Tags (1)

woodcock
Esteemed Legend

Use coalesce like this:

... (?'SW_Date'd{14})|(?'SW_ClaimTrackNum'd+)|(?>(?'temp_NewBinNumber'[0-9]{6})||)(?>([A-z: ]+(?'SW_RespTime'[0-9.]+))|([A-z: ]+))|(?>(?'temp_OldBinNumber'[0-9]{6})(?'temp_RecvdClaimType'[0-9A-z]{4})|(?'temp_SentClaimType'[0-9A-z]{4})) | eval SW_BinNumber=coalesce(temp_NewBinNumber, temp_OldBinNumber) |eval SW_ClaimType=coalesce(temp_RecvdClaimType, temp_SentClaimType) | fields - temp*
0 Karma
Career Survey
First 500 qualified respondents will receive a $20 gift card! Tell us about your professional Splunk journey.
Get Updates on the Splunk Community!

Tech Talk Recap | Mastering Threat Hunting

Mastering Threat HuntingDive into the world of threat hunting, exploring the key differences between ...

Observability for AI Applications: Troubleshooting Latency

If you’re working with proprietary company data, you’re probably going to have a locally hosted LLM or many ...

Splunk AI Assistant for SPL vs. ChatGPT: Which One is Better?

In the age of AI, every tool promises to make our lives easier. From summarizing content to writing code, ...