Splunk Search

Regex Conditional Group Capture with Name

healthtrans
Explorer

I'm trying to build 1 regex to capture multiple sets of data. Below is a sample:

1. 20110221124637|21410|SENT:0.646861|51B11A011801830658
2. 20110221124854|21411|RECVD|00345251B1
3. 20110221124854|362|003452|SENT: 3.198847|51B11A011801830658
4. 20110221124854|362|003452|RECVD|00345251B1 
5. 20110221160534|431|011867|RECVD|01186751B1 
6. 20110221160534|431|011867|SENT: 0.278782|51B11A011801830658

Basically these lines are from different versions of a piece of software, but all contain the same information in different places. These are the fields I'd like to extract: Date, BinNumber, ClaimTrackNum, Direction, RespTime and ClaimType.

Here is the date:

1. 20110221124637         |21410|SENT:0.646861|51B11A011801830658
2. 20110221124854         |21411|RECVD|00345251B1
3. 20110221124854         |362|003452|SENT: 3.198847|51B11A011801830658
4. 20110221124854         |362|003452|RECVD|00345251B1 
5. 20110221160534         |431|011867|RECVD|01186751B1 
6. 20110221160534         |431|011867|SENT: 0.278782|51B11A011801830658    

ClaimTrackNum:

1. 20110221124637        |21410|        SENT:0.646861|51B11A011801830658
2. 20110221124854        |21411|        RECVD|00345251B1
3. 20110221124854        |362|          003452|SENT: 3.198847|51B11A011801830658
4. 20110221124854        |362|          003452|RECVD|00345251B1 
5. 20110221160534        |431|          011867|RECVD|01186751B1 
6. 20110221160534        |431|          011867|SENT: 0.278782|51B11A011801830658

BinNumber (May exist or not, or be in 2 spots):

1. 20110221124637|21410|SENT:0.646861|51B11A011801830658
2. 20110221124854|21411|RECVD|                            003452         51B1
3. 20110221124854|362|       003452        |SENT: 3.198847|51B11A011801830658
4. 20110221124854|362|       003452        |RECVD|        003452         51B1 
5. 20110221160534|431|       011867        |RECVD|        011867         51B1 
6. 20110221160534|431|       011867        |SENT: 0.278782|51B11A011801830658

RespTime (Only exists after a "SENT:" (old style) or "SENT: " (new)):

1. 20110221124637|21410|SENT:        0.646861         |51B11A011801830658
2. 20110221124854|21411|RECVD|00345251B1
3. 20110221124854|362|003452|SENT:   3.198847         |51B11A011801830658
4. 20110221124854|362|003452|RECVD|00345251B1 
5. 20110221160534|431|011867|RECVD|01186751B1 
6. 20110221160534|431|011867|SENT:   0.278782         |51B11A011801830658

ClaimType:

1. 20110221124637|21410|SENT:0.646861|          51B1       1A011801830658
2. 20110221124854|21411|RECVD|003452            51B1
3. 20110221124854|362|003452|SENT: 3.198847|    51B1       1A011801830658
4. 20110221124854|362|003452|RECVD|003452       51B1 
5. 20110221160534|431|011867|RECVD|011867       51B1 
6. 20110221160534|431|011867|SENT: 0.278782|    51B1       1A011801830658

And lastly the direction is either the word "SENT" or "RECVD"

I've come up with this regex, which fully matches everything:

(?'SW_Date'\d{14})\|(?'SW_ClaimTrackNum'\d+)\|(?>(?'SW_BinNumber'[0-9]{6})\||)(?>([A-z: ]+(?'SW_RespTime'[0-9.]+))|([A-z: ]+))\|(?>(?'SW_BinNumber'[0-9]{6})(?'SW_ClaimType'[0-9A-z]{4})|(?'SW_ClaimType'[0-9A-z]{4}))

Which splunk doesn't like... If I simply rename any duplicate field to different names, splunk doesn't have an issue.

(?'SW_Date'\d{14})\|(?'SW_ClaimTrackNum'\d+)\|(?>(?'SW_NewBinNumber'[0-9]{6})\||)(?>([A-z: ]+(?'SW_RespTime'[0-9.]+))|([A-z: ]+))\|(?>(?'SW_OldBinNumber'[0-9]{6})(?'SW_RecvdClaimType'[0-9A-z]{4})|(?'SW_SentClaimType'[0-9A-z]{4}))

but naturally splunk now thinks that "newbinnumber" and "oldbinnumber" are different fields when they are the exact same.

How can I write this regex to always extract those fields as "BinNumber" and "ClaimType"?

Tags (1)

woodcock
Esteemed Legend

Use coalesce like this:

... (?'SW_Date'd{14})|(?'SW_ClaimTrackNum'd+)|(?>(?'temp_NewBinNumber'[0-9]{6})||)(?>([A-z: ]+(?'SW_RespTime'[0-9.]+))|([A-z: ]+))|(?>(?'temp_OldBinNumber'[0-9]{6})(?'temp_RecvdClaimType'[0-9A-z]{4})|(?'temp_SentClaimType'[0-9A-z]{4})) | eval SW_BinNumber=coalesce(temp_NewBinNumber, temp_OldBinNumber) |eval SW_ClaimType=coalesce(temp_RecvdClaimType, temp_SentClaimType) | fields - temp*
0 Karma
Got questions? Get answers!

Join the Splunk Community Slack to learn, troubleshoot, and make connections with fellow Splunk practitioners in real time!

Meet up IRL or virtually!

Join Splunk User Groups to connect and learn in-person by region or remotely by topic or industry.

Get Updates on the Splunk Community!

Modernize your Splunk Apps – Introducing Python 3.13 in Splunk

We are excited to announce that the upcoming releases of Splunk Enterprise 10.2.x and Splunk Cloud Platform ...

Step into “Hunt the Insider: An Splunk ES Premier Mystery” to catch a cybercriminal ...

After a whole week of being on call, you fell asleep on your keyboard, and you hit a sequence of buttons that ...

SplunkTrust Application Period is Officially OPEN!

It's that time, folks! The application/nomination period for the 2026-2027 SplunkTrust is officially open. If ...