I need to use regex to split a field into two parts, delimited by an underscore.
The vast majority of the time, my field (a date/time ID) looks like 11232016-0056_ABC, where AB or ABC is a 2- or 3-character identifier.
I use the following rex command to extract, and it works great.
| rex field=originalField "(?<subField1>.*)\_(?<subField2>.*)"
originalField = 11232016-0056_ABC
subField1 = 11232016-0056
subField2 = ABC
However, I have a few special cases where
originalField = 11232016-0056_ABC_M, where M could be anything alphanumeric following an additional underscore.
When I use the above rex command, I get the following result:
originalField = 11232016-0056_ABC_M
subField1 = 11232016-0056_ABC
subField2 = M
I want to see the following:
originalField = 11232016-0056_ABC_M
subField1 = 11232016-0056
subField2 = ABC_M
Basically, I need it to split at the first underscore and ignore all subsequent underscores.
.... | rex field=originalField "(?<subField1>[^_]+)_(?<subField2>.+)"
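| rex field=originalField "(?<subField1>[^_]+)_(?<subField2>.*)"
Changed + to * so the match still succeeds when nothing follows the underscore (that is, when the ABC part is missing). The underscore itself is still required for a match.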
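For anyone who wants to check the behavior outside Splunk, here is a minimal sketch in Python. It assumes Python's re module treats these patterns the same way rex does for this input; the one syntax difference is that Python writes named groups as (?P<name>...) instead of (?<name>...). The helper name extract is made up for illustration. The greedy .* grabs as much as it can and only backs off to the last underscore, while [^_]+ cannot cross an underscore, so it stops at the first one.

```python
import re

# Python's re module spells named groups (?P<name>...), whereas Splunk's rex
# uses (?<name>...); the rest of each pattern is unchanged from the thread.
GREEDY = re.compile(r"(?P<subField1>.*)_(?P<subField2>.*)")
FIRST_UNDERSCORE = re.compile(r"(?P<subField1>[^_]+)_(?P<subField2>.*)")


def extract(pattern, value):
    """Return (subField1, subField2), or None when the pattern does not match.

    Hypothetical helper for illustration; it is not part of Splunk.
    """
    match = pattern.search(value)
    if match is None:
        return None
    return match.group("subField1"), match.group("subField2")


if __name__ == "__main__":
    # Sample values taken from the question.
    for value in ("11232016-0056_ABC", "11232016-0056_ABC_M"):
        print(value)
        print("  greedy .*  :", extract(GREEDY, value))
        print("  [^_]+ first:", extract(FIRST_UNDERSCORE, value))
```

A value with no underscore at all does not match either pattern, so extract returns None for it.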
Hello Past mstark31. Current mstark31 thanks you for asking this question 3 years ago.
Sorry, too fast on the draw. I didn't see the additional info about a possible second underscore occurring.
gdziuba's answer works perfectly (or so I think:))
This should get you going.
.... | rex field=originalField "(?<subField1>[^_]+)_(?<subField2>.*)"
Use this instead if you want subField1 to keep the underscore at its end. The lazy .*? stops at the first underscore, and the same form works if your delimiter is a character other than an underscore.
.... | rex field=originalField "(?<subField1>.*?_)(?<subField2>.*)"
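To see what the lazy variant captures, here is a small Python sketch under the same assumption as before: Python's re module behaves like rex for this pattern, apart from needing (?P<name>...) for named groups. The function name split_keep_delimiter is hypothetical.

```python
import re

# Lazy variant: .*? expands only as far as the first "_", and that underscore
# is captured inside subField1. Python needs (?P<name>...) for named groups;
# Splunk's rex uses (?<name>...).
KEEP_UNDERSCORE = re.compile(r"(?P<subField1>.*?_)(?P<subField2>.*)")


def split_keep_delimiter(value):
    """Return (subField1, subField2) with the first underscore kept in subField1.

    Returns None when the value contains no underscore.
    """
    match = KEEP_UNDERSCORE.search(value)
    return match.groups() if match else None


if __name__ == "__main__":
    print(split_keep_delimiter("11232016-0056_ABC_M"))
```

If you do not want the underscore in subField1, the [^_]+ pattern above is the simpler choice.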