Hello,
I have a field 'narrative' which contains long strings describing what happened to a piece of equipment. Within that string in various locations, there is a substring that identifies the piece of equipment (Yes, it would be much better to have this as a defined field on its own, no I don't know why the sysadmins set it up this way, I just inherited it). The equipment identifier is a 16 character string, and the 5th and 6th characters are always the state abbreviation (ex. NJ for New Jersey, TX for Texas, etc.). It's not always the first substring within the field, so I can't just count to the first 5:6 characters.
Example: [may or may not be data here] 1234NJ56ABCD1234 [maybe some more data here]
I want to extract that 16 char substring that has a valid state abbreviation into a new field called "equip_id". I've tried rex narrative= "(\d{5}|\w{5})?(?<equip_id>\w{1})" but it is so far failing, and plus I think this would only get the 5th char. Plus I can't figure out where to put in the list of acceptable things to match against.
Any help appreciated.
Here's what we ended up using:
rex field=narrative "(?<equip_id>[A-Z]{4}([NY|NJ|TX|OR]{2})[A-Za-z0-9]{10})"
You could list groups of 2 letter state abbreviations with | between (or operator)
Multiple rex commands are fine as it only sets equip_id if there is a match
| rex "\s?(?<equip_id>\w{4}(NJ|TX|CT)\w{10})\s"
| rex "\s?(?<equip_id>\w{4}(NY|MA|CA|WA)\w{10})\s"
Here's what we ended up using:
rex field=narrative "(?<equip_id>[A-Z]{4}([NY|NJ|TX|OR]{2})[A-Za-z0-9]{10})"