I am trying to extract text from a specific attribute that I am querying in LDAP. Our "altSecurityIdentities" attribute is not formatted the same way on all users: on some, the data contains an additional text string I want to capture, and on others it doesn't. When I get the regex to select the additional string, the values that don't have that string capture too much, and if I fix it so those capture less, my first example loses the extra string too. Confused yet?
{"altSecurityIdentities":["X509:<I>C=US,O=Entrust,OU=Certification Authorities,OU=Entrust Managed Services SSP CA<S>OID.0.9.2342.19200300.100.1.1=16651003215794 CN=DOE JANE (Affiliate)"]}
{"altSecurityIdentities":["X509:<I>C=US,O=Entrust,OU=Certification Authorities,OU=Entrust Managed Services SSP CA<S>CN=DOE JOHN OID.0.9.2342.19200300.100.1.1=16651002070291"]}
Basically I'm trying to extract the name after CN=, but since the lines aren't structured the same way (the OID value comes before CN in one event but after it in the other), I'm having trouble finding the balance where I capture the extra string in one without also gathering the OID value in the other.
I started with this simple regex:
CN=[^"]+
While that captures CN=DOE JANE (Affiliate) correctly, it also captures CN=DOE JOHN OID.0.9.2342.19200300.100.1.1=16651002070291 on the other object, because there the closing quotation mark comes at the end of the value, after the OID string. I know I'm missing something fairly simple, but I just can't seem to get it.
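To make the over-capture concrete, here is a minimal Python sketch that runs the starting pattern against both sample events. It assumes Python's re module behaves like Splunk's PCRE engine for this simple pattern; the names EVENT_JANE, EVENT_JOHN and naive_match are just illustrative labels.

```python
import re

# The two sample altSecurityIdentities events from the question.
EVENT_JANE = ('{"altSecurityIdentities":["X509:<I>C=US,O=Entrust,OU=Certification Authorities,'
              'OU=Entrust Managed Services SSP CA<S>OID.0.9.2342.19200300.100.1.1=16651003215794 '
              'CN=DOE JANE (Affiliate)"]}')
EVENT_JOHN = ('{"altSecurityIdentities":["X509:<I>C=US,O=Entrust,OU=Certification Authorities,'
              'OU=Entrust Managed Services SSP CA<S>CN=DOE JOHN '
              'OID.0.9.2342.19200300.100.1.1=16651002070291"]}')

# The starting pattern: "CN=" followed by every character that is not a double quote.
NAIVE = re.compile(r'CN=[^"]+')


def naive_match(event: str) -> str:
    """Return the text matched by CN=[^"]+, or an empty string if nothing matches."""
    m = NAIVE.search(event)
    return m.group(0) if m else ""


print(naive_match(EVENT_JANE))  # CN=DOE JANE (Affiliate)
print(naive_match(EVENT_JOHN))  # CN=DOE JOHN OID.0.9.2342.19200300.100.1.1=16651002070291
```

The character class [^"]+ only stops at a double quote, so on the second event it runs straight through the OID value.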
This works for the scenarios you've given.
CN=(?P<CN>.*?)(?:OID|"]})
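Here is a rough Python sketch of the same anchoring idea, assuming a lazy .*? quantifier so the capture stops at the first OID or closing "]} (a possessive .*+ never gives characters back, so an anchor placed after it could not match). The helper name extract_cn is hypothetical, and the .strip() call is an extra step, not part of the regex, that drops the space left before OID on the second event.

```python
import re
from typing import Optional

EVENT_JANE = ('{"altSecurityIdentities":["X509:<I>C=US,O=Entrust,OU=Certification Authorities,'
              'OU=Entrust Managed Services SSP CA<S>OID.0.9.2342.19200300.100.1.1=16651003215794 '
              'CN=DOE JANE (Affiliate)"]}')
EVENT_JOHN = ('{"altSecurityIdentities":["X509:<I>C=US,O=Entrust,OU=Certification Authorities,'
              'OU=Entrust Managed Services SSP CA<S>CN=DOE JOHN '
              'OID.0.9.2342.19200300.100.1.1=16651002070291"]}')

# Lazy capture after CN=, stopping at the first "OID" or at the closing '"]}'.
# The ] and } are escaped here only for readability; Python treats them as literals either way.
CN_ANCHORED = re.compile(r'CN=(?P<CN>.*?)(?:OID|"\]\})')


def extract_cn(event: str) -> Optional[str]:
    """Return the CN value (trimmed), or None if neither anchor follows CN=."""
    m = CN_ANCHORED.search(event)
    return m.group("CN").strip() if m else None


print(extract_cn(EVENT_JANE))  # DOE JANE (Affiliate)
print(extract_cn(EVENT_JOHN))  # DOE JOHN
```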
The question is... are those the only scenarios? It would be a good idea to first take a look at the patterns using the punct field with a simple stats count by punct, and then examine the differences between the patterns.
That way you can find other possible "run-on sentence" holes. But the fact that you can anchor on CN= makes this pretty clean. The only consequence in this case is that if the CN is not followed by OID or "]}, it's not going to be picked up. So perhaps a bit of tweaking, or what I'm sure are many additional suggestions arriving while I'm typing this, will help! 🙂
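Outside Splunk, the idea behind stats count by punct can be sketched in Python: reduce each event to a punctuation-only signature and count events per signature. The punct_signature helper below is a hypothetical, rough approximation (keep punctuation, turn spaces into underscores, drop letters and digits, truncate); it is not Splunk's exact punct algorithm.

```python
from collections import Counter

EVENT_JANE = ('{"altSecurityIdentities":["X509:<I>C=US,O=Entrust,OU=Certification Authorities,'
              'OU=Entrust Managed Services SSP CA<S>OID.0.9.2342.19200300.100.1.1=16651003215794 '
              'CN=DOE JANE (Affiliate)"]}')
EVENT_JOHN = ('{"altSecurityIdentities":["X509:<I>C=US,O=Entrust,OU=Certification Authorities,'
              'OU=Entrust Managed Services SSP CA<S>CN=DOE JOHN '
              'OID.0.9.2342.19200300.100.1.1=16651002070291"]}')


def punct_signature(event: str, limit: int = 30) -> str:
    """Hypothetical punct-like signature: punctuation only, spaces as '_', truncated to limit."""
    out = []
    for ch in event:
        if ch.isalnum():
            continue  # letters and digits are dropped
        out.append("_" if ch == " " else ch)
        if len(out) == limit:
            break
    return "".join(out)


# Count events per signature, the rough equivalent of "stats count by punct".
counts = Counter(punct_signature(e) for e in (EVENT_JANE, EVENT_JOHN))
for signature, n in counts.items():
    print(n, signature)
```

With the two sample events this yields two distinct signatures, one per layout, which is the kind of split worth inspecting before trusting a single regex.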
Try this
rex "CN=(?<cn>.+?)(OID|\")" | table cn
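A minimal Python check of this pattern against both sample events. Python's re spells named groups as (?P<name>...), so the sketch uses that form; the lazy .+? stops at the first OID or double quote. The helper name extract_cn is hypothetical.

```python
import re
from typing import Optional

EVENT_JANE = ('{"altSecurityIdentities":["X509:<I>C=US,O=Entrust,OU=Certification Authorities,'
              'OU=Entrust Managed Services SSP CA<S>OID.0.9.2342.19200300.100.1.1=16651003215794 '
              'CN=DOE JANE (Affiliate)"]}')
EVENT_JOHN = ('{"altSecurityIdentities":["X509:<I>C=US,O=Entrust,OU=Certification Authorities,'
              'OU=Entrust Managed Services SSP CA<S>CN=DOE JOHN '
              'OID.0.9.2342.19200300.100.1.1=16651002070291"]}')

# Same pattern as the rex above, with Python's named-group syntax.
CN_LAZY = re.compile(r'CN=(?P<cn>.+?)(OID|")')


def extract_cn(event: str) -> Optional[str]:
    """Return the raw cn capture, or None if the pattern does not match."""
    m = CN_LAZY.search(event)
    return m.group("cn") if m else None


print(repr(extract_cn(EVENT_JANE)))  # 'DOE JANE (Affiliate)'
print(repr(extract_cn(EVENT_JOHN)))  # 'DOE JOHN '
```

Note that the second value keeps the space before OID; adding \s* before the alternation in the pattern, or trimming the field afterwards (for example with eval cn=trim(cn)), would drop it.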
That did the trick. I guess I stared at it too long. Much obliged.
Zactly. However, it will be a bit more efficient if you mark the second capturing group as non-capturing. Otherwise the regex engine captures that group only to discard it; better to not capture it at all (a small gain in the scheme of things). So (OID|\") becomes (?:OID|\")
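A quick Python check of that suggestion: the .groups attribute of a compiled pattern reports how many capturing groups it has, so the two versions differ only in whether the OID/quote alternation is captured, while the extracted cn stays the same. This only shows the group count; it does not measure any speed difference.

```python
import re

EVENT_JOHN = ('{"altSecurityIdentities":["X509:<I>C=US,O=Entrust,OU=Certification Authorities,'
              'OU=Entrust Managed Services SSP CA<S>CN=DOE JOHN '
              'OID.0.9.2342.19200300.100.1.1=16651002070291"]}')

CAPTURING = re.compile(r'CN=(?P<cn>.+?)(OID|")')
NON_CAPTURING = re.compile(r'CN=(?P<cn>.+?)(?:OID|")')

print(CAPTURING.groups)      # 2: cn plus the OID/quote alternation
print(NON_CAPTURING.groups)  # 1: only cn

# Both versions extract the same value.
same = CAPTURING.search(EVENT_JOHN).group("cn") == NON_CAPTURING.search(EVENT_JOHN).group("cn")
print(same)  # True
```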