I need to extract various fields if they exist. CN, C, S, O, OU, Here is a sample data of five different events. Please note that this is a snippet of each event and not the entire event. I left in the ssl_issuer in the first event but removed the string in the last four events. One challenge is there are duplicate field names in ssl_issurer and ssl_subject. I have tried various regex expressions but they either get too much or too little out of the events. I would like to have one regex for each field in the transforms.conf, that way I don't have the whole thing fail if there is a problem in the data.
This fairly close, but skips the second and fourth event.
ssl_subject\="CN=(.*)C=(.*)S=(.*)O=(.*)OU=(.*)ssl_start_time
ssl_issuer="CN=DigiCert SHA2 High Assurance Server CA C=US O=DigiCert Inc OU=www.digicert.com" ssl_hash="f41565b049f039e765a0f8be8271a4b4817b7378" ssl_subject="CN=syndication.twitter.com C=US S=California O=Twitter, Inc. OU=Twitter Security" ssl_start_time="Wed Jun 29 00:00:00 2016 UTC" ssl_end_time="Mon Sep 16 12:00:00 2019 UTC" ssl_type="X.509 Certificate" ssl_extended_key_usage="Server Authentication, Client Authentication" ssl_key_length="2048 bits" ssl_key_usage="Digital Signature, Key Encipherment"
ssl_subject="CN=*.eu-west-1.webrootcloudav.com" ssl_start_time="Tue Aug 22 00:00:00 2017 UTC" ssl_end_time="Sat Sep 22 12:00:00 2018 UTC" ssl_type="X.509 Certificate" ssl_extended_key_usage="" ssl_key_length="2048 bits" ssl_key_usage="Digital Signature, Non-Repudiation, Key Encipherment"
ssl_subject="CN=*.us.static.hrsmart.com C=US S=Virginia O=Deltek, Inc. OU=Security Services" ssl_start_time="Thu Jan 11 00:00:00 2018 UTC" ssl_end_time="Sun Mar 31 12:00:00 2019 UTC" ssl_type="X.509 Certificate" ssl_extended_key_usage="Server Authentication, Client Authentication" ssl_key_length="2048 bits" ssl_key_usage="Digital Signature, Key Encipherment"
ssl_subject="CN=*.googleapis.com C=US S=California O=Google Inc" ssl_start_time="Tue Mar 13 18:57:10 2018 UTC" ssl_end_time="Tue Jun 5 18:17:00 2018 UTC" ssl_type="X.509 Certificate" ssl_extended_key_usage="Server Authentication, Client Authentication" ssl_key_length="0 bits" ssl_key_usage="Digital Signature, Certificate Signing, CRL Signing"
ssl_subject="CN=subscription.rhsm.redhat.com C=US S=North Carolina O=Red Hat, Inc. OU=Red Hat Network" ssl_start_time="Thu May 18 16:30:24 2017 UTC" ssl_end_time="Sat May 18 16:30:24 2019 UTC" ssl_type="X.509 Certificate" ssl_extended_key_usage="Server Authentication" ssl_key_length="4096 bits" ssl_key_usage=""
I just found out regex101 lets you save tests!
partially working:
https://regex101.com/r/ZOyEIg/1
and @elliotproebstel working version
https://regex101.com/r/ZOyEIg/2
Ok, if you only want to match on values in events with ssl_subject=
, then this should do it:
ssl_subject\="(CN=(?<CN>[^=]*))?(C=(?<C>[^=]*))?(S=(?<S>[^=]*))?(O=(?<O>[^=]*))?(OU=(?<OU>[^=]*))?" ssl_start_time
Here's a link to test: https://regex101.com/r/ZOyEIg/3
I just found out regex101 lets you save tests!
partially working:
https://regex101.com/r/ZOyEIg/1
and @elliotproebstel working version
https://regex101.com/r/ZOyEIg/2
Much closer, but it is matching ssl_issuer= and ssl_subject=. It should only match values in ssl_subject=
Thanks for the information, it was a tremendous help.
This is what I used for subject:
"ssl_subject\="(CN=(?<ssl_subject_common_name>[^=]*))?(C=(?<C>[^=]*))?(S=(?<ssl_subject_state>[^=]*))?(O=(?<ssl_subject_organization>[^=]*))?(OU=(?<ssl_subject_unit>[^=]*))?" ssl_start_time"
This is what I used for issuer:
ssl_issuer\="(CN=(?<ssl_issuer_common_name>[^=]*))?(C=(?<C>[^=]*))?(S=(?<ssl_issuer_state>[^=]*))?(O=(?<ssl_issuer_organization>[^=]*))?(OU=(?<ssl_issuer_unit>[^=]*))?" ssl_hash
See if this gets you a bit closer:
("|\s)(?<key>(CN|C|O|OU))=(?<value>(\w|\s|\d|\.)+)(\s|")
It is different that what I have tried. This gets tripped up by ssl_issuer. In some cases the result doesn't include country code and other data.
You're totally right, I did't check my work closely enough - there's a few things on the capture group for the O,OU,S, etc that are easy to fix - let me see if I can work out something for the other items.
Here's a revision that I think should work:
("|\s)(?<key>(CN|C|O|OU|S))=(?<value>[^=\"]+)(?=(\s|"))