I could use some expert assistance with a regex for breaking down a custom user-agent field in an IIS log into component fields while avoiding a conflict with other fields. We run software that uses IIS as a file server, and the software injects a custom user-agent value into the IIS log with every request.

Here is a sample of the user agent:

JTDI+(JDMS+1.0.11.2.20200807;+Win10+10.0;+229.0/62/-1;Branch|UnitType|System|City|ST|SiteIDOverride|SvrType|2.5;C8F7504F064E;UTA-AVD)

The IIS log is space delimited, so all of that lands in the cs_user_agent field just fine. I made a sort of running mess of extracting the subfields. Within the string are subfields delimited by semicolon, and sub-subfields delimited by / and |.

Here are my separate extractions, in order as the fields appear in the string:

^[^\(\n]*JTDI\+\((?P<jkversion>[^;]+)
^[^;\n]*;(?P<os>[^;]+)
^(?:[^;\n]*;){2}(?P<freespace>[^/]+)
^(?:[^;\n]*;){2}\+\d+\.\d+/(?P<pending>\d+)
^(?:[^;\n]*;){3}(?P<SiteDescription>[^;]+)
^(?:[^;\n]*;){4}(?P<MAC>[^;]+)
^(?:[^;\n]*;){5}(?P<cs_hostname>[^\)]+)

Technically after the 'pending' field there should be a 'hits' field (represented by the -1 above), but we don't use it, so I didn't bother extracting it.

So my problem is the parentheses. If a filename shows up in the cs_uri_stem field that includes them, like filename(copy1).txt, the () throw off my jkversion and cs_hostname extractions, because I don't know how to accommodate the possible existence of parentheses outside the cs_user_agent field.

So I guess my question is two-fold:

1) I know my overall user-agent extraction should be a single transform instead of all separate field extractions, but I'm not sure how to tie them all together because I couldn't see a way to extract strings like that in the field extractor interface in Splunk.

2) How can I fix my regex so that parentheses appearing in other fields don't break my jkversion and cs_hostname extractions?

Help?
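To make the target concrete, here is a rough sketch in Python (not Splunk) of one combined pattern that produces all of the fields in a single pass. It anchors on the literal JTDI+( prefix instead of counting from the start of the line, so parentheses in earlier fields are never scanned. Several things here are assumptions rather than anything confirmed: the sample log line is made up (only the user-agent portion comes from the real sample), the `hits` group and the `parse_user_agent` helper are names invented for illustration, and the leading + on the os and freespace values is dropped as a matter of taste. The named-group syntax is the same as in the extractions above, so the pattern may carry over to a Splunk transform largely unchanged, but that has not been tested there.

```python
import re

# One combined pattern, anchored on the literal "JTDI+(" prefix rather than
# on the start of the line. Every class excludes whitespace, so a match
# cannot run past the space-delimited cs_user_agent field.
UA_PATTERN = re.compile(
    r"JTDI\+\("
    r"(?P<jkversion>[^;\s]+);"
    r"\+?(?P<os>[^;\s]+);"                      # leading "+" dropped (assumption)
    r"\+?(?P<freespace>[^/;\s]+)/(?P<pending>-?\d+)/(?P<hits>-?\d+);"
    r"(?P<SiteDescription>[^;\s]+);"
    r"(?P<MAC>[^;\s]+);"
    r"(?P<cs_hostname>[^)\s]+)\)"
)

# Hypothetical log line: the field layout around the user agent is invented
# for illustration; the cs_uri_stem deliberately contains parentheses.
SAMPLE_LINE = (
    "2020-08-07 12:00:00 10.0.0.1 GET /files/filename(copy1).txt - 80 - "
    "10.0.0.2 JTDI+(JDMS+1.0.11.2.20200807;+Win10+10.0;+229.0/62/-1;"
    "Branch|UnitType|System|City|ST|SiteIDOverride|SvrType|2.5;"
    "C8F7504F064E;UTA-AVD) - 200 0 0 15"
)


def parse_user_agent(line):
    """Return a dict of the named subfields, or None if the agent is absent."""
    match = UA_PATTERN.search(line)
    if match is None:
        return None
    fields = match.groupdict()
    # The "|"-delimited sub-subfields can be split off afterwards if needed.
    fields["SiteDescriptionParts"] = fields["SiteDescription"].split("|")
    return fields


if __name__ == "__main__":
    for name, value in parse_user_agent(SAMPLE_LINE).items():
        print(name, "=", value)
```

The point of the sketch is the anchoring: because the pattern starts at JTDI+( and never uses ^ or a "skip everything that is not a parenthesis" prefix, a filename like filename(copy1).txt earlier in the line has no effect on jkversion or cs_hostname.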