I could use some expert assistance with a regex for breaking down a custom user-agent field in an IIS log into component fields while avoiding a conflict with other fields.
We run software that uses IIS as a file server, and the software injects a custom user-agent value into the IIS log with every request. Here is a sample of the user agent:
JTDI+(JDMS+1.0.11.2.20200807;+Win10+10.0;+229.0/62/-1;Branch|UnitType|System|City|ST|SiteIDOverride|SvrType|2.5;C8F7504F064E;UTA-AVD)
The IIS log is space delimited, so all of that lands in the cs_user_agent field just fine. I made a sort of running mess of extracting the subfields. Within the string are subfields delimited by semicolon, and sub-subfields delimited by / and |.
Here are my separate extractions, in order as the fields appear in the string:
^[^\(\n]*JTDI\+\((?P<jkversion>[^;]+)
^[^;\n]*;(?P<os>[^;]+)
^(?:[^;\n]*;){2}(?P<freespace>[^/]+)
^(?:[^;\n]*;){2}\+\d+\.\d+/(?P<pending>\d+)
^(?:[^;\n]*;){3}(?P<SiteDescription>[^;]+)
^(?:[^;\n]*;){4}(?P<MAC>[^;]+)
^(?:[^;\n]*;){5}(?P<cs_hostname>[^\)]+)
Technically after the 'pending' field there should be a 'hits' field (represented by the -1 above), but we don't use it, so I didn't bother extracting it.
So my problem is the parentheses. If a filename shows up in the cs_uri_stem field that includes them, like filename(copy1).txt, the () throw off my jkversion and cs_hostname extractions, because I don't know how to accommodate the possible existence of parentheses outside the cs_user_agent field.
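To make the failure concrete, here is a small Python sketch that runs the jkversion pattern against two of the sample events posted further down in this thread. It assumes Python's `re` module behaves like Splunk's PCRE engine for this pattern (both accept the `(?P<name>...)` syntax). The variable names are illustrative only.

```python
import re

# The jkversion pattern from the question. The leading ^[^\(\n]* forbids any
# "(" between the start of the event and "JTDI".
anchored = re.compile(r"^[^\(\n]*JTDI\+\((?P<jkversion>[^;]+)")

# Same pattern with the leading anchor removed (an assumption about the fix).
unanchored = re.compile(r"JTDI\+\((?P<jkversion>[^;]+)")

# Sample event with no parentheses in cs_uri_stem.
plain_event = (
    "2021-03-16 21:01:40 10.2.3.4 HEAD /configs/army/ACD-ABNGAASN1/configs.xml"
    " - 443 usr-1125 169.31.11.142 "
    "JTDI+(JDMS+1.0.11.2.20200807;+WinServer+2016+10.0;+345.6/0/-1;"
    "West|AA|ACD|Hotelville|AL|ACD-ABNGAASN2|ACDSVR|2.5;C8F7504F05CE;NGABA-ACN-05)"
    " - 304 0 0 325 393 192.168.81.70 close - -"
)

# Sample event whose cs_uri_stem contains "(20R2)".
paren_event = (
    "2021-03-16 17:02:54 10.2.3.4 GET"
    " /Army/acd/faq/ACN+v1.1.6.2002(20R2)+Release+Updates.pdf"
    " - 443 usr-1125 169.31.11.142 "
    "JTDI+(JDMS+1.0.11.2.20200807;+WinServer+2016+10.0;+345.5/285/-1;"
    "West|AA|ACD|Hotelville|AL|ACD-ABNGAASN2|ACDSVR|2.5;C8F7504F05CE;NGABA-ACN-05)"
    " - 200 0 0 5905224 399 2984 192.168.81.70 close - -"
)

plain_match = anchored.search(plain_event)    # matches
paren_match = anchored.search(paren_event)    # None: "(" in the URI blocks it
fixed_match = unanchored.search(paren_event)  # matches again

print(plain_match.group("jkversion"))
print(paren_match)
print(fixed_match.group("jkversion"))
```

The anchored pattern stops at the first `(` it meets, which is the one in the filename, and never reaches `JTDI`. Searching for the literal `JTDI+(` without the leading anchor sidesteps that, on the assumption that `JTDI+(` never appears in any other field.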
So I guess my question is two-fold.
1) I know my overall user-agent extraction should be a single transform instead of all separate field extractions, but I'm not sure how to tie them all together because I couldn't see a way to extract strings like that in the field extractor interface in Splunk.
2) How can I fix my regex so that parentheses appearing in other fields don't break my jkversion and cs_hostname extractions?
Help?
The regex below works on all of the sample events:
JTDI\+\((?P<jkversion>[^;]+);(?P<os>[^;]+);(?P<freespace>[^\/]+)\/(?P<pending>[^\/]+)\/(?P<hits>[^;]+);(?P<SiteDescription>[^;]+);(?P<MAC>[^;]+);(?P<cs_hostname>[^\)]+)
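As a quick way to check the combined pattern outside Splunk, the following Python sketch applies it to one of the problem events from this thread (the one with `(20R2)` in the filename). The function name `parse_user_agent` is hypothetical, and the sketch assumes Python's `re` matches this pattern the same way Splunk's PCRE engine does.

```python
import re

# Combined pattern from the answer above, unchanged. It is not anchored to the
# start of the event, so parentheses in earlier fields do not matter.
UA_PATTERN = re.compile(
    r"JTDI\+\((?P<jkversion>[^;]+);(?P<os>[^;]+);(?P<freespace>[^\/]+)"
    r"\/(?P<pending>[^\/]+)\/(?P<hits>[^;]+);(?P<SiteDescription>[^;]+);"
    r"(?P<MAC>[^;]+);(?P<cs_hostname>[^\)]+)"
)


def parse_user_agent(event):
    """Return the user-agent subfields as a dict, or None if no match.

    Hypothetical helper for illustration; not part of Splunk.
    """
    match = UA_PATTERN.search(event)
    return match.groupdict() if match else None


# Sample event from this thread with parentheses in cs_uri_stem.
event = (
    "2021-03-16 17:02:54 10.2.3.4 GET"
    " /Army/acd/faq/ACN+v1.1.6.2002(20R2)+Release+Updates.pdf"
    " - 443 usr-1125 169.31.11.142 "
    "JTDI+(JDMS+1.0.11.2.20200807;+WinServer+2016+10.0;+345.5/285/-1;"
    "West|AA|ACD|Hotelville|AL|ACD-ABNGAASN2|ACDSVR|2.5;C8F7504F05CE;NGABA-ACN-05)"
    " - 200 0 0 5905224 399 2984 192.168.81.70 close - -"
)

fields = parse_user_agent(event)

# The sub-subfields inside SiteDescription are pipe-delimited.
site_parts = fields["SiteDescription"].split("|")

for name, value in fields.items():
    print(f"{name}: {value}")
print(site_parts)
```

Note that `os` and `freespace` keep their leading `+` (the encoded space after each semicolon), so `freespace` comes back as `+345.5` rather than `345.5`. If that matters, a literal `\+` could be placed before those capture groups, provided every event really has it.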
Beautiful, this works perfectly. Thank you!
Hi @DaClyde,
You can use the one-line regex below. If you can send a few samples that cause the problem, I can adjust it to capture them as well:
^[^\(\n]*JTDI\+\((?P<jkversion>[^;]+);(?P<os>[^;]+);(?P<freespace>[^\/]+)\/(?P<pending>[^\/]+)\/(?P<hits>[^;]+);(?P<SiteDescription>[^;]+);(?P<MAC>[^;]+);(?P<cs_hostname>[^\)]+)
So far, so good. I implemented the combined field extraction, and it is at least working as well as before. Here are some sample events, a few that are normal and a few that cause problems:
2021-03-16 21:01:40 10.2.3.4 HEAD /configs/army/ACD-ABNGAASN1/configs.xml - 443 usr-1125 169.31.11.142 JTDI+(JDMS+1.0.11.2.20200807;+WinServer+2016+10.0;+345.6/0/-1;West|AA|ACD|Hotelville|AL|ACD-ABNGAASN2|ACDSVR|2.5;C8F7504F05CE;NGABA-ACN-05) - 304 0 0 325 393 192.168.81.70 close - -
2021-03-16 17:02:54 10.2.3.4 GET /Army/acd/faq/ACN+v1.1.6.2002(20R2)+Release+Updates.pdf - 443 usr-1125 169.31.11.142 JTDI+(JDMS+1.0.11.2.20200807;+WinServer+2016+10.0;+345.5/285/-1;West|AA|ACD|Hotelville|AL|ACD-ABNGAASN2|ACDSVR|2.5;C8F7504F05CE;NGABA-ACN-05) - 200 0 0 5905224 399 2984 192.168.81.70 close - -
2021-03-15 17:11:08 10.2.3.4 GET /Army/lift/safety/AMAM/2020/GEN-20-AMAM-06+(2ND+UPDATE).pdf - 443 usr-1125 169.31.11.140 JTDI+(JDMS+1.0.11.2.20200807;+WinServer+2016+10.0;+345.6/104/-1;West|AA|ACD|Hotelville|AL|ACD-ABNGAASN2|ACDSVR|2.5;C8F7504F05CE;NGABA-ACN-05) - 200 0 0 274234 408 15 192.168.81.70 close - -
2021-03-15 17:11:06 10.2.3.4 GET /Army/lift/safety/AMAM/2020/GEN-20-AMAM-06+(2ND+UPDATE).docx - 443 usr-1125 169.31.11.143 JTDI+(JDMS+1.0.11.2.20200807;+WinServer+2016+10.0;+345.6/105/-1;West|AA|ACD|Hotelville|AL|ACD-ABNGAASN2|ACDSVR|2.5;C8F7504F05CE;NGABA-ACN-05) - 200 0 0 51828 409 0 192.168.81.70 close - -
2021-03-15 17:07:05 10.2.3.4 GET /Army/utility/safety/AMAM/2020/GEN-20-AMAM-06+(2ND+UPDATE).pdf - 443 usr-1125 169.31.11.141 JTDI+(JDMS+1.0.11.2.20200807;+WinServer+2016+10.0;+345.6/205/-1;West|AA|ACD|Hotelville|AL|ACN-ABNGAASN1|ACDSVR|2.5;C8F7504F05CE;NGALA-ACN-02) - 200 0 0 274234 406 15 192.168.81.70 close - -
2021-03-15 17:07:02 10.2.3.4 GET /Army/utility/safety/AMAM/2020/GEN-20-AMAM-06+(2ND+UPDATE).docx - 443 usr-1125 169.31.11.142 JTDI+(JDMS+1.0.11.2.20200807;+WinServer+2016+10.0;+345.6/206/-1;West|AA|ACD|Hotelville|AL|ACD-ABNGAASN2|ACDSVR|2.5;C8F7504F05CE;NGALA-ACN-02) - 200 0 0 51828 407 15 192.168.81.70 close - -
2021-03-15 17:03:00 10.2.3.4 GET /Army/gen/safety/AMAM/2020/GEN-20-AMAM-06.pdf - 443 usr-1125 169.31.11.140 JTDI+(JDMS+1.0.11.2.20200807;+WinServer+2016+10.0;+345.7/285/-1;West|AA|ACD|Hotelville|AL|ACD-ABNGAASN2|ACDSVR|2.5;C8F7504F05CE;NGABA-ACN-05) - 200 0 0 274234 405 0 192.168.81.70 close - -
2021-03-15 17:02:57 10.2.3.4 GET /Army/gen/safety/AMAM/2020/GEN-20-AMAM-06.docx - 443 usr-1125 169.31.11.143 JTDI+(JDMS+1.0.11.2.20200807;+WinServer+2016+10.0;+345.7/286/-1;West|AA|ACD|Hotelville|AL|ACD-ABNGAASN2|ACDSVR|2.5;C8F7504F05CE;NGABA-ACN-05) - 200 0 0 51828 406 0 192.168.81.70 close - -