Knowledge Management

regex help with field extraction

DaClyde
Contributor

I could use some expert assistance with a regex for breaking down a custom user-agent field in an IIS log into component fields while avoiding a conflict with other fields. 

We run software that uses IIS as a file server, and the software injects a custom user-agent value into the IIS log with every request.  Here is a sample of the user agent:

 

JTDI+(JDMS+1.0.11.2.20200807;+Win10+10.0;+229.0/62/-1;Branch|UnitType|System|City|ST|SiteIDOverride|SvrType|2.5;C8F7504F064E;UTA-AVD)

 

The IIS log is space delimited, so all of that lands in the cs_user_agent field just fine.  I made a sort of running mess of extracting the subfields.  Within the string are subfields delimited by semicolon, and sub-subfields delimited by / and |. 
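
For readers who want to see that delimiter structure spelled out, here is a minimal Python sketch (plain Python, not Splunk configuration) that splits the sample value above on `;`, then on `/` and `|`. The variable names are only illustrative, and it assumes the value always has exactly six semicolon-separated parts, as the sample does.

```python
# Sample cs_user_agent value copied from the post above.
ua = (
    "JTDI+(JDMS+1.0.11.2.20200807;+Win10+10.0;+229.0/62/-1;"
    "Branch|UnitType|System|City|ST|SiteIDOverride|SvrType|2.5;"
    "C8F7504F064E;UTA-AVD)"
)

# Everything between the first "(" and the last ")".
inner = ua[ua.index("(") + 1 : ua.rindex(")")]

# Top-level subfields are separated by semicolons (assumed: exactly six).
version, os_name, counters, site, mac, hostname = inner.split(";")

# Sub-subfields: "/" inside the counters, "|" inside the site description.
freespace, pending, hits = counters.split("/")
site_parts = site.split("|")

print(version, os_name, mac, hostname)
print(freespace, pending, hits)
print(site_parts)
```

Note that the `+` signs standing in for spaces are left in place here, just as they are in the raw log.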

Here are my separate extractions, in order as the fields appear in the string:

 

^[^\(\n]*JTDI\+\((?P<jkversion>[^;]+)
^[^;\n]*;(?P<os>[^;]+)
^(?:[^;\n]*;){2}(?P<freespace>[^/]+)
^(?:[^;\n]*;){2}\+\d+\.\d+/(?P<pending>\d+)
^(?:[^;\n]*;){3}(?P<SiteDescription>[^;]+)
^(?:[^;\n]*;){4}(?P<MAC>[^;]+)
^(?:[^;\n]*;){5}(?P<cs_hostname>[^\)]+)

 

Technically after the 'pending' field there should be a 'hits' field (represented by the -1 above), but we don't use it, so I didn't bother extracting it.

So my problem is the parentheses.  If a filename shows up in the cs_uri_stem field that includes them, like filename(copy1).txt, the () throw off my jkversion and cs_hostname extractions, because I don't know how to accommodate the possible existence of parentheses outside the cs_user_agent field.
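
To make the failure concrete, here is a small Python sketch that runs the jkversion pattern from above against two of the sample events posted further down in this thread, one with a plain filename and one with parentheses in cs_uri_stem. It uses Python's `re` module rather than Splunk itself, so treat it as an approximation of what the extraction sees; only the jkversion pattern is exercised here.

```python
import re

# jkversion pattern exactly as posted above.
anchored = re.compile(r"^[^\(\n]*JTDI\+\((?P<jkversion>[^;]+)")

# Sample event with no parentheses in cs_uri_stem.
plain_event = (
    "2021-03-15 17:03:00 10.2.3.4 GET /Army/gen/safety/AMAM/2020/GEN-20-AMAM-06.pdf"
    " - 443 usr-1125 169.31.11.140 "
    "JTDI+(JDMS+1.0.11.2.20200807;+WinServer+2016+10.0;+345.7/285/-1;"
    "West|AA|ACD|Hotelville|AL|ACD-ABNGAASN2|ACDSVR|2.5;C8F7504F05CE;NGABA-ACN-05)"
    " - 200 0 0 274234 405 0 192.168.81.70 close - -"
)

# Sample event with "(2ND+UPDATE)" in cs_uri_stem.
paren_event = (
    "2021-03-15 17:11:08 10.2.3.4 GET "
    "/Army/lift/safety/AMAM/2020/GEN-20-AMAM-06+(2ND+UPDATE).pdf"
    " - 443 usr-1125 169.31.11.140 "
    "JTDI+(JDMS+1.0.11.2.20200807;+WinServer+2016+10.0;+345.6/104/-1;"
    "West|AA|ACD|Hotelville|AL|ACD-ABNGAASN2|ACDSVR|2.5;C8F7504F05CE;NGABA-ACN-05)"
    " - 200 0 0 274234 408 15 192.168.81.70 close - -"
)

plain_match = anchored.search(plain_event)
paren_match = anchored.search(paren_event)

# The leading [^\(\n]* cannot step over the "(" in the filename, so the
# second event produces no match at all.
print(plain_match.group("jkversion") if plain_match else None)
print(paren_match.group("jkversion") if paren_match else None)
```

The `^[^\(\n]*` prefix requires that no `(` appear anywhere before `JTDI+(`, which is exactly what a filename like `filename(copy1).txt` violates.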

So I guess my question is two-fold.

1)  I know my overall user-agent extraction should be a single transform instead of all separate field extractions, but I'm not sure how to tie them all together because I couldn't see a way to extract strings like that in the field extractor interface in Splunk.

2) How can I fix my regex so that parentheses appearing in other fields don't break my jkversion and cs_hostname extractions?

Help?


scelikok
Champion

The regex below works on all of the sample events:

JTDI\+\((?P<jkversion>[^;]+);(?P<os>[^;]+);(?P<freespace>[^\/]+)\/(?P<pending>[^\/]+)\/(?P<hits>[^;]+);(?P<SiteDescription>[^;]+);(?P<MAC>[^;]+);(?P<cs_hostname>[^\)]+)
If this reply helps you an upvote is appreciated.
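
As a quick check outside Splunk, the following Python sketch applies that regex to one of the sample events from this thread that has parentheses in cs_uri_stem. Python's `re` accepts the same `(?P<name>...)` group syntax, but this is only a sanity test under that assumption, not a Splunk configuration.

```python
import re

# Unanchored pattern from the reply above: it starts at the literal "JTDI+("
# instead of at the beginning of the event.
pattern = re.compile(
    r"JTDI\+\((?P<jkversion>[^;]+);(?P<os>[^;]+);"
    r"(?P<freespace>[^\/]+)\/(?P<pending>[^\/]+)\/(?P<hits>[^;]+);"
    r"(?P<SiteDescription>[^;]+);(?P<MAC>[^;]+);(?P<cs_hostname>[^\)]+)"
)

# Sample event with "(2ND+UPDATE)" in cs_uri_stem.
event = (
    "2021-03-15 17:11:08 10.2.3.4 GET "
    "/Army/lift/safety/AMAM/2020/GEN-20-AMAM-06+(2ND+UPDATE).pdf"
    " - 443 usr-1125 169.31.11.140 "
    "JTDI+(JDMS+1.0.11.2.20200807;+WinServer+2016+10.0;+345.6/104/-1;"
    "West|AA|ACD|Hotelville|AL|ACD-ABNGAASN2|ACDSVR|2.5;C8F7504F05CE;NGABA-ACN-05)"
    " - 200 0 0 274234 408 15 192.168.81.70 close - -"
)

match = pattern.search(event)
fields = match.groupdict() if match else {}

# Print each extracted field on its own line.
for name, value in fields.items():
    print(f"{name} = {value}")
```

Because nothing in the pattern refers to the text before `JTDI+(`, parentheses earlier in the event have no effect on the match.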


DaClyde
Contributor

Beautiful, this works perfectly.  Thank you!


scelikok
Champion

Hi @DaClyde,

You can use the one-line regex below. If you can send a few sample events that cause problems, I can adjust it to capture those as well:

^[^\(\n]*JTDI\+\((?P<jkversion>[^;]+);(?P<os>[^;]+);(?P<freespace>[^\/]+)\/(?P<pending>[^\/]+)\/(?P<hits>[^;]+);(?P<SiteDescription>[^;]+);(?P<MAC>[^;]+);(?P<cs_hostname>[^\)]+)

DaClyde
Contributor

So far, so good.  I implemented the combined field extraction, and it is working at least as well as before.  Here are some sample events, a few that are normal and a few that cause problems:

 

2021-03-16 21:01:40 10.2.3.4 HEAD /configs/army/ACD-ABNGAASN1/configs.xml - 443 usr-1125 169.31.11.142 JTDI+(JDMS+1.0.11.2.20200807;+WinServer+2016+10.0;+345.6/0/-1;West|AA|ACD|Hotelville|AL|ACD-ABNGAASN2|ACDSVR|2.5;C8F7504F05CE;NGABA-ACN-05) - 304 0 0 325 393 192.168.81.70 close - -
2021-03-16 17:02:54 10.2.3.4 GET /Army/acd/faq/ACN+v1.1.6.2002(20R2)+Release+Updates.pdf - 443 usr-1125 169.31.11.142 JTDI+(JDMS+1.0.11.2.20200807;+WinServer+2016+10.0;+345.5/285/-1;West|AA|ACD|Hotelville|AL|ACD-ABNGAASN2|ACDSVR|2.5;C8F7504F05CE;NGABA-ACN-05) - 200 0 0 5905224 399 2984 192.168.81.70 close - -
2021-03-15 17:11:08 10.2.3.4 GET /Army/lift/safety/AMAM/2020/GEN-20-AMAM-06+(2ND+UPDATE).pdf - 443 usr-1125 169.31.11.140 JTDI+(JDMS+1.0.11.2.20200807;+WinServer+2016+10.0;+345.6/104/-1;West|AA|ACD|Hotelville|AL|ACD-ABNGAASN2|ACDSVR|2.5;C8F7504F05CE;NGABA-ACN-05) - 200 0 0 274234 408 15 192.168.81.70 close - -
2021-03-15 17:11:06 10.2.3.4 GET /Army/lift/safety/AMAM/2020/GEN-20-AMAM-06+(2ND+UPDATE).docx - 443 usr-1125 169.31.11.143 JTDI+(JDMS+1.0.11.2.20200807;+WinServer+2016+10.0;+345.6/105/-1;West|AA|ACD|Hotelville|AL|ACD-ABNGAASN2|ACDSVR|2.5;C8F7504F05CE;NGABA-ACN-05) - 200 0 0 51828 409 0 192.168.81.70 close - -
2021-03-15 17:07:05 10.2.3.4 GET /Army/utility/safety/AMAM/2020/GEN-20-AMAM-06+(2ND+UPDATE).pdf - 443 usr-1125 169.31.11.141 JTDI+(JDMS+1.0.11.2.20200807;+WinServer+2016+10.0;+345.6/205/-1;West|AA|ACD|Hotelville|AL|ACN-ABNGAASN1|ACDSVR|2.5;C8F7504F05CE;NGALA-ACN-02) - 200 0 0 274234 406 15 192.168.81.70 close - -
2021-03-15 17:07:02 10.2.3.4 GET /Army/utility/safety/AMAM/2020/GEN-20-AMAM-06+(2ND+UPDATE).docx - 443 usr-1125 169.31.11.142 JTDI+(JDMS+1.0.11.2.20200807;+WinServer+2016+10.0;+345.6/206/-1;West|AA|ACD|Hotelville|AL|ACD-ABNGAASN2|ACDSVR|2.5;C8F7504F05CE;NGALA-ACN-02) - 200 0 0 51828 407 15 192.168.81.70 close - -
2021-03-15 17:03:00 10.2.3.4 GET /Army/gen/safety/AMAM/2020/GEN-20-AMAM-06.pdf - 443 usr-1125 169.31.11.140 JTDI+(JDMS+1.0.11.2.20200807;+WinServer+2016+10.0;+345.7/285/-1;West|AA|ACD|Hotelville|AL|ACD-ABNGAASN2|ACDSVR|2.5;C8F7504F05CE;NGABA-ACN-05) - 200 0 0 274234 405 0 192.168.81.70 close - -
2021-03-15 17:02:57 10.2.3.4 GET /Army/gen/safety/AMAM/2020/GEN-20-AMAM-06.docx - 443 usr-1125 169.31.11.143 JTDI+(JDMS+1.0.11.2.20200807;+WinServer+2016+10.0;+345.7/286/-1;West|AA|ACD|Hotelville|AL|ACD-ABNGAASN2|ACDSVR|2.5;C8F7504F05CE;NGABA-ACN-05) - 200 0 0 51828 406 0 192.168.81.70 close - -
