
regex help with field extraction

DaClyde
Contributor

I could use some expert assistance with a regex for breaking down a custom user-agent field in an IIS log into component fields while avoiding a conflict with other fields. 

We run software that uses IIS as a file server, and the software injects a custom user-agent value into the IIS log with every request.  Here is a sample of the user agent:

JTDI+(JDMS+1.0.11.2.20200807;+Win10+10.0;+229.0/62/-1;Branch|UnitType|System|City|ST|SiteIDOverride|SvrType|2.5;C8F7504F064E;UTA-AVD)

The IIS log is space-delimited, so all of that lands in the cs_user_agent field just fine. Within the string are subfields delimited by semicolons, and sub-subfields delimited by / and |. My extraction of those subfields turned into a bit of a running mess.
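To make the nesting concrete, here is a minimal Python sketch that takes the sample value apart using only the delimiters described above, with no regex. It assumes every value starts with `JTDI+(`, ends with `)`, and holds exactly six `;`-separated subfields, the third of which has three `/`-separated counters. The dictionary keys reuse the field names from the extractions below (plus `hits`), and `split_user_agent` is a hypothetical helper name.

```python
def split_user_agent(ua: str) -> dict:
    """Split a JTDI user agent into subfields using plain str methods.

    Assumed layout (based on the sample, not a documented format):
    JTDI+(<version>;<os>;<free>/<pending>/<hits>;<site>;<mac>;<host>)
    """
    prefix = "JTDI+("
    if not ua.startswith(prefix) or not ua.endswith(")"):
        raise ValueError("not a JTDI user agent")
    inner = ua[len(prefix):-1]  # drop the wrapping "JTDI+(" and ")"
    # Unpacking raises ValueError if there are not exactly six subfields
    version, os_name, counters, site, mac, host = inner.split(";")
    freespace, pending, hits = counters.split("/")
    return {
        "jkversion": version,
        "os": os_name,
        "freespace": freespace,
        "pending": pending,
        "hits": hits,
        "SiteDescription": site,  # left as one pipe-delimited string
        "MAC": mac,
        "cs_hostname": host,
    }


sample = ("JTDI+(JDMS+1.0.11.2.20200807;+Win10+10.0;+229.0/62/-1;"
          "Branch|UnitType|System|City|ST|SiteIDOverride|SvrType|2.5;"
          "C8F7504F064E;UTA-AVD)")
print(split_user_agent(sample))
```

Values such as `+229.0` keep their leading `+`; in IIS logs a `+` in the user agent appears to stand in for a space, so you may want to strip it after extraction.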

Here are my separate extractions, in order as the fields appear in the string:

^[^\(\n]*JTDI\+\((?P<jkversion>[^;]+)
^[^;\n]*;(?P<os>[^;]+)
^(?:[^;\n]*;){2}(?P<freespace>[^/]+)
^(?:[^;\n]*;){2}\+\d+\.\d+/(?P<pending>\d+)
^(?:[^;\n]*;){3}(?P<SiteDescription>[^;]+)
^(?:[^;\n]*;){4}(?P<MAC>[^;]+)
^(?:[^;\n]*;){5}(?P<cs_hostname>[^\)]+)

Technically after the 'pending' field there should be a 'hits' field (represented by the -1 above), but we don't use it, so I didn't bother extracting it.

So my problem is the parentheses. If a filename in the cs_uri_stem field contains them, like filename(copy1).txt, the parentheses throw off my jkversion and cs_hostname extractions, because I don't know how to account for parentheses that may appear outside the cs_user_agent field.
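The jkversion failure can be reproduced outside Splunk. In this Python sketch (Python's `re` accepts the same `(?P<name>...)` syntax), the leading `^[^\(\n]*` cannot step past the `(` in the URI stem, so the match fails; starting the match at the literal `JTDI+(` instead avoids the problem. The event string is an abbreviated copy of one of the samples later in the thread.

```python
import re

# jkversion pattern from the question: anchored at the start of the event
anchored = re.compile(r"^[^\(\n]*JTDI\+\((?P<jkversion>[^;]+)")
# Same capture without the anchor: the match begins at the literal "JTDI+("
unanchored = re.compile(r"JTDI\+\((?P<jkversion>[^;]+)")

# Abbreviated event whose cs_uri_stem contains parentheses
event = ("2021-03-15 17:11:08 10.2.3.4 GET "
         "/Army/lift/safety/AMAM/2020/GEN-20-AMAM-06+(2ND+UPDATE).pdf - 443 "
         "usr-1125 169.31.11.140 JTDI+(JDMS+1.0.11.2.20200807;+WinServer+2016+10.0;"
         "+345.6/104/-1;West|AA|ACD|Hotelville|AL|ACD-ABNGAASN2|ACDSVR|2.5;"
         "C8F7504F05CE;NGABA-ACN-05) - 200")

print(anchored.search(event))                 # None: [^\(\n]* stops at "(2ND"
print(unanchored.search(event)["jkversion"])  # JDMS+1.0.11.2.20200807
```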

So I guess my question is two-fold.

1)  I know my overall user-agent extraction should be a single transform instead of all separate field extractions, but I'm not sure how to tie them all together because I couldn't see a way to extract strings like that in the field extractor interface in Splunk.

2) How can I fix my regex so that parentheses appearing in other fields don't break my jkversion and cs_hostname extractions?

Help?

scelikok
SplunkTrust

The regex below works on all of the sample events:

JTDI\+\((?P<jkversion>[^;]+);(?P<os>[^;]+);(?P<freespace>[^\/]+)\/(?P<pending>[^\/]+)\/(?P<hits>[^;]+);(?P<SiteDescription>[^;]+);(?P<MAC>[^;]+);(?P<cs_hostname>[^\)]+)
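One way to sanity-check a pattern like this before deploying it is to run it against the sample events locally in Python, whose `re` module understands the same named groups. This is only a test harness, not Splunk configuration, and the event strings are abbreviated copies of samples from this thread. Because the pattern starts at the literal `JTDI+(` rather than at the beginning of the event, parentheses earlier in the line do not interfere.

```python
import re

# Same pattern as the answer above, split across lines for readability
UA_RE = re.compile(
    r"JTDI\+\((?P<jkversion>[^;]+);(?P<os>[^;]+);(?P<freespace>[^\/]+)\/"
    r"(?P<pending>[^\/]+)\/(?P<hits>[^;]+);(?P<SiteDescription>[^;]+);"
    r"(?P<MAC>[^;]+);(?P<cs_hostname>[^\)]+)"
)

# Abbreviated copies of two sample events from the thread
events = [
    # ordinary request
    "2021-03-16 21:01:40 10.2.3.4 HEAD /configs/army/ACD-ABNGAASN1/configs.xml - 443 "
    "usr-1125 169.31.11.142 JTDI+(JDMS+1.0.11.2.20200807;+WinServer+2016+10.0;"
    "+345.6/0/-1;West|AA|ACD|Hotelville|AL|ACD-ABNGAASN2|ACDSVR|2.5;"
    "C8F7504F05CE;NGABA-ACN-05) - 304",
    # parentheses in cs_uri_stem
    "2021-03-16 17:02:54 10.2.3.4 GET /Army/acd/faq/ACN+v1.1.6.2002(20R2)+Release+Updates.pdf "
    "- 443 usr-1125 169.31.11.142 JTDI+(JDMS+1.0.11.2.20200807;+WinServer+2016+10.0;"
    "+345.5/285/-1;West|AA|ACD|Hotelville|AL|ACD-ABNGAASN2|ACDSVR|2.5;"
    "C8F7504F05CE;NGABA-ACN-05) - 200",
]

parsed = []
for line in events:
    m = UA_RE.search(line)  # search(), not match(): the user agent sits mid-line
    if m is None:
        print("no match:", line[:40])
        continue
    parsed.append(m.groupdict())
    print(m["jkversion"], m["pending"], m["cs_hostname"])
```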
If this reply helps you, an upvote and "Accept as Solution" are appreciated.

DaClyde
Contributor

Beautiful, this works perfectly.  Thank you!

scelikok
SplunkTrust

Hi @DaClyde,

You can use the one-line regex below. If you can send a few samples that cause problems, I can change it to capture them as well:

^[^\(\n]*JTDI\+\((?P<jkversion>[^;]+);(?P<os>[^;]+);(?P<freespace>[^\/]+)\/(?P<pending>[^\/]+)\/(?P<hits>[^;]+);(?P<SiteDescription>[^;]+);(?P<MAC>[^;]+);(?P<cs_hostname>[^\)]+)

DaClyde
Contributor

So far, so good. I implemented the combined field extraction, and it is working at least as well as before. Here are some sample events, a few that are normal and a few that cause problems:

2021-03-16 21:01:40 10.2.3.4 HEAD /configs/army/ACD-ABNGAASN1/configs.xml - 443 usr-1125 169.31.11.142 JTDI+(JDMS+1.0.11.2.20200807;+WinServer+2016+10.0;+345.6/0/-1;West|AA|ACD|Hotelville|AL|ACD-ABNGAASN2|ACDSVR|2.5;C8F7504F05CE;NGABA-ACN-05) - 304 0 0 325 393 192.168.81.70 close - -
2021-03-16 17:02:54 10.2.3.4 GET /Army/acd/faq/ACN+v1.1.6.2002(20R2)+Release+Updates.pdf - 443 usr-1125 169.31.11.142 JTDI+(JDMS+1.0.11.2.20200807;+WinServer+2016+10.0;+345.5/285/-1;West|AA|ACD|Hotelville|AL|ACD-ABNGAASN2|ACDSVR|2.5;C8F7504F05CE;NGABA-ACN-05) - 200 0 0 5905224 399 2984 192.168.81.70 close - -
2021-03-15 17:11:08 10.2.3.4 GET /Army/lift/safety/AMAM/2020/GEN-20-AMAM-06+(2ND+UPDATE).pdf - 443 usr-1125 169.31.11.140 JTDI+(JDMS+1.0.11.2.20200807;+WinServer+2016+10.0;+345.6/104/-1;West|AA|ACD|Hotelville|AL|ACD-ABNGAASN2|ACDSVR|2.5;C8F7504F05CE;NGABA-ACN-05) - 200 0 0 274234 408 15 192.168.81.70 close - -
2021-03-15 17:11:06 10.2.3.4 GET /Army/lift/safety/AMAM/2020/GEN-20-AMAM-06+(2ND+UPDATE).docx - 443 usr-1125 169.31.11.143 JTDI+(JDMS+1.0.11.2.20200807;+WinServer+2016+10.0;+345.6/105/-1;West|AA|ACD|Hotelville|AL|ACD-ABNGAASN2|ACDSVR|2.5;C8F7504F05CE;NGABA-ACN-05) - 200 0 0 51828 409 0 192.168.81.70 close - -
2021-03-15 17:07:05 10.2.3.4 GET /Army/utility/safety/AMAM/2020/GEN-20-AMAM-06+(2ND+UPDATE).pdf - 443 usr-1125 169.31.11.141 JTDI+(JDMS+1.0.11.2.20200807;+WinServer+2016+10.0;+345.6/205/-1;West|AA|ACD|Hotelville|AL|ACN-ABNGAASN1|ACDSVR|2.5;C8F7504F05CE;NGALA-ACN-02) - 200 0 0 274234 406 15 192.168.81.70 close - -
2021-03-15 17:07:02 10.2.3.4 GET /Army/utility/safety/AMAM/2020/GEN-20-AMAM-06+(2ND+UPDATE).docx - 443 usr-1125 169.31.11.142 JTDI+(JDMS+1.0.11.2.20200807;+WinServer+2016+10.0;+345.6/206/-1;West|AA|ACD|Hotelville|AL|ACD-ABNGAASN2|ACDSVR|2.5;C8F7504F05CE;NGALA-ACN-02) - 200 0 0 51828 407 15 192.168.81.70 close - -
2021-03-15 17:03:00 10.2.3.4 GET /Army/gen/safety/AMAM/2020/GEN-20-AMAM-06.pdf - 443 usr-1125 169.31.11.140 JTDI+(JDMS+1.0.11.2.20200807;+WinServer+2016+10.0;+345.7/285/-1;West|AA|ACD|Hotelville|AL|ACD-ABNGAASN2|ACDSVR|2.5;C8F7504F05CE;NGABA-ACN-05) - 200 0 0 274234 405 0 192.168.81.70 close - -
2021-03-15 17:02:57 10.2.3.4 GET /Army/gen/safety/AMAM/2020/GEN-20-AMAM-06.docx - 443 usr-1125 169.31.11.143 JTDI+(JDMS+1.0.11.2.20200807;+WinServer+2016+10.0;+345.7/286/-1;West|AA|ACD|Hotelville|AL|ACD-ABNGAASN2|ACDSVR|2.5;C8F7504F05CE;NGABA-ACN-05) - 200 0 0 51828 406 0 192.168.81.70 close - -
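If the pipe-delimited SiteDescription block also needs to be broken out, one option is a second pattern applied to the already-extracted value. The Python sketch below assumes the block always has eight `|`-separated parts; the first seven group names come from the placeholder labels in the question's sample, while `SiteVersion` (for values like `2.5`) and the helper `split_site` are hypothetical names.

```python
import re

# Eight pipe-separated parts; "SiteVersion" is a hypothetical name for the last one
SITE_RE = re.compile(
    r"^(?P<Branch>[^|]*)\|(?P<UnitType>[^|]*)\|(?P<System>[^|]*)\|(?P<City>[^|]*)\|"
    r"(?P<ST>[^|]*)\|(?P<SiteIDOverride>[^|]*)\|(?P<SvrType>[^|]*)\|(?P<SiteVersion>[^|]*)$"
)


def split_site(site_description: str) -> dict:
    """Return the named parts of a SiteDescription value, or {} if it does not fit."""
    m = SITE_RE.match(site_description)
    return m.groupdict() if m else {}


# SiteDescription values taken from the sample events above
for site in ["West|AA|ACD|Hotelville|AL|ACD-ABNGAASN2|ACDSVR|2.5",
             "West|AA|ACD|Hotelville|AL|ACN-ABNGAASN1|ACDSVR|2.5"]:
    parts = split_site(site)
    print(parts["SiteIDOverride"], parts["SvrType"])
```

Returning an empty dict for values with a different number of parts keeps malformed entries from silently shifting every later field by one position.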
