
Proper regex to extract cisco_ironport_web.log fields like user, domain and url

jagadeeshm
Contributor

cisco_ironport_web.log contains the following events:

Event - 1

1489714117.601 56 27.1.11.11 TCP_REFRESH_HIT/200 54491 GET http://www.flatbed-scanner-review.org/inter-banner_flatbed.jpg bhussain@buttercupgames.com DIRECT/www.flatbed-scanner-review.org image/jpeg DEFAULT_CASE-DefaultGroup-Demo_Clients-NONE-NONE-DefaultRouting <nc,ns,0,-,-,-,-,0,-,-,-,-,-,-,-,nc,-> - http://www.flatbed-scanner-review.org/

Event - 2

1489713615.376 809 211.166.11.101 TCP_MISS/200 147639 GET http://www.vindy.com/ myuan@buttercupgames.com DIRECT/www.vindy.com text/html DEFAULT_CASE-DefaultGroup-Demo_Clients-NONE-NONE-DefaultRouting <IW_news,3.4,0,-,-,-,-,0,-,-,-,-,-,-,-,IW_news,-> - -

I use the following field list to extract user, url and domain:

"field1","field2","field3","field4","field5","field6","url","user","field9","field10","field11","field12","field13","domain"

It doesn't work for the second event, because the domain field there is '-'. How do I fix it?
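A minimal Python sketch of what a purely position-based extraction does with the two sample events: it splits each event on whitespace and pairs the tokens with the field names from the list above. It assumes the list is applied one name per space-separated token and is only an illustration, not Splunk's own extraction code. `EVENTS`, `FIELDS` and `split_event` are hypothetical names.

```python
# The two sample events from the question.
EVENTS = [
    "1489714117.601 56 27.1.11.11 TCP_REFRESH_HIT/200 54491 GET "
    "http://www.flatbed-scanner-review.org/inter-banner_flatbed.jpg "
    "bhussain@buttercupgames.com DIRECT/www.flatbed-scanner-review.org "
    "image/jpeg DEFAULT_CASE-DefaultGroup-Demo_Clients-NONE-NONE-DefaultRouting "
    "<nc,ns,0,-,-,-,-,0,-,-,-,-,-,-,-,nc,-> - http://www.flatbed-scanner-review.org/",
    "1489713615.376 809 211.166.11.101 TCP_MISS/200 147639 GET "
    "http://www.vindy.com/ myuan@buttercupgames.com DIRECT/www.vindy.com "
    "text/html DEFAULT_CASE-DefaultGroup-Demo_Clients-NONE-NONE-DefaultRouting "
    "<IW_news,3.4,0,-,-,-,-,0,-,-,-,-,-,-,-,IW_news,-> - -",
]

# Field names from the question, assumed to map one-to-one onto space-separated tokens.
FIELDS = ["field1", "field2", "field3", "field4", "field5", "field6", "url",
          "user", "field9", "field10", "field11", "field12", "field13", "domain"]


def split_event(event: str) -> dict:
    """Pair each whitespace-separated token with the field name at the same position."""
    return dict(zip(FIELDS, event.split()))


for event in EVENTS:
    fields = split_event(event)
    print(fields["user"], fields["field9"], fields["domain"])
```

For these samples the token named `domain` is the last one: a URL in Event 1 and `-` in Event 2. The host itself appears in `field9`, after `DIRECT/`, in both events.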



FrankVl
Ultra Champion

Instead of reinventing the wheel, you could take some inspiration from the Splunk Add-on for Cisco WSA:
https://splunkbase.splunk.com/app/1747/

Looking at the sample data and props/transforms in that TA, it seems to support data very similar to yours. The regex in there does not match perfectly (I think the part between <...> causes some issues), but it might be a good start.


gcusello
SplunkTrust

Hi jagadeeshm,
try

(GET|POST)\s(?<url>[^ ]*)\s(?<user>[^ ]*)\s([^ ]*\s){4}(?<domain>[^ ]*)

You can test it at https://regex101.com/r/1qW58r/1

Bye.
Giuseppe
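To check locally what this pattern captures, here is a small Python sketch run against the two sample events. Python's `re` module writes named groups as `(?P<name>...)` instead of the PCRE `(?<name>...)` form Splunk accepts, so the pattern is transcribed with only that change. `EVENTS` and `PATTERN_4` are hypothetical names.

```python
import re

# The two sample events from the question.
EVENTS = [
    "1489714117.601 56 27.1.11.11 TCP_REFRESH_HIT/200 54491 GET "
    "http://www.flatbed-scanner-review.org/inter-banner_flatbed.jpg "
    "bhussain@buttercupgames.com DIRECT/www.flatbed-scanner-review.org "
    "image/jpeg DEFAULT_CASE-DefaultGroup-Demo_Clients-NONE-NONE-DefaultRouting "
    "<nc,ns,0,-,-,-,-,0,-,-,-,-,-,-,-,nc,-> - http://www.flatbed-scanner-review.org/",
    "1489713615.376 809 211.166.11.101 TCP_MISS/200 147639 GET "
    "http://www.vindy.com/ myuan@buttercupgames.com DIRECT/www.vindy.com "
    "text/html DEFAULT_CASE-DefaultGroup-Demo_Clients-NONE-NONE-DefaultRouting "
    "<IW_news,3.4,0,-,-,-,-,0,-,-,-,-,-,-,-,IW_news,-> - -",
]

# The pattern above with (?<name>...) rewritten as Python's (?P<name>...).
# ([^ ]*\s){4} skips four space-terminated tokens after the user field.
PATTERN_4 = re.compile(
    r"(GET|POST)\s(?P<url>[^ ]*)\s(?P<user>[^ ]*)\s([^ ]*\s){4}(?P<domain>[^ ]*)"
)

for event in EVENTS:
    match = PATTERN_4.search(event)
    print(match.group("user"), match.group("url"), match.group("domain"))
```

With `{4}`, `domain` lands on the `-` token just before the final field, so it is `-` for both sample events.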


jagadeeshm
Contributor

It doesn't actually extract the domain name, which is my core issue.


cradeke_splunk
Splunk Employee

I tried the regex101 link; it extracts the domain field at the very end of the event. That field is not always populated, so I extracted the domain from the string right after "DIRECT/" instead. This would be my solution, but only if you are not looking for the field at the end.

(GET|POST)\s(?<url>[^ ]*)\s(?<user>[^ ]*)\s([^ ]){6}\/(?<domain>[^ ]*)
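A Python check of this pattern against the sample events, again with `(?<name>...)` rewritten as `(?P<name>...)` for Python's `re` module. Because `([^ ]){6}` matches exactly six non-space characters, the pattern relies on that token starting with a six-character code such as `DIRECT`. `PATTERN_ANY` is a hypothetical variant, not from the thread, that instead skips any run of non-space, non-slash characters up to the first `/`; `EVENTS` and `PATTERN_DIRECT` are also hypothetical names.

```python
import re

# The two sample events from the question.
EVENTS = [
    "1489714117.601 56 27.1.11.11 TCP_REFRESH_HIT/200 54491 GET "
    "http://www.flatbed-scanner-review.org/inter-banner_flatbed.jpg "
    "bhussain@buttercupgames.com DIRECT/www.flatbed-scanner-review.org "
    "image/jpeg DEFAULT_CASE-DefaultGroup-Demo_Clients-NONE-NONE-DefaultRouting "
    "<nc,ns,0,-,-,-,-,0,-,-,-,-,-,-,-,nc,-> - http://www.flatbed-scanner-review.org/",
    "1489713615.376 809 211.166.11.101 TCP_MISS/200 147639 GET "
    "http://www.vindy.com/ myuan@buttercupgames.com DIRECT/www.vindy.com "
    "text/html DEFAULT_CASE-DefaultGroup-Demo_Clients-NONE-NONE-DefaultRouting "
    "<IW_news,3.4,0,-,-,-,-,0,-,-,-,-,-,-,-,IW_news,-> - -",
]

# The pattern above with (?<name>...) rewritten as (?P<name>...).
# ([^ ]){6} consumes exactly six non-space characters ("DIRECT") before the "/".
PATTERN_DIRECT = re.compile(
    r"(GET|POST)\s(?P<url>[^ ]*)\s(?P<user>[^ ]*)\s([^ ]){6}\/(?P<domain>[^ ]*)"
)

# Hypothetical variant: skip any code of any length up to the first "/".
PATTERN_ANY = re.compile(
    r"(GET|POST)\s(?P<url>[^ ]*)\s(?P<user>[^ ]*)\s[^ /]+/(?P<domain>[^ ]*)"
)

for event in EVENTS:
    for pattern in (PATTERN_DIRECT, PATTERN_ANY):
        match = pattern.search(event)
        print(match.group("user"), match.group("domain"))
```

Both patterns return `www.flatbed-scanner-review.org` and `www.vindy.com` for the two samples. Whether events with other codes in that position also carry a host after the `/` is an assumption these samples do not cover.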

gcusello
SplunkTrust

Hi jagadeeshm,
sorry, use 5 instead of 4 (see https://regex101.com/r/1qW58r/2):

(GET|POST)\s(?<url>[^ ]*)\s(?<user>[^ ]*)\s([^ ]*\s){5}(?<domain>[^ ]*)

Bye.
Giuseppe
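The same kind of Python check for the corrected `{5}` pattern, with named groups rewritten as `(?P<name>...)` for Python's `re` module; `EVENTS` and `PATTERN_5` are hypothetical names. Skipping five tokens after the user puts `domain` on the last token of the event, the same position the question's field list calls `domain`.

```python
import re

# The two sample events from the question.
EVENTS = [
    "1489714117.601 56 27.1.11.11 TCP_REFRESH_HIT/200 54491 GET "
    "http://www.flatbed-scanner-review.org/inter-banner_flatbed.jpg "
    "bhussain@buttercupgames.com DIRECT/www.flatbed-scanner-review.org "
    "image/jpeg DEFAULT_CASE-DefaultGroup-Demo_Clients-NONE-NONE-DefaultRouting "
    "<nc,ns,0,-,-,-,-,0,-,-,-,-,-,-,-,nc,-> - http://www.flatbed-scanner-review.org/",
    "1489713615.376 809 211.166.11.101 TCP_MISS/200 147639 GET "
    "http://www.vindy.com/ myuan@buttercupgames.com DIRECT/www.vindy.com "
    "text/html DEFAULT_CASE-DefaultGroup-Demo_Clients-NONE-NONE-DefaultRouting "
    "<IW_news,3.4,0,-,-,-,-,0,-,-,-,-,-,-,-,IW_news,-> - -",
]

# The corrected pattern with (?<name>...) rewritten as (?P<name>...).
# ([^ ]*\s){5} skips five space-terminated tokens after the user field.
PATTERN_5 = re.compile(
    r"(GET|POST)\s(?P<url>[^ ]*)\s(?P<user>[^ ]*)\s([^ ]*\s){5}(?P<domain>[^ ]*)"
)

for event in EVENTS:
    match = PATTERN_5.search(event)
    print(match.group("user"), match.group("domain"))
```

For these samples the capture is `http://www.flatbed-scanner-review.org/` for Event 1 and `-` for Event 2, so `domain` follows the last field whether or not it is populated.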
