Splunk Search

Proper regex to extract cisco_ironport_web.log fields such as user, domain, and url

jagadeeshm
Contributor

cisco_ironport_web.log contains the following events:

Event 1

1489714117.601 56 27.1.11.11 TCP_REFRESH_HIT/200 54491 GET http://www.flatbed-scanner-review.org/inter-banner_flatbed.jpg bhussain@buttercupgames.com DIRECT/www.flatbed-scanner-review.org image/jpeg DEFAULT_CASE-DefaultGroup-Demo_Clients-NONE-NONE-DefaultRouting <nc,ns,0,-,-,-,-,0,-,-,-,-,-,-,-,nc,-> - http://www.flatbed-scanner-review.org/

Event 2

1489713615.376 809 211.166.11.101 TCP_MISS/200 147639 GET http://www.vindy.com/ myuan@buttercupgames.com DIRECT/www.vindy.com text/html DEFAULT_CASE-DefaultGroup-Demo_Clients-NONE-NONE-DefaultRouting <IW_news,3.4,0,-,-,-,-,0,-,-,-,-,-,-,-,IW_news,-> - -

I use the following field list to extract user, url, and domain:

"field1","field2","field3","field4","field5","field6","url","user","field9","field10","field11","field12","field13","domain"

It doesn't work for the second event, because there the domain field is just '-'. How do I fix it?
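
To see why the positional field list breaks, here is a minimal Python sketch (not Splunk configuration) that splits both sample events on single spaces and pairs the tokens with the field names above. The helper name `split_fields` is made up for illustration, and splitting on a single space is an assumption about how the delimiter-based extraction behaves.

```python
# Sample events copied from the question.
EVENT_1 = (
    "1489714117.601 56 27.1.11.11 TCP_REFRESH_HIT/200 54491 GET "
    "http://www.flatbed-scanner-review.org/inter-banner_flatbed.jpg "
    "bhussain@buttercupgames.com DIRECT/www.flatbed-scanner-review.org image/jpeg "
    "DEFAULT_CASE-DefaultGroup-Demo_Clients-NONE-NONE-DefaultRouting "
    "<nc,ns,0,-,-,-,-,0,-,-,-,-,-,-,-,nc,-> - http://www.flatbed-scanner-review.org/"
)
EVENT_2 = (
    "1489713615.376 809 211.166.11.101 TCP_MISS/200 147639 GET "
    "http://www.vindy.com/ myuan@buttercupgames.com DIRECT/www.vindy.com text/html "
    "DEFAULT_CASE-DefaultGroup-Demo_Clients-NONE-NONE-DefaultRouting "
    "<IW_news,3.4,0,-,-,-,-,0,-,-,-,-,-,-,-,IW_news,-> - -"
)

# Field names exactly as listed in the question.
FIELD_NAMES = [
    "field1", "field2", "field3", "field4", "field5", "field6",
    "url", "user", "field9", "field10", "field11", "field12",
    "field13", "domain",
]


def split_fields(event):
    """Pair each space-separated token with its positional field name."""
    return dict(zip(FIELD_NAMES, event.split(" ")))


for event in (EVENT_1, EVENT_2):
    fields = split_fields(event)
    print(fields["user"], "|", fields["field9"], "|", fields["domain"])
```

On these two samples the 14th token is a URL in the first event and '-' in the second, while the hostname is present in both events inside field9 (the `DIRECT/...` token).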

FrankVl
Ultra Champion

Instead of reinventing the wheel, you could take some inspiration from the Splunk Add-on for Cisco WSA:
https://splunkbase.splunk.com/app/1747/

Looking at the sample data and props/transforms in that TA, it seems to support data very similar to yours. The regex in there does not match perfectly (I think the part between <...> causes some issues), but it might be a good start.

gcusello
SplunkTrust

Hi jagadeeshm,
try

(GET|POST)\s(?<url>[^ ]*)\s(?<user>[^ ]*)\s([^ ]*\s){4}(?<domain>[^ ]*)

you can test it at https://regex101.com/r/1qW58r/1

Bye.
Giuseppe

jagadeeshm
Contributor

It doesn't actually extract the domain name, which is my core issue.

cradeke_splunk
Splunk Employee

I tried the regex101 link; it extracts the domain field from the very end of the event. That field is not always populated, so I tried to extract the domain from the string right after "DIRECT/" instead. This would be my solution, but only if you do not need the field at the end.

(GET|POST)\s(?<url>[^ ]*)\s(?<user>[^ ]*)\s([^ ]){6}\/(?<domain>[^ ]*)
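
A quick way to sanity-check this pattern outside Splunk is a small Python sketch like the one below. Python's `re` module needs `(?P<name>...)` where Splunk accepts `(?<name>...)`, so the named groups are rewritten accordingly. The second pattern, `GENERAL`, is not from this thread: it is a hypothetical variant that replaces the fixed six-character prefix with `[^ /]+`, assuming that prefixes other than `DIRECT` can appear before the slash. `SYNTHETIC` is a made-up event used only to illustrate that case.

```python
import re

# Sample events copied from the question.
EVENT_1 = (
    "1489714117.601 56 27.1.11.11 TCP_REFRESH_HIT/200 54491 GET "
    "http://www.flatbed-scanner-review.org/inter-banner_flatbed.jpg "
    "bhussain@buttercupgames.com DIRECT/www.flatbed-scanner-review.org image/jpeg "
    "DEFAULT_CASE-DefaultGroup-Demo_Clients-NONE-NONE-DefaultRouting "
    "<nc,ns,0,-,-,-,-,0,-,-,-,-,-,-,-,nc,-> - http://www.flatbed-scanner-review.org/"
)
EVENT_2 = (
    "1489713615.376 809 211.166.11.101 TCP_MISS/200 147639 GET "
    "http://www.vindy.com/ myuan@buttercupgames.com DIRECT/www.vindy.com text/html "
    "DEFAULT_CASE-DefaultGroup-Demo_Clients-NONE-NONE-DefaultRouting "
    "<IW_news,3.4,0,-,-,-,-,0,-,-,-,-,-,-,-,IW_news,-> - -"
)
# Made-up variant of event 2 with a longer prefix before the slash (assumption).
SYNTHETIC = EVENT_2.replace("DIRECT/www.vindy.com", "DEFAULT_PARENT/www.vindy.com")

# The pattern from this post, with Python-style named groups.
DIRECT_ONLY = re.compile(
    r"(GET|POST)\s(?P<url>[^ ]*)\s(?P<user>[^ ]*)\s([^ ]){6}/(?P<domain>[^ ]*)"
)
# Hypothetical variant: any run of non-space, non-slash characters before "/".
GENERAL = re.compile(
    r"(GET|POST)\s(?P<url>[^ ]*)\s(?P<user>[^ ]*)\s[^ /]+/(?P<domain>[^ ]*)"
)


def extract(pattern, event):
    """Return the named groups as a dict, or None when the pattern does not match."""
    match = pattern.search(event)
    return match.groupdict() if match else None


for name, event in (("event 1", EVENT_1), ("event 2", EVENT_2), ("synthetic", SYNTHETIC)):
    print(name, extract(DIRECT_ONLY, event), extract(GENERAL, event))
```

Both patterns return the hostname for the two sample events. Only the variant matches the synthetic line, because `([^ ]){6}` requires exactly six characters before the slash.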

gcusello
SplunkTrust

Hi jagadeeshm,
Sorry, the quantifier should be 5 instead of 4 (see https://regex101.com/r/1qW58r/2):

(GET|POST)\s(?<url>[^ ]*)\s(?<user>[^ ]*)\s([^ ]*\s){5}(?<domain>[^ ]*)

Bye.
Giuseppe
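
For comparison, the sketch below (Python, so the named groups use `(?P<name>...)` instead of Splunk's `(?<name>...)`) runs the pattern with both quantifiers against the two sample events from the question. The helper `domain_with` is a hypothetical name used only for this illustration.

```python
import re

# Sample events copied from the question.
EVENT_1 = (
    "1489714117.601 56 27.1.11.11 TCP_REFRESH_HIT/200 54491 GET "
    "http://www.flatbed-scanner-review.org/inter-banner_flatbed.jpg "
    "bhussain@buttercupgames.com DIRECT/www.flatbed-scanner-review.org image/jpeg "
    "DEFAULT_CASE-DefaultGroup-Demo_Clients-NONE-NONE-DefaultRouting "
    "<nc,ns,0,-,-,-,-,0,-,-,-,-,-,-,-,nc,-> - http://www.flatbed-scanner-review.org/"
)
EVENT_2 = (
    "1489713615.376 809 211.166.11.101 TCP_MISS/200 147639 GET "
    "http://www.vindy.com/ myuan@buttercupgames.com DIRECT/www.vindy.com text/html "
    "DEFAULT_CASE-DefaultGroup-Demo_Clients-NONE-NONE-DefaultRouting "
    "<IW_news,3.4,0,-,-,-,-,0,-,-,-,-,-,-,-,IW_news,-> - -"
)


def domain_with(skip, event):
    """Apply the pattern with {skip} skipped tokens and return the 'domain' group."""
    pattern = re.compile(
        r"(GET|POST)\s(?P<url>[^ ]*)\s(?P<user>[^ ]*)\s([^ ]*\s){%d}(?P<domain>[^ ]*)" % skip
    )
    match = pattern.search(event)
    return match.group("domain") if match else None


for skip in (4, 5):
    for name, event in (("event 1", EVENT_1), ("event 2", EVENT_2)):
        print(f"{{{skip}}} {name}: {domain_with(skip, event)}")
```

With `{4}` the group lands on the 13th token, which is `-` in both samples. With `{5}` it lands on the last token, which is a URL in event 1 and still `-` in event 2, so this pattern only yields a value when the last field is populated.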
