Proper regex to extract cisco_ironport_web.log fields such as user, domain, and url

jagadeeshm
Contributor

cisco_ironport_web.log has the following events:

Event - 1

1489714117.601 56 27.1.11.11 TCP_REFRESH_HIT/200 54491 GET http://www.flatbed-scanner-review.org/inter-banner_flatbed.jpg bhussain@buttercupgames.com DIRECT/www.flatbed-scanner-review.org image/jpeg DEFAULT_CASE-DefaultGroup-Demo_Clients-NONE-NONE-DefaultRouting <nc,ns,0,-,-,-,-,0,-,-,-,-,-,-,-,nc,-> - http://www.flatbed-scanner-review.org/

Event - 2

1489713615.376 809 211.166.11.101 TCP_MISS/200 147639 GET http://www.vindy.com/ myuan@buttercupgames.com DIRECT/www.vindy.com text/html DEFAULT_CASE-DefaultGroup-Demo_Clients-NONE-NONE-DefaultRouting <IW_news,3.4,0,-,-,-,-,0,-,-,-,-,-,-,-,IW_news,-> - -

I use the following field list to extract user, url, and domain:

"field1","field2","field3","field4","field5","field6","url","user","field9","field10","field11","field12","field13","domain"

It doesn't work for the second event, because the domain field contains only '-'. How do I fix it?
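To make the problem concrete, here is a minimal Python sketch. It assumes the field list above is applied positionally to space-separated tokens; the thread does not show the actual extraction configuration, so that is an assumption. The names FIELDS and extract are hypothetical.

```python
# Hypothetical illustration: map the 14 field names onto space-separated tokens.
FIELDS = ["field1", "field2", "field3", "field4", "field5", "field6",
          "url", "user", "field9", "field10", "field11", "field12",
          "field13", "domain"]

EVENT_1 = ("1489714117.601 56 27.1.11.11 TCP_REFRESH_HIT/200 54491 GET "
           "http://www.flatbed-scanner-review.org/inter-banner_flatbed.jpg "
           "bhussain@buttercupgames.com DIRECT/www.flatbed-scanner-review.org "
           "image/jpeg "
           "DEFAULT_CASE-DefaultGroup-Demo_Clients-NONE-NONE-DefaultRouting "
           "<nc,ns,0,-,-,-,-,0,-,-,-,-,-,-,-,nc,-> - "
           "http://www.flatbed-scanner-review.org/")
EVENT_2 = ("1489713615.376 809 211.166.11.101 TCP_MISS/200 147639 GET "
           "http://www.vindy.com/ myuan@buttercupgames.com DIRECT/www.vindy.com "
           "text/html "
           "DEFAULT_CASE-DefaultGroup-Demo_Clients-NONE-NONE-DefaultRouting "
           "<IW_news,3.4,0,-,-,-,-,0,-,-,-,-,-,-,-,IW_news,-> - -")


def extract(event):
    """Pair each field name with the token at the same position."""
    return dict(zip(FIELDS, event.split(" ")))


print(extract(EVENT_1)["domain"])  # http://www.flatbed-scanner-review.org/
print(extract(EVENT_2)["domain"])  # -
```

Under that assumption, the 14th token is the trailing URL in Event 1 but a bare '-' in Event 2, so a purely positional extraction has nothing useful to return for the second event.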

gcusello
SplunkTrust

Hi jagadeeshm,
try

(GET|POST)\s(?<url>[^ ]*)\s(?<user>[^ ]*)\s([^ ]*\s){4}(?<domain>[^ ]*)

you can test it at https://regex101.com/r/1qW58r/1

Bye.
Giuseppe

FrankVl
Ultra Champion

Instead of reinventing the wheel, you could take some inspiration from the Splunk Add-on for Cisco WSA:
https://splunkbase.splunk.com/app/1747/

Looking at the sample data and props/transforms in that TA, it seems to support data very similar to yours. The regex in there does not match perfectly (I think the part between <...> causes some issues), but it might be a good start.

jagadeeshm
Contributor

It doesn't actually extract the domain name, which is my core issue.

cradeke_splunk
Splunk Employee

I tried the regex101 link; it extracts the domain from the field at the very end of the event. That field is not always populated, so I tried extracting the domain from the string right after "DIRECT/" instead. This would be my solution, but only if you don't need the field at the end.

(GET|POST)\s(?<url>[^ ]*)\s(?<user>[^ ]*)\s([^ ]){6}\/(?<domain>[^ ]*)
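For anyone who wants to check this pattern outside Splunk, here is a small Python sketch run against the two sample events from the question. The only change is the named-group spelling: Python's re module uses (?P<name>...) and does not accept the (?<name>...) form. The names PATTERN and domain_of are hypothetical.

```python
import re

# Same pattern as in the post, with Python-style named groups.
PATTERN = re.compile(
    r"(GET|POST)\s(?P<url>[^ ]*)\s(?P<user>[^ ]*)\s([^ ]){6}/(?P<domain>[^ ]*)"
)

EVENT_1 = ("1489714117.601 56 27.1.11.11 TCP_REFRESH_HIT/200 54491 GET "
           "http://www.flatbed-scanner-review.org/inter-banner_flatbed.jpg "
           "bhussain@buttercupgames.com DIRECT/www.flatbed-scanner-review.org "
           "image/jpeg "
           "DEFAULT_CASE-DefaultGroup-Demo_Clients-NONE-NONE-DefaultRouting "
           "<nc,ns,0,-,-,-,-,0,-,-,-,-,-,-,-,nc,-> - "
           "http://www.flatbed-scanner-review.org/")
EVENT_2 = ("1489713615.376 809 211.166.11.101 TCP_MISS/200 147639 GET "
           "http://www.vindy.com/ myuan@buttercupgames.com DIRECT/www.vindy.com "
           "text/html "
           "DEFAULT_CASE-DefaultGroup-Demo_Clients-NONE-NONE-DefaultRouting "
           "<IW_news,3.4,0,-,-,-,-,0,-,-,-,-,-,-,-,IW_news,-> - -")


def domain_of(event):
    """Return the text after 'DIRECT/' or None if the pattern does not match."""
    match = PATTERN.search(event)
    return match.group("domain") if match else None


print(domain_of(EVENT_1))  # www.flatbed-scanner-review.org
print(domain_of(EVENT_2))  # www.vindy.com
```

Note that ([^ ]){6} expects exactly six non-space characters before the slash, which fits "DIRECT" in both samples. If other values appear in that position in your data, the pattern as written would not match those events.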

gcusello
SplunkTrust

Hi jagadeeshm,
Sorry, a correction: use 5 instead of 4 (see https://regex101.com/r/1qW58r/2)

(GET|POST)\s(?<url>[^ ]*)\s(?<user>[^ ]*)\s([^ ]*\s){5}(?<domain>[^ ]*)

Bye.
Giuseppe
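The effect of changing the repetition count from 4 to 5 can be checked with a short Python sketch against the two sample events from the question. The helper name domain_with_skip is hypothetical, and the named groups use Python's (?P<name>...) spelling because the re module does not accept (?<name>...).

```python
import re

EVENT_1 = ("1489714117.601 56 27.1.11.11 TCP_REFRESH_HIT/200 54491 GET "
           "http://www.flatbed-scanner-review.org/inter-banner_flatbed.jpg "
           "bhussain@buttercupgames.com DIRECT/www.flatbed-scanner-review.org "
           "image/jpeg "
           "DEFAULT_CASE-DefaultGroup-Demo_Clients-NONE-NONE-DefaultRouting "
           "<nc,ns,0,-,-,-,-,0,-,-,-,-,-,-,-,nc,-> - "
           "http://www.flatbed-scanner-review.org/")
EVENT_2 = ("1489713615.376 809 211.166.11.101 TCP_MISS/200 147639 GET "
           "http://www.vindy.com/ myuan@buttercupgames.com DIRECT/www.vindy.com "
           "text/html "
           "DEFAULT_CASE-DefaultGroup-Demo_Clients-NONE-NONE-DefaultRouting "
           "<IW_news,3.4,0,-,-,-,-,0,-,-,-,-,-,-,-,IW_news,-> - -")


def domain_with_skip(event, skip):
    """Skip `skip` space-delimited fields after user, then capture the next one."""
    pattern = (r"(GET|POST)\s(?P<url>[^ ]*)\s(?P<user>[^ ]*)\s"
               r"([^ ]*\s){%d}(?P<domain>[^ ]*)" % skip)
    match = re.search(pattern, event)
    return match.group("domain") if match else None


print(domain_with_skip(EVENT_1, 4))  # -
print(domain_with_skip(EVENT_1, 5))  # http://www.flatbed-scanner-review.org/
print(domain_with_skip(EVENT_2, 5))  # -
```

With a count of 5 the capture lands on the last field of the event. For Event 2 that field is still '-', so this pattern only yields a usable value when the trailing field is populated.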
