Splunk Search

Proper regex to extract cisco_ironport_web.log fields such as user, domain, and url

Contributor

cisco_ironport_web.log has the following events:

Event - 1

1489714117.601 56 27.1.11.11 TCP_REFRESH_HIT/200 54491 GET http://www.flatbed-scanner-review.org/inter-banner_flatbed.jpg bhussain@buttercupgames.com DIRECT/www.flatbed-scanner-review.org image/jpeg DEFAULT_CASE-DefaultGroup-Demo_Clients-NONE-NONE-DefaultRouting <nc,ns,0,-,-,-,-,0,-,-,-,-,-,-,-,nc,-> - http://www.flatbed-scanner-review.org/

Event - 2

1489713615.376 809 211.166.11.101 TCP_MISS/200 147639 GET http://www.vindy.com/ myuan@buttercupgames.com DIRECT/www.vindy.com text/html DEFAULT_CASE-DefaultGroup-Demo_Clients-NONE-NONE-DefaultRouting <IW_news,3.4,0,-,-,-,-,0,-,-,-,-,-,-,-,IW_news,-> - -

I use the following extraction (a positional field list) to get user, url, and domain:

"field1","field2","field3","field4","field5","field6","url","user","field9","field10","field11","field12","field13","domain"

It doesn't work for the second event, because the domain field contains '-'. How do I fix it?
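To illustrate the symptom, here is a minimal Python sketch (not Splunk configuration). It assumes the field list above is applied positionally to space-separated tokens; the helper name `split_fields` is hypothetical and exists only for this illustration.

```python
# Sample events copied from the question.
EVENT_1 = (
    "1489714117.601 56 27.1.11.11 TCP_REFRESH_HIT/200 54491 GET "
    "http://www.flatbed-scanner-review.org/inter-banner_flatbed.jpg "
    "bhussain@buttercupgames.com DIRECT/www.flatbed-scanner-review.org image/jpeg "
    "DEFAULT_CASE-DefaultGroup-Demo_Clients-NONE-NONE-DefaultRouting "
    "<nc,ns,0,-,-,-,-,0,-,-,-,-,-,-,-,nc,-> - http://www.flatbed-scanner-review.org/"
)
EVENT_2 = (
    "1489713615.376 809 211.166.11.101 TCP_MISS/200 147639 GET "
    "http://www.vindy.com/ myuan@buttercupgames.com DIRECT/www.vindy.com text/html "
    "DEFAULT_CASE-DefaultGroup-Demo_Clients-NONE-NONE-DefaultRouting "
    "<IW_news,3.4,0,-,-,-,-,0,-,-,-,-,-,-,-,IW_news,-> - -"
)

# The 14 field names from the question, in order.
FIELDS = [
    "field1", "field2", "field3", "field4", "field5", "field6", "url",
    "user", "field9", "field10", "field11", "field12", "field13", "domain",
]


def split_fields(event):
    """Assumption: fields are assigned by position after splitting on spaces."""
    return dict(zip(FIELDS, event.split(" ")))


for event in (EVENT_1, EVENT_2):
    fields = split_fields(event)
    print(fields["user"], fields["url"], fields["domain"])
```

Under that assumption, the 14th token is the last value on the line, so the second event yields `-` for domain even though the host name is present earlier in the event, after `DIRECT/`.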

1 Solution

Legend

Hi jagadeeshm,
try

(GET|POST)\s(?<url>[^ ]*)\s(?<user>[^ ]*)\s([^ ]*\s){4}(?<domain>[^ ]*)

you can test it at https://regex101.com/r/1qW58r/1

Bye.
Giuseppe


Ultra Champion

Instead of reinventing the wheel, you could take some inspiration from the Splunk Add-on for Cisco WSA:
https://splunkbase.splunk.com/app/1747/

Looking at the sample data and the props/transforms in that TA, it seems to support data very similar to yours. The regex in there does not match perfectly (I think the part between <...> causes some issues), but it might be a good start.



Contributor

It doesn't actually extract the domain name, which is my core issue.


Splunk Employee

I tried the regex101 link; it extracts the domain field from the very end of the event. That field is not always populated, so I tried extracting the domain from the string right after "DIRECT/" instead. This would be my solution, but only if you do not specifically need the field at the end.

(GET|POST)\s(?<url>[^ ]*)\s(?<user>[^ ]*)\s([^ ]){6}\/(?<domain>[^ ]*)
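As a rough check of this approach outside Splunk, here is a small Python sketch run against the two sample events from the question. Two things in it are my assumptions rather than part of the regex above: Python's `re` module needs `(?P<name>...)` for named groups, and I replaced the fixed `([^ ]){6}\/` with `[^ /]+/` so that a prefix that is not exactly six characters long (unlike `DIRECT`) would still match.

```python
import re

# Sample events copied from the question.
EVENT_1 = (
    "1489714117.601 56 27.1.11.11 TCP_REFRESH_HIT/200 54491 GET "
    "http://www.flatbed-scanner-review.org/inter-banner_flatbed.jpg "
    "bhussain@buttercupgames.com DIRECT/www.flatbed-scanner-review.org image/jpeg "
    "DEFAULT_CASE-DefaultGroup-Demo_Clients-NONE-NONE-DefaultRouting "
    "<nc,ns,0,-,-,-,-,0,-,-,-,-,-,-,-,nc,-> - http://www.flatbed-scanner-review.org/"
)
EVENT_2 = (
    "1489713615.376 809 211.166.11.101 TCP_MISS/200 147639 GET "
    "http://www.vindy.com/ myuan@buttercupgames.com DIRECT/www.vindy.com text/html "
    "DEFAULT_CASE-DefaultGroup-Demo_Clients-NONE-NONE-DefaultRouting "
    "<IW_news,3.4,0,-,-,-,-,0,-,-,-,-,-,-,-,IW_news,-> - -"
)

# Python spelling of the pattern; "[^ /]+/" is an assumed generalization
# of "([^ ]){6}\/" so the prefix before the slash can have any length.
PATTERN = re.compile(
    r"(GET|POST)\s(?P<url>[^ ]*)\s(?P<user>[^ ]*)\s[^ /]+/(?P<domain>[^ ]*)"
)


def extract(event):
    """Return url, user, and domain as a dict, or None if nothing matches."""
    match = PATTERN.search(event)
    return match.groupdict() if match else None


for event in (EVENT_1, EVENT_2):
    print(extract(event))
```

With these two events the domain comes out as `www.flatbed-scanner-review.org` and `www.vindy.com`, so the second event no longer ends up with `-`.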

Legend

Hi jagadeeshm,
sorry, the repetition count should be 5 instead of 4 (see https://regex101.com/r/1qW58r/2):

(GET|POST)\s(?<url>[^ ]*)\s(?<user>[^ ]*)\s([^ ]*\s){5}(?<domain>[^ ]*)
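To see what the change from 4 to 5 does, here is a short Python sketch that runs both counts against the two sample events from the question. It is only an approximation of the Splunk behavior: Python's `re` module needs `(?P<name>...)` for named groups, and the helper names `build_pattern` and `domain_for` are hypothetical.

```python
import re

# Sample events copied from the question.
EVENT_1 = (
    "1489714117.601 56 27.1.11.11 TCP_REFRESH_HIT/200 54491 GET "
    "http://www.flatbed-scanner-review.org/inter-banner_flatbed.jpg "
    "bhussain@buttercupgames.com DIRECT/www.flatbed-scanner-review.org image/jpeg "
    "DEFAULT_CASE-DefaultGroup-Demo_Clients-NONE-NONE-DefaultRouting "
    "<nc,ns,0,-,-,-,-,0,-,-,-,-,-,-,-,nc,-> - http://www.flatbed-scanner-review.org/"
)
EVENT_2 = (
    "1489713615.376 809 211.166.11.101 TCP_MISS/200 147639 GET "
    "http://www.vindy.com/ myuan@buttercupgames.com DIRECT/www.vindy.com text/html "
    "DEFAULT_CASE-DefaultGroup-Demo_Clients-NONE-NONE-DefaultRouting "
    "<IW_news,3.4,0,-,-,-,-,0,-,-,-,-,-,-,-,IW_news,-> - -"
)


def build_pattern(skip_count):
    """Build the regex with a given number of skipped space-delimited fields."""
    return re.compile(
        r"(GET|POST)\s(?P<url>[^ ]*)\s(?P<user>[^ ]*)\s"
        r"([^ ]*\s){%d}(?P<domain>[^ ]*)" % skip_count
    )


def domain_for(event, skip_count):
    """Return the captured domain, or None if the pattern does not match."""
    match = build_pattern(skip_count).search(event)
    return match.group("domain") if match else None


for count in (4, 5):
    print(count, domain_for(EVENT_1, count), domain_for(EVENT_2, count))
```

Skipping 4 fields lands on the second-to-last value, which is `-` in both events. Skipping 5 reaches the last value: the URL `http://www.flatbed-scanner-review.org/` for the first event, but still `-` for the second, because that event has nothing in the final field.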

Bye.
Giuseppe
