Splunk Search

Proper reg-ex to extracts cisco_ironport_web.log fields like - user, domain and url

jagadeeshm
Contributor

cisco_ironport_web.log has the following events -

Event - 1

1489714117.601 56 27.1.11.11 TCP_REFRESH_HIT/200 54491 GET http://www.flatbed-scanner-review.org/inter-banner_flatbed.jpg bhussain@buttercupgames.com DIRECT/www.flatbed-scanner-review.org image/jpeg DEFAULT_CASE-DefaultGroup-Demo_Clients-NONE-NONE-DefaultRouting <nc,ns,0,-,-,-,-,0,-,-,-,-,-,-,-,nc,-> - http://www.flatbed-scanner-review.org/

Event - 2

1489713615.376 809 211.166.11.101 TCP_MISS/200 147639 GET http://www.vindy.com/ myuan@buttercupgames.com DIRECT/www.vindy.com text/html DEFAULT_CASE-DefaultGroup-Demo_Clients-NONE-NONE-DefaultRouting <IW_news,3.4,0,-,-,-,-,0,-,-,-,-,-,-,-,IW_news,-> - -

I use the following reg-ex to extract user, url and domain

"field1","field2","field3","field4","field5","field6","url","user","field9","field10","field11","field12","field13","domain"

It doesn't work for second event, because domain fields has '-'. How do I fix it?

Tags (2)
0 Karma
1 Solution

gcusello
Legend

Hi jagadeeshm,
try

(GET|POST)\s(?<url>[^ ]*)\s(?<user>[^ ]*)\s([^ ]*\s){4}(?<domain>[^ ]*)

you can test it at https://regex101.com/r/1qW58r/1

Bye.
Giuseppe

View solution in original post

0 Karma

FrankVl
Ultra Champion

Instead of re-inventing the wheel, you could take some inspiration from Splunk Add-on for Cisco WSA
https://splunkbase.splunk.com/app/1747/

If I look at the sample data and props/transforms in that TA it seems to support very similar data to what you have. The regex in there does not perfectly match (the part between <...> is giving some issues I think), but might be a good start.

0 Karma

gcusello
Legend

Hi jagadeeshm,
try

(GET|POST)\s(?<url>[^ ]*)\s(?<user>[^ ]*)\s([^ ]*\s){4}(?<domain>[^ ]*)

you can test it at https://regex101.com/r/1qW58r/1

Bye.
Giuseppe

0 Karma

jagadeeshm
Contributor

It doesn't actually extract domain name, which is my core issue.

0 Karma

cradeke_splunk
Splunk Employee
Splunk Employee

I tried the regex101 link, it extracts the domain field at the very end. That field is not always populated So I tried to extract the domain from the string right after "DIRECT/". This would be my solution. But only if you are not looking at the field at the end.

(GET|POST)\s(?<url>[^ ]*)\s(?<user>[^ ]*)\s([^ ]){6}\/(?<domain>[^ ]*)
0 Karma

gcusello
Legend

Hi jagadeeshm,
sorry correct 5 instead 4 (see https://regex101.com/r/1qW58r/2)

(GET|POST)\s(?<url>[^ ]*)\s(?<user>[^ ]*)\s([^ ]*\s){5}(?<domain>[^ ]*)

Bye.
Giuseppe

0 Karma
Get Updates on the Splunk Community!

Improve Your Security Posture

Watch NowImprove Your Security PostureCustomers are at the center of everything we do at Splunk and security ...

Maximize the Value from Microsoft Defender with Splunk

 Watch NowJoin Splunk and Sens Consulting for this Security Edition Tech TalkWho should attend:  Security ...

This Week's Community Digest - Splunk Community Happenings [6.27.22]

Get the latest news and updates from the Splunk Community here! News From Splunk Answers ✍️ Splunk Answers is ...