Splunk Search

How do you extract the URL into a separate field?

SplunkMasterSne
Explorer

Hello,

I'm trying to extract the URL from the message field, so I can create a separate field called URLs. At the moment, all our logs are in the message field, so extracting various parts is essential.

Can anyone help with the extraction of the URL, for example our log below? I need to extract www.bbc.co.uk:

2019-02-09 23:19:43 "proxyname" 20760 10.10.10.10 e1000000 proxy\proxy-rule-HTTP-Allowed - OBSERVED "News/Media" - 200 TCP_TUNNELED CONNECT - tcp www.bbc.co.uk 443 / - - "Mozilla/5.0 (Windows NT 6.1; Trident/7.0; rv:11.0) like Gecko" 15.15.15.15 77608 1760 - "BBC" "none"

Thank you!

0 Karma
1 Solution

jbrocks
Communicator

You could use the field extractor for this or use props.conf to extract the field. For example with regex \d{4}-\d{2}-\d{2}\s+\d{2}:\d{2}:\d{2}\s+(?:[^\s]+)\s+(?:[^\s]+)\s+(?:[^\s]+)\s+(?:[^\s]+)\s+(?:[^\s]+)\s+(?:[^\s]+)\s+(?:[^\s]+)\s+(?:[^\s]+)\s+(?:[^\s]+)\s+(?:[^\s]+)\s+(?:[^\s]+)\s+(?:[^\s]+)\s+(?:[^\s]+)\s+(?:[^\s]+)\s+(?<url>[^\s]+)\s+ so on props.conf:

[my_sourcetype]

EXTRACT-url = \d{4}-\d{2}-\d{2}\s+\d{2}:\d{2}:\d{2}\s+(?:[^\s]+)\s+(?:[^\s]+)\s+(?:[^\s]+)\s+(?:[^\s]+)\s+(?:[^\s]+)\s+(?:[^\s]+)\s+(?:[^\s]+)\s+(?:[^\s]+)\s+(?:[^\s]+)\s+(?:[^\s]+)\s+(?:[^\s]+)\s+(?:[^\s]+)\s+(?:[^\s]+)\s+(?:[^\s]+)\s+(?<url>[^\s]+)

some thing like that. I think there are better ways to form the regex, was just a short test with regex101.com .. but hope this helps anyway. But I think the best wy, if there is no addon is to create an app, and extract all the values in a stanza of transforms.conf.

View solution in original post

jbrocks
Communicator

You could use the field extractor for this or use props.conf to extract the field. For example with regex \d{4}-\d{2}-\d{2}\s+\d{2}:\d{2}:\d{2}\s+(?:[^\s]+)\s+(?:[^\s]+)\s+(?:[^\s]+)\s+(?:[^\s]+)\s+(?:[^\s]+)\s+(?:[^\s]+)\s+(?:[^\s]+)\s+(?:[^\s]+)\s+(?:[^\s]+)\s+(?:[^\s]+)\s+(?:[^\s]+)\s+(?:[^\s]+)\s+(?:[^\s]+)\s+(?:[^\s]+)\s+(?<url>[^\s]+)\s+ so on props.conf:

[my_sourcetype]

EXTRACT-url = \d{4}-\d{2}-\d{2}\s+\d{2}:\d{2}:\d{2}\s+(?:[^\s]+)\s+(?:[^\s]+)\s+(?:[^\s]+)\s+(?:[^\s]+)\s+(?:[^\s]+)\s+(?:[^\s]+)\s+(?:[^\s]+)\s+(?:[^\s]+)\s+(?:[^\s]+)\s+(?:[^\s]+)\s+(?:[^\s]+)\s+(?:[^\s]+)\s+(?:[^\s]+)\s+(?:[^\s]+)\s+(?<url>[^\s]+)

some thing like that. I think there are better ways to form the regex, was just a short test with regex101.com .. but hope this helps anyway. But I think the best wy, if there is no addon is to create an app, and extract all the values in a stanza of transforms.conf.

SplunkMasterSne
Explorer

Thank you for the help, works perfectly

0 Karma

FrankVl
Ultra Champion

What type of proxy is this and have you checked whether there is an add-on available for this already, that does the field extraction for you?

This looks like a slight variation on the W3C or squid log formats?

0 Karma
Got questions? Get answers!

Join the Splunk Community Slack to learn, troubleshoot, and make connections with fellow Splunk practitioners in real time!

Meet up IRL or virtually!

Join Splunk User Groups to connect and learn in-person by region or remotely by topic or industry.

Get Updates on the Splunk Community!

Announcing Modern Navigation: A New Era of Splunk User Experience

We are excited to introduce the Modern Navigation feature in the Splunk Platform, available to both cloud and ...

Modernize your Splunk Apps – Introducing Python 3.13 in Splunk

We are excited to announce that the upcoming releases of Splunk Enterprise 10.2.x and Splunk Cloud Platform ...

Step into “Hunt the Insider: An Splunk ES Premier Mystery” to catch a cybercriminal ...

After a whole week of being on call, you fell asleep on your keyboard, and you hit a sequence of buttons that ...