Splunk Search

How do you extract the URL into a separate field?

SplunkMasterSne
Explorer

Hello,

I'm trying to extract the URL from the message field, so I can create a separate field called URLs. At the moment, all our logs are in the message field, so extracting various parts is essential.

Can anyone help with extracting the URL from a log like the one below? I need to extract www.bbc.co.uk:

2019-02-09 23:19:43 "proxyname" 20760 10.10.10.10 e1000000 proxy\proxy-rule-HTTP-Allowed - OBSERVED "News/Media" - 200 TCP_TUNNELED CONNECT - tcp www.bbc.co.uk 443 / - - "Mozilla/5.0 (Windows NT 6.1; Trident/7.0; rv:11.0) like Gecko" 15.15.15.15 77608 1760 - "BBC" "none"

Thank you!

jbrocks
Communicator

You could use the field extractor for this, or extract the field in props.conf with a regex that matches the timestamp, skips the next 14 whitespace-separated values, and captures the following one as url. For example, in props.conf:

[my_sourcetype]

EXTRACT-url = \d{4}-\d{2}-\d{2}\s+\d{2}:\d{2}:\d{2}\s+(?:[^\s]+)\s+(?:[^\s]+)\s+(?:[^\s]+)\s+(?:[^\s]+)\s+(?:[^\s]+)\s+(?:[^\s]+)\s+(?:[^\s]+)\s+(?:[^\s]+)\s+(?:[^\s]+)\s+(?:[^\s]+)\s+(?:[^\s]+)\s+(?:[^\s]+)\s+(?:[^\s]+)\s+(?:[^\s]+)\s+(?<url>[^\s]+)

Something like that. There are probably better ways to write the regex; this was just a quick test on regex101.com, but I hope it helps anyway. I think the best way, if there is no add-on, is to create an app and extract all the values in a stanza of transforms.conf.
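If you want to sanity-check the pattern outside Splunk, here is a minimal Python sketch run against the sample event from the question. Two things in it are assumptions rather than part of the answer above: Python's re module spells named groups (?P<url>...) where Splunk accepts (?<url>...), and the shorter COMPACT pattern is only a hypothetical alternative that skips the first 16 whitespace-separated tokens (date, time, and the 14 values before the host).

```python
import re

# Sample event copied from the question.
LOG_LINE = (
    r'2019-02-09 23:19:43 "proxyname" 20760 10.10.10.10 e1000000 '
    r'proxy\proxy-rule-HTTP-Allowed - OBSERVED "News/Media" - 200 '
    r'TCP_TUNNELED CONNECT - tcp www.bbc.co.uk 443 / - - '
    r'"Mozilla/5.0 (Windows NT 6.1; Trident/7.0; rv:11.0) like Gecko" '
    r'15.15.15.15 77608 1760 - "BBC" "none"'
)

# Pattern from the answer: timestamp, 14 skipped tokens, then the host.
# Only the named-group syntax is adapted for Python.
ORIGINAL = re.compile(
    r'\d{4}-\d{2}-\d{2}\s+\d{2}:\d{2}:\d{2}\s+'
    + r'(?:[^\s]+)\s+' * 14
    + r'(?P<url>[^\s]+)'
)

# Hypothetical shorter variant (an assumption, not from the answer):
# skip 16 tokens from the start of the line, capture the 17th.
COMPACT = re.compile(r'^(?:\S+\s+){16}(?P<url>\S+)')


def extract_url(line, pattern=ORIGINAL):
    """Return the captured url group, or None if the line does not match."""
    match = pattern.search(line)
    return match.group('url') if match else None


if __name__ == "__main__":
    print(extract_url(LOG_LINE))
    print(extract_url(LOG_LINE, COMPACT))
```

Both patterns just count whitespace-separated tokens, so a quoted value containing a space before the host (for example a category with two words) would shift the count and capture the wrong token. That is worth checking against more of your events before relying on it.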

SplunkMasterSne
Explorer

Thank you for the help, it works perfectly.

FrankVl
Ultra Champion

What type of proxy is this, and have you checked whether there is already an add-on available that does the field extraction for you?

This looks like a slight variation on the W3C or squid log formats?
