Splunk Search

How do you extract the URL into a separate field?

SplunkMasterSne
Explorer

Hello,

I'm trying to extract the URL from the message field, so I can create a separate field called URLs. At the moment, all our logs are in the message field, so extracting various parts is essential.

Can anyone help with the extraction of the URL, for example our log below? I need to extract www.bbc.co.uk:

2019-02-09 23:19:43 "proxyname" 20760 10.10.10.10 e1000000 proxy\proxy-rule-HTTP-Allowed - OBSERVED "News/Media" - 200 TCP_TUNNELED CONNECT - tcp www.bbc.co.uk 443 / - - "Mozilla/5.0 (Windows NT 6.1; Trident/7.0; rv:11.0) like Gecko" 15.15.15.15 77608 1760 - "BBC" "none"

Thank you!

0 Karma
1 Solution

jbrocks
Communicator

You could use the field extractor for this, or use props.conf to extract the field. For example, with the regex \d{4}-\d{2}-\d{2}\s+\d{2}:\d{2}:\d{2}\s+(?:[^\s]+)\s+(?:[^\s]+)\s+(?:[^\s]+)\s+(?:[^\s]+)\s+(?:[^\s]+)\s+(?:[^\s]+)\s+(?:[^\s]+)\s+(?:[^\s]+)\s+(?:[^\s]+)\s+(?:[^\s]+)\s+(?:[^\s]+)\s+(?:[^\s]+)\s+(?:[^\s]+)\s+(?:[^\s]+)\s+(?<url>[^\s]+)\s+ the props.conf entry would be:

[my_sourcetype]

EXTRACT-url = \d{4}-\d{2}-\d{2}\s+\d{2}:\d{2}:\d{2}\s+(?:[^\s]+)\s+(?:[^\s]+)\s+(?:[^\s]+)\s+(?:[^\s]+)\s+(?:[^\s]+)\s+(?:[^\s]+)\s+(?:[^\s]+)\s+(?:[^\s]+)\s+(?:[^\s]+)\s+(?:[^\s]+)\s+(?:[^\s]+)\s+(?:[^\s]+)\s+(?:[^\s]+)\s+(?:[^\s]+)\s+(?<url>[^\s]+)

Something like that. I think there are better ways to form the regex; this was just a short test with regex101.com, but I hope it helps anyway. I think the best way, if there is no add-on, is to create an app and extract all the values in a transforms.conf stanza.
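For anyone who wants to sanity-check the pattern outside Splunk, here is a minimal Python sketch run against the sample event from the question. It is not the exact regex above: it assumes the repeated skip groups can be collapsed into a counted group, `(?:\S+\s+){14}`, and it uses Python's `(?P<url>...)` named-group syntax where Splunk accepts `(?<url>...)`. The names `SAMPLE`, `URL_PATTERN` and `extract_url` are made up for this illustration.

```python
import re

# Sample event copied from the question (raw string keeps the backslash intact).
SAMPLE = r'2019-02-09 23:19:43 "proxyname" 20760 10.10.10.10 e1000000 proxy\proxy-rule-HTTP-Allowed - OBSERVED "News/Media" - 200 TCP_TUNNELED CONNECT - tcp www.bbc.co.uk 443 / - - "Mozilla/5.0 (Windows NT 6.1; Trident/7.0; rv:11.0) like Gecko" 15.15.15.15 77608 1760 - "BBC" "none"'

# Assumed compact form of the extraction: timestamp, skip 14 tokens, capture the 15th.
URL_PATTERN = re.compile(
    r'\d{4}-\d{2}-\d{2}\s+\d{2}:\d{2}:\d{2}\s+'  # date and time
    r'(?:\S+\s+){14}'                              # 14 whitespace-delimited tokens to skip
    r'(?P<url>\S+)'                                # 15th token: the destination host
)


def extract_url(event):
    """Return the captured host, or None when the event does not match."""
    match = URL_PATTERN.search(event)
    return match.group('url') if match else None


if __name__ == '__main__':
    print(extract_url(SAMPLE))
```

One caveat with any position-based pattern like this: it counts whitespace-delimited tokens, so a quoted value containing a space before the host (for example a category with two words) would shift the count and capture the wrong token.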

SplunkMasterSne
Explorer

Thank you for the help, works perfectly

0 Karma

FrankVl
Ultra Champion

What type of proxy is this, and have you checked whether an add-on is already available that does the field extraction for you?

This looks like a slight variation on the W3C or squid log formats?

0 Karma