Splunk Search

How do you extract the URL into a separate field?

Hello,

I'm trying to extract the URL from the message field, so I can create a separate field called URLs. At the moment, all our logs are in the message field, so extracting various parts is essential.

Can anyone help with extracting the URL from a log entry like the one below? In this example, I need to extract www.bbc.co.uk:

2019-02-09 23:19:43 "proxyname" 20760 10.10.10.10 e1000000 proxy\proxy-rule-HTTP-Allowed - OBSERVED "News/Media" - 200 TCP_TUNNELED CONNECT - tcp www.bbc.co.uk 443 / - - "Mozilla/5.0 (Windows NT 6.1; Trident/7.0; rv:11.0) like Gecko" 15.15.15.15 77608 1760 - "BBC" "none"

Thank you!

1 Solution

Communicator

You could use the field extractor for this, or extract the field in props.conf. For example, with the regex \d{4}-\d{2}-\d{2}\s+\d{2}:\d{2}:\d{2}\s+(?:[^\s]+)\s+(?:[^\s]+)\s+(?:[^\s]+)\s+(?:[^\s]+)\s+(?:[^\s]+)\s+(?:[^\s]+)\s+(?:[^\s]+)\s+(?:[^\s]+)\s+(?:[^\s]+)\s+(?:[^\s]+)\s+(?:[^\s]+)\s+(?:[^\s]+)\s+(?:[^\s]+)\s+(?:[^\s]+)\s+(?<url>[^\s]+)\s+ the props.conf entry would be:

[my_sourcetype]

EXTRACT-url = \d{4}-\d{2}-\d{2}\s+\d{2}:\d{2}:\d{2}\s+(?:[^\s]+)\s+(?:[^\s]+)\s+(?:[^\s]+)\s+(?:[^\s]+)\s+(?:[^\s]+)\s+(?:[^\s]+)\s+(?:[^\s]+)\s+(?:[^\s]+)\s+(?:[^\s]+)\s+(?:[^\s]+)\s+(?:[^\s]+)\s+(?:[^\s]+)\s+(?:[^\s]+)\s+(?:[^\s]+)\s+(?<url>[^\s]+)
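To sanity-check the pattern outside Splunk, a minimal Python sketch along these lines can run it against the sample event from the question. Python's `re` module spells named groups `(?P<url>...)` rather than `(?<url>...)`, so the group syntax is adapted here. The names `SAMPLE`, `PATTERN`, `SKIPPED_TOKENS`, and `extract_url` are illustrative only and not part of any Splunk API.

```python
import re

# Sample event copied from the question.
SAMPLE = (
    r'2019-02-09 23:19:43 "proxyname" 20760 10.10.10.10 e1000000 '
    r'proxy\proxy-rule-HTTP-Allowed - OBSERVED "News/Media" - 200 '
    r'TCP_TUNNELED CONNECT - tcp www.bbc.co.uk 443 / - - '
    r'"Mozilla/5.0 (Windows NT 6.1; Trident/7.0; rv:11.0) like Gecko" '
    r'15.15.15.15 77608 1760 - "BBC" "none"'
)

# The regex skips 14 whitespace-separated tokens after the timestamp,
# then captures the next token as "url".
SKIPPED_TOKENS = 14
PATTERN = re.compile(
    r'\d{4}-\d{2}-\d{2}\s+\d{2}:\d{2}:\d{2}\s+'
    + r'(?:[^\s]+)\s+' * SKIPPED_TOKENS
    + r'(?P<url>[^\s]+)'
)


def extract_url(event):
    """Return the captured host, or None if the event does not match."""
    match = PATTERN.search(event)
    return match.group('url') if match else None


print(extract_url(SAMPLE))  # www.bbc.co.uk
```

This only checks the regex logic. Behaviour inside Splunk still depends on the sourcetype stanza matching your events.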

Something like that. There are probably better ways to form the regex; this was just a short test with regex101.com, but I hope it helps anyway. I think the best way, if there is no add-on, is to create an app and extract all the values in a stanza of transforms.conf.
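On the point that the regex could be formed more compactly: one option, offered as a sketch rather than a tested Splunk configuration, is to count whitespace-separated tokens with a quantifier instead of repeating the group. The Python below (again using `(?P<url>...)` in place of `(?<url>...)`) assumes the host is always the 17th token and that no earlier quoted field contains a space. `SAMPLE`, `COMPACT`, and `url` are hypothetical names.

```python
import re

# Sample event copied from the question.
SAMPLE = (
    r'2019-02-09 23:19:43 "proxyname" 20760 10.10.10.10 e1000000 '
    r'proxy\proxy-rule-HTTP-Allowed - OBSERVED "News/Media" - 200 '
    r'TCP_TUNNELED CONNECT - tcp www.bbc.co.uk 443 / - - '
    r'"Mozilla/5.0 (Windows NT 6.1; Trident/7.0; rv:11.0) like Gecko" '
    r'15.15.15.15 77608 1760 - "BBC" "none"'
)

# Skip 16 tokens (date, time, and the 14 fields before the host),
# then capture the 17th token as "url".
COMPACT = re.compile(r'^(?:\S+\s+){16}(?P<url>\S+)')

match = COMPACT.match(SAMPLE)
url = match.group('url') if match else None
print(url)  # www.bbc.co.uk
```

If the assumptions hold for your data, the equivalent props.conf line should be `EXTRACT-url = ^(?:\S+\s+){16}(?<url>\S+)`, but verify it against real events first, since a quoted field containing a space would shift the token count.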

Thank you for the help, it works perfectly.

Ultra Champion

What type of proxy is this and have you checked whether there is an add-on available for this already, that does the field extraction for you?

This looks like a slight variation on the W3C or squid log formats?
