Splunk Search

How to use regex to extract from _raw and return in table format?

DLT76
Path Finder

I have logs with data in two fields: _raw and _time. I want to search the _raw field for an IP in a specific pattern and return the URL that follows the IP. I'd like to see the result in a table, with the URL in one column named "url" and the date/time from the _time field in a second column.

Here's an example of the data in _raw:

 

  [1.2.3.4 lookup] : http://www.dummy-url.com/ -- 

 

I'd like to use a query like the following which will look for a specified IP and return the URL that follows after the colon:

 

rex field=_raw "1.2.3.4 lookup\] \: (?<url>[\w\:\/\.\-]+)"

 

The datasource looks like this:

 

sourcetype="datasource.out"

 

Can you help me with a query that searches for the IP and returns the URL (from _raw) and date/time (from _time) in table format?

Thanks!

richgalloway
SplunkTrust

You appear to have everything you need except for the table command.  What do you get with this query?

index=foo sourcetype="datasource.out"
| rex field=_raw "1.2.3.4 lookup\] \: (?<url>[\w\:\/\.\-]+)"
| table _time url

 


DLT76
Path Finder

It does return a table with the date/time in one column, but the url column is blank.  It appears to be returning a row for every row during the date range.  I know I have rows with the IP in the _raw field because I get back rows when I search my source for just the IP in quotes.  And the regex looks good.  From regex101:

[Screenshot: Capture.PNG]

Ideas?


DLT76
Path Finder

Update:  It does appear to return every row from the raw field (or at least many more than have the specific IP), but when I sorted on the empty url column, I found that there are some rows with data, but they're not all URLs.
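A likely reason for the non-URL values: the dots in 1.2.3.4 are unescaped, so each one matches any character, and the character class after the colon accepts any run of word characters, not only URLs. A minimal sketch of a stricter pattern follows. It assumes every event of interest looks like the sample line (the IP directly after an opening bracket, and a URL starting with http or https) and has not been run against the real data.

```spl
sourcetype="datasource.out"
| rex field=_raw "\[1\.2\.3\.4 lookup\] : (?<url>https?://[\w:/.\-]+)"
| table _time url
```

This only tightens what gets captured. Events that do not match the pattern still appear in the table, with an empty url column.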


DLT76
Path Finder

Update #2:

So when I add a field for the ip address and display it in the table and sort on that column, I find matching results (yay!), but I'm also getting tons of records that don't match.  Here's the new query:

 

sourcetype="datasource.out" | rex field=_raw "(?<ipaddress>1.2.3.4) lookup\] \: (?<url>[\w\:\/\.\-]+)" | table _time url ipaddress

 

Is there a way to update the query to exclude non-matches from the table?

[Screenshot: Capture2.PNG]


DLT76
Path Finder

Update #3 (and solution):

I think I figured it out.  I added this to the end of the query:

 | where ipaddress != ""

And now my table shows only those rows where the IP address matches.

Thank you for the help!
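Putting the pieces together, the complete query would look roughly like the sketch below. The filter most likely works because rex only adds fields and never drops events: events that do not match the pattern stay in the results with url and ipaddress unset, and a comparison against an unset field is not true, so where discards those rows.

```spl
sourcetype="datasource.out"
| rex field=_raw "(?<ipaddress>1\.2\.3\.4) lookup\] : (?<url>[\w:/.\-]+)"
| where ipaddress != ""
| table _time url ipaddress
```

The dots in the IP are escaped here so they match literal dots only, which is a change from the query as posted. Writing the filter as where isnotnull(ipaddress) should give the same rows and states the intent more directly.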


ITWhisperer
SplunkTrust

Will this work?

sourcetype="datasource.out"
| rex field=_raw "1.2.3.4 lookup\] \: (?<url>[\w\:\/\.\-]+)"
| table url _time

DLT76
Path Finder

It partly works; see my reply above.  Thanks for your help.


ITWhisperer
SplunkTrust

You could make the match more specific

sourcetype="datasource.out"
| rex field=_raw "1.2.3.4 lookup\] \: (?<url>http[\w\:\/\.\-]+)"
| table url _time

DLT76
Path Finder

Good idea--I thought of that too, but the table still returns gazillions of records that don't match, and the url and ipaddress fields are blank.  I'd like to see in the table only records that have a matching IP (see reply above).  Thanks again!
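One way to drop the non-matching records before rex runs is to put the IP in the base search, so only events containing it are retrieved, and then keep only the rows where the extraction succeeded. This is a sketch, assuming the quoted IP works as a search term on this source (an earlier reply says that searching for just the IP in quotes does return rows).

```spl
sourcetype="datasource.out" "1.2.3.4"
| rex field=_raw "1\.2\.3\.4 lookup\] : (?<url>http[\w:/.\-]+)"
| where isnotnull(url)
| table _time url
```

Filtering in the base search also tends to be cheaper than filtering after rex, since fewer events are pulled from the index in the first place.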


DLT76
Path Finder

I think I figured it out.  See Update #3 above.  I appreciate the assist!
