I have logs with data in two fields: _raw and _time. I want to search the _raw field for an IP in a specific pattern and return the URL that follows the IP. I'd like to see it in a table, with the URL in one column named "url" and the date/time in a second column taken from the _time field.
Here's an example of the data in _raw:
[1.2.3.4 lookup] : http://www.dummy-url.com/ --
I'd like to use a query like the following which will look for a specified IP and return the URL that follows after the colon:
rex field=_raw "1.2.3.4 lookup\] \: (?<url>[\w\:\/\.\-]+)"
The datasource looks like this:
sourcetype="datasource.out"
Can you help me with a query that searches for the IP and returns the URL (from _raw) and date/time (from _time) in table format?
Thanks!
You appear to have everything you need except for the table command. What do you get with this query?
index=foo sourcetype="datasource.out"
| rex field=_raw "1.2.3.4 lookup\] \: (?<url>[\w\:\/\.\-]+)"
| table _time url
It does return a table with the date/time in one column, but the url column is blank. It appears to return a row for every event in the date range. I know I have events with the IP in the _raw field because I get results when I search my source for just the IP in quotes. And the regex looks good when tested on regex101.
Ideas?
Update: It does appear to return every event (or at least many more than contain the specific IP). When I sorted on the url column, I found that some rows do have data, but not all of the values are URLs.
Update #2:
So when I add a field for the ip address and display it in the table and sort on that column, I find matching results (yay!), but I'm also getting tons of records that don't match. Here's the new query:
sourcetype="datasource.out" | rex field=_raw "(?<ipaddress>1.2.3.4) lookup\] \: (?<url>[\w\:\/\.\-]+)" | table _time url ipaddress
Is there a way to update the query to exclude non-matches from the table?
Update #3 (and solution):
I think I figured it out. I added this to the end of the query:
| where ipaddress != ""
And now my table shows only those rows where the IP address matches.
Thank you for the help!
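For anyone landing here later, the reason the table was full of blank rows is that rex only extracts fields; it does not filter. Events where the pattern does not match pass through with url (and ipaddress) left null, and the where clause is what drops them. A minimal sketch of an alternative, assuming the IP is always present as literal text in _raw (as the quoted-IP search mentioned above suggests): put the IP in the base search so fewer events reach rex, then keep only events where the extraction succeeded. Treat it as a starting point rather than a tested query for this data.

```spl
sourcetype="datasource.out" "1.2.3.4"
| rex field=_raw "1.2.3.4 lookup\] \: (?<url>[\w\:\/\.\-]+)"
| where isnotnull(url)
| table _time url
```

Filtering on isnotnull(url) has a similar effect to where ipaddress != "" but does not need the extra ipaddress capture group.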
Will this work?
sourcetype="datasource.out"
| rex field=_raw "1.2.3.4 lookup\] \: (?<url>[\w\:\/\.\-]+)"
| table url _time
Partly works--see above reply. Thanks for your help.
You could make the match more specific
sourcetype="datasource.out"
| rex field=_raw "1.2.3.4 lookup\] \: (?<url>http[\w\:\/\.\-]+)"
| table url _time
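A further tightening along the same lines, offered as a sketch: the unescaped dots in 1.2.3.4 are regex wildcards, so the pattern would also match text such as 1a2b3c4. Escaping them restricts the match to the literal IP. The \S+ in the capture group is an assumption that the URL runs up to the next whitespace, as it does in the sample line. Like the other variants, this only changes what gets extracted; events that do not match still appear with an empty url unless they are filtered out separately.

```spl
sourcetype="datasource.out"
| rex field=_raw "1\.2\.3\.4 lookup\] : (?<url>http\S+)"
| table url _time
```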
Good idea--I thought of that too, but the table still returns gazillions of records that don't match, and the url and ipaddress fields are blank. I'd like to see in the table only records that have a matching IP (see reply above). Thanks again!
I think I figured it out. See Update #3 above. I appreciate the assist!