Splunk Search

How to match a list of URL strings from a CSV file against indexed data if there is no extracted URL field in my events?

Communicator

I am trying to match a long list (2000 records) of malicious URL strings (e.g., hereisavirus.com), stored in a CSV file, against my events. One caveat: my events do not have an extracted URL field, so I cannot use inputlookup and compare the list directly against a field.

Is there a simple way to search the whole event in Splunk using a CSV file?

Thank you.

Legend

You could extract the URL into a field and then use lookup (or inputlookup) to compare. Here is a very generic way to extract the URL into a field:

your base search | rex field=_raw "(?<URL>https?:\/\/(?:www\.|(?!www))[^\s\.]+\.[^\s]{2,}|www\.[^\s]+\.[^\s]{2,})" | lookup viruslist.csv URL AS URL OUTPUT someotherfield

This is not guaranteed to catch ALL URL patterns. I would need to see sample events to improve the probability of a match.
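
To make the caveat concrete, here is a rough Python sketch (not SPL) that runs the same pattern over a few made-up event lines. Python's re module needs the named group written as (?P<URL>...) rather than (?<URL>...); the rest of the pattern is unchanged. The sample events and the helper name extract_url are assumptions for illustration only, not taken from real data.

```python
import re

# Same pattern as the rex above, with the named group in Python syntax.
URL_PATTERN = re.compile(
    r"(?P<URL>https?:\/\/(?:www\.|(?!www))[^\s\.]+\.[^\s]{2,}"
    r"|www\.[^\s]+\.[^\s]{2,})"
)


def extract_url(event):
    """Return the first URL-like string in a raw event, or None."""
    match = URL_PATTERN.search(event)
    return match.group("URL") if match else None


if __name__ == "__main__":
    # Hypothetical raw events, invented for this sketch.
    samples = [
        "GET http://hereisavirus.com/payload.exe 200",
        "dst=www.example.org port=80",
        "blocked hereisavirus.com by policy",
    ]
    for event in samples:
        print(repr(extract_url(event)), "<-", event)
```

Two things stand out when running this. A bare domain with no scheme and no www prefix (the third sample) is not extracted at all. And when a URL is extracted, the value includes the scheme and path, so it would likely not match a CSV row that holds only hereisavirus.com unless the lookup values are stored in the same form.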

Communicator

Thank you, sundareshr.

So, I created a custom field extraction using the wizard:

^[^/\n]*/\d+\s+\d+\s+\w+\s+(?P<destination_url>[^ ]+)

When I run my base search, the field shows up.

I can also list my lookup table with the following command:

| inputlookup CCIC_URL.csv | rename Bad_URLs as destination_url | fields + destination_url

However, when I put them together using this search string:

base search | [| inputlookup CCIC_URL.csv | rename Bad_URLs as destination_url | fields + destination_url] | table _time, destination_url

I get the following error:

Regex: invalid UTF-8 string

The search job has failed due to an error.

Any thoughts on this issue?

Communicator

Never mind, I figured it out. My data had characters that weren't translating correctly when inputlookup looked for them as literals.
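
For anyone hitting the same error, one way to track down the offending rows is to check the CSV for bytes that are not valid UTF-8 before uploading it. The sketch below is plain Python run outside Splunk, and it assumes the lookup file is meant to be UTF-8 encoded. The function name and the sample bytes are made up for illustration.

```python
def find_invalid_utf8_lines(data):
    """Return 1-based line numbers whose bytes do not decode as UTF-8."""
    bad_lines = []
    # Splitting on b"\n" is safe: the byte 0x0A never appears inside a
    # multi-byte UTF-8 sequence.
    for number, line in enumerate(data.split(b"\n"), start=1):
        try:
            line.decode("utf-8")
        except UnicodeDecodeError:
            bad_lines.append(number)
    return bad_lines


if __name__ == "__main__":
    # Hypothetical file content: the third line holds a Latin-1 byte
    # (0xE9), which is not a valid UTF-8 sequence on its own.
    sample = b"Bad_URLs\nhereisavirus.com\ncaf\xe9.example\n"
    print(find_invalid_utf8_lines(sample))
```

To check a real file, read it in binary mode (for example with open("CCIC_URL.csv", "rb")) and pass the bytes to the function. The reported rows can then be fixed or re-saved in the right encoding.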
