Splunk Search

check if url in a lookup match logs

y4m373
Explorer

Hello,

 

I have a lookup with url like 

url

www.url.com

.url.com
site.url.com

 

And i try to match it with my proxy logs to check if users access it.

But i have issues with ".url.com" since it don't exactly matches the hostname. I have tried to replace them with "*.url.com" but splunk lookup don't match wildcard.

I have tried things like this but nothing worked :

| inputlookup all_url.csv
| rename url as lookup_url
| join type=inner [ search index=my-proxy | eval lookup_url="*" . lookup_url . "*" | search hostname=lookup_url ]

Do you have any idea ?
Thanks

Labels (3)
0 Karma
1 Solution

yuanliu
SplunkTrust
SplunkTrust
But i have issues with ".url.com" since it don't exactly matches the hostname. I have tried to replace them with "*.url.com" but splunk lookup don't match wildcard.

This is not correct.  As @andrew_nelson points out.  The problem is that you are trying to use inputlookup when lookup is the logical solution.  Once you define a lookup with WILDCARD(url), you do not need to add an additional field, however. (You may want to use case-insensitive match, too.)  This is how you do it in Splunk Web:

all_urls-lookup.png

Here, I name the lookup definition without .csv.

This is the search to count matches per url as defined in the lookup.

 

index=my-proxy
    [inputlookup all_urls
    | rename url as hostname ]
| lookup all_urls url as hostname output url as url
| stats count by url

 

This does effectively the same as Andrew's except you don't need to add a second column.  You also do not need a where command because the inputlookup subsearch already does that.

I understand that your reason of using inputlookup is to print 0 if there is no match.  So you add one more step:

 

| append
    [inputlookup all_urls]
| stats values(count) as count by url
| fillnull count

 

Given the following events in index my-proxy (assuming field hostname is already extracted at search time and represents the destination in your proxy log):

_timehostname
1969-12-31 16:00:01abc.url2.com
1969-12-31 16:00:02def.url1.com
1969-12-31 16:00:03ghi.url2.com
1969-12-31 16:00:04www.url1.com
1969-12-31 16:00:05site.url2.com
1969-12-31 16:00:06abc.url1.com
1969-12-31 16:00:07def.url2.com
1969-12-31 16:00:08ghi.url1.com
1969-12-31 16:00:09www.url2.com
1969-12-31 16:00:10site.url1.com
1969-12-31 16:00:11abc.url2.com
1969-12-31 16:00:12def.url1.com
1969-12-31 16:00:13ghi.url2.com
1969-12-31 16:00:14www.url1.com
1969-12-31 16:00:15site.url2.com

the above search should give you

urlcount
*.url2.com8
site.url3.com0
www.url1.com2

Here is an emulation for you to play with and compare with real data

 

| makeresults count=15
| streamstats count as _time
| eval _domain = json_object(1, "abc", 2, "def", 3, "ghi", 4, "www", 0, "site")
| eval hostname = json_extract(_domain, tostring(_time % 5)) . ".url" . (_time % 2 + 1) . ".com"
``` the above emulates
index=my-proxy
    [inputlookup all_urls
    | rename url as hostname ]
```

 

 

 

View solution in original post

yuanliu
SplunkTrust
SplunkTrust
But i have issues with ".url.com" since it don't exactly matches the hostname. I have tried to replace them with "*.url.com" but splunk lookup don't match wildcard.

This is not correct.  As @andrew_nelson points out.  The problem is that you are trying to use inputlookup when lookup is the logical solution.  Once you define a lookup with WILDCARD(url), you do not need to add an additional field, however. (You may want to use case-insensitive match, too.)  This is how you do it in Splunk Web:

all_urls-lookup.png

Here, I name the lookup definition without .csv.

This is the search to count matches per url as defined in the lookup.

 

index=my-proxy
    [inputlookup all_urls
    | rename url as hostname ]
| lookup all_urls url as hostname output url as url
| stats count by url

 

This does effectively the same as Andrew's except you don't need to add a second column.  You also do not need a where command because the inputlookup subsearch already does that.

I understand that your reason of using inputlookup is to print 0 if there is no match.  So you add one more step:

 

| append
    [inputlookup all_urls]
| stats values(count) as count by url
| fillnull count

 

Given the following events in index my-proxy (assuming field hostname is already extracted at search time and represents the destination in your proxy log):

_timehostname
1969-12-31 16:00:01abc.url2.com
1969-12-31 16:00:02def.url1.com
1969-12-31 16:00:03ghi.url2.com
1969-12-31 16:00:04www.url1.com
1969-12-31 16:00:05site.url2.com
1969-12-31 16:00:06abc.url1.com
1969-12-31 16:00:07def.url2.com
1969-12-31 16:00:08ghi.url1.com
1969-12-31 16:00:09www.url2.com
1969-12-31 16:00:10site.url1.com
1969-12-31 16:00:11abc.url2.com
1969-12-31 16:00:12def.url1.com
1969-12-31 16:00:13ghi.url2.com
1969-12-31 16:00:14www.url1.com
1969-12-31 16:00:15site.url2.com

the above search should give you

urlcount
*.url2.com8
site.url3.com0
www.url1.com2

Here is an emulation for you to play with and compare with real data

 

| makeresults count=15
| streamstats count as _time
| eval _domain = json_object(1, "abc", 2, "def", 3, "ghi", 4, "www", 0, "site")
| eval hostname = json_extract(_domain, tostring(_time % 5)) . ".url" . (_time % 2 + 1) . ".com"
``` the above emulates
index=my-proxy
    [inputlookup all_urls
    | rename url as hostname ]
```

 

 

 

y4m373
Explorer

Thanks you for your explanation, it work very well.

 

KR

Théo

0 Karma

y4m373
Explorer

Hello @gcusello,

 

Thanks for your answers, I have tried this the issues is that I want too output the number of hit on domain in the lookup and if I do a raw search it will also catch Referer or other field.

 

KR
Théo

0 Karma

andrew_nelson
Communicator

If you create a lookup definition, you can enable wildcard lookups.
I'd also make the proxy logs the primary search and use the lookup as search query parameters, have a second field in your CSV for inScope and mark them all as yes and then do the lookup with the definition wildcard. 

 

url,inScope
url1.com,Yes
*.url2.com,Yes
site.url3.com,Yes

 

The lookup definition (url_list) will have WILDCARD(url) in the Match Type. 
The search would look like :

 

index=my-proxy
    [| inputlookup url_list 
    | rename url as query 
    | fields query]
| lookup url_list url
| where match(inScope, "Yes")

 

This takes all the values you have in your url CSV and uses them as filters on your proxy logs. 
Then you lookup the urls in the logs to make sure they're exact matches to your list and not just other urls that might contain your urls somewhere else in the log. 

0 Karma

gcusello
SplunkTrust
SplunkTrust

Hi @y4m373 ,

please try this:

index=my-proxy [ | inputlookup all_url.csv | rename url AS query | fields query ]
| ...

in this way, you use the values in the lookup's url field in a full text search on the events raw data.

Ciao.

Giuseppe

0 Karma
Get Updates on the Splunk Community!

Splunk AI Assistant for SPL | Key Use Cases to Unlock the Power of SPL

Splunk AI Assistant for SPL | Key Use Cases to Unlock the Power of SPL  The Splunk AI Assistant for SPL ...

Buttercup Games: Further Dashboarding Techniques (Part 5)

This series of blogs assumes you have already completed the Splunk Enterprise Search Tutorial as it uses the ...

Customers Increasingly Choose Splunk for Observability

For the second year in a row, Splunk was recognized as a Leader in the 2024 Gartner® Magic Quadrant™ for ...