Hello,
I have a lookup with URLs like:
| url |
| --- |
| .url.com |
| site.url.com |
I try to match it against my proxy logs to check whether users access those sites. But I have issues with ".url.com" since it doesn't exactly match the hostname. I have tried replacing it with "*.url.com", but the Splunk lookup doesn't match wildcards. I have tried things like this, but nothing worked:
| inputlookup all_url.csv
| rename url as lookup_url
| join type=inner [ search index=my-proxy | eval lookup_url="*" . lookup_url . "*" | search hostname=lookup_url ]
Do you have any idea?
Thanks
---

> But i have issues with ".url.com" since it don't exactly matches the hostname. I have tried to replace them with "*.url.com" but splunk lookup don't match wildcard.
This is not correct, as @andrew_nelson points out. The problem is that you are trying to use inputlookup when lookup is the logical solution. Once you define a lookup with WILDCARD(url), you do not need to add an additional field. (You may want to use a case-insensitive match, too.) This is how you do it in Splunk Web:
Here, I name the lookup definition without .csv.
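If you prefer to configure the definition directly instead of through Splunk Web, a minimal transforms.conf sketch would look like this (assuming the CSV file is named all_url.csv, as in the original question, and the definition is named all_urls):

```
[all_urls]
filename = all_url.csv
match_type = WILDCARD(url)
case_sensitive_match = false
```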
This is the search to count matches per url as defined in the lookup.
index=my-proxy
[inputlookup all_urls
| rename url as hostname ]
| lookup all_urls url AS hostname OUTPUT url
| stats count by url
This does effectively the same as Andrew's, except you don't need to add a second column. You also do not need a where command, because the inputlookup subsearch already does that filtering.
I understand that your reason for using inputlookup is to print 0 if there is no match. So you add one more step:
| append
[inputlookup all_urls]
| stats values(count) as count by url
| fillnull count
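For reference, the WILDCARD match type treats `*` as zero or more characters, much like shell-style globbing. A quick Python illustration of the matching semantics (purely to show why `*.url2.com` matches subdomains but not the bare domain; this is not part of the Splunk search):

```python
from fnmatch import fnmatch

# Shell-style globbing, analogous to Splunk's WILDCARD(url) match type:
# "*" matches zero or more characters, but the literal "." is still required.
print(fnmatch("abc.url2.com", "*.url2.com"))    # True: subdomain matches
print(fnmatch("url2.com", "*.url2.com"))        # False: no "." before url2.com
print(fnmatch("www.url1.com", "www.url1.com"))  # True: exact entries match themselves
```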
Given the following events in index my-proxy (assuming field hostname is already extracted at search time and represents the destination in your proxy log):
| _time | hostname |
| --- | --- |
| 1969-12-31 16:00:01 | abc.url2.com |
| 1969-12-31 16:00:02 | def.url1.com |
| 1969-12-31 16:00:03 | ghi.url2.com |
| 1969-12-31 16:00:04 | www.url1.com |
| 1969-12-31 16:00:05 | site.url2.com |
| 1969-12-31 16:00:06 | abc.url1.com |
| 1969-12-31 16:00:07 | def.url2.com |
| 1969-12-31 16:00:08 | ghi.url1.com |
| 1969-12-31 16:00:09 | www.url2.com |
| 1969-12-31 16:00:10 | site.url1.com |
| 1969-12-31 16:00:11 | abc.url2.com |
| 1969-12-31 16:00:12 | def.url1.com |
| 1969-12-31 16:00:13 | ghi.url2.com |
| 1969-12-31 16:00:14 | www.url1.com |
| 1969-12-31 16:00:15 | site.url2.com |
the above search should give you:
| url | count |
| --- | --- |
| *.url2.com | 8 |
| site.url3.com | 0 |
| www.url1.com | 2 |
Here is an emulation for you to play with and compare with real data:
| makeresults count=15
| streamstats count as _time
| eval _domain = json_object(1, "abc", 2, "def", 3, "ghi", 4, "www", 0, "site")
| eval hostname = json_extract(_domain, tostring(_time % 5)) . ".url" . (_time % 2 + 1) . ".com"
``` the above emulates
index=my-proxy
[inputlookup all_urls
| rename url as hostname ]
```
---
Thank you for your explanation, it works very well.
KR
Théo
---
Hello @gcusello,
Thanks for your answer. I have tried this; the issue is that I want to output the number of hits per domain in the lookup, and if I do a raw search it will also match the Referer or other fields.
KR
Théo
---
If you create a lookup definition, you can enable wildcard lookups.
I'd also make the proxy logs the primary search and use the lookup values as search filters. Add a second field to your CSV, inScope, mark every row as Yes, and then do the lookup through the wildcard definition.
url,inScope
url1.com,Yes
*.url2.com,Yes
site.url3.com,Yes
The lookup definition (url_list) will have WILDCARD(url) in the Match Type.
The search would look like:
index=my-proxy
[| inputlookup url_list
| rename url as query
| fields query]
| lookup url_list url
| where match(inScope, "Yes")
This takes all the values you have in your url CSV and uses them as filters on your proxy logs.
Then you look up the URLs in the logs to make sure they're exact matches to your list, and not just other URLs that might contain your URLs somewhere else in the log.
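The two-step logic above (broad filter, then a wildcard-aware lookup to confirm each match) can be sketched outside Splunk like this. The data is hypothetical, and fnmatch stands in for the WILDCARD(url) match type:

```python
from fnmatch import fnmatch

# Hypothetical lookup rows (url, inScope) and proxy hostnames.
lookup_rows = [("url1.com", "Yes"), ("*.url2.com", "Yes"), ("site.url3.com", "Yes")]
hostnames = ["abc.url2.com", "www.url1.com", "evil.example.com", "site.url3.com"]

# Keep only hostnames that exactly match an in-scope lookup pattern
# (fnmatch emulates WILDCARD: "*" matches zero or more characters).
in_scope = [h for h in hostnames
            if any(fnmatch(h, pattern)
                   for pattern, flag in lookup_rows if flag == "Yes")]
print(in_scope)  # ['abc.url2.com', 'site.url3.com']
```

Note that "www.url1.com" is dropped: the lookup row "url1.com" has no wildcard, so only the bare domain would match it exactly.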
---
Hi @y4m373 ,
Please try this:
index=my-proxy [ | inputlookup all_url.csv | rename url AS query | fields query ]
| ...
In this way, you use the values of the lookup's url field as a full-text search on the raw event data.
Ciao.
Giuseppe
