I'm trying to find IP addresses that hit one specific URL and no other. I tried to use set diff, but it's not returning the results I expect.
This search gives the IP addresses of everyone who hit url_a; let's say it returns 447 results:
sourcetype=weblogs request="GET /url_a/ HTTP*" | dedup ip | table ip | sort ip
And this search gives the IP addresses of everyone who hit a URL underneath url_a; let's say it returns 314 results:
sourcetype=weblogs | regex request="^GET /url_a/[0-9a-z].* HTTP.*" | dedup ip | table ip | sort ip
I want the list of IPs that are in the first list but not in the second. set diff also returns items from the second search that aren't in the first, which is not what I want.
The other thing I tried was a subsearch like this:
sourcetype=weblogs request="GET /url_a/ HTTP*" NOT [ search sourcetype=weblogs | regex request="^GET /url_a/[0-9a-z].* HTTP.*" | dedup ip | table ip | sort ip] | dedup ip | table ip | sort ip
But this returns entries that are also in the second search, so it cannot be correct. Does anyone know of an effective way to do this?
Thanks!
Maybe something like this?
But do you really need the regex?
sourcetype=weblogs
request="GET /url_a/*"
| regex request="^GET /url_a/([0-9a-z].*)? HTTP"
| stats values(request) as requests by ip
| search requests="GET /url_a/ HTTP*"
| sort ip
I'm using the regex because request="GET /url_a/*" will match both of the following URLs:
GET /url_a/
GET /url_a/url_b/
and I only want the regex search to return the second of those two entries. url_b in this case could be any of a number of URLs that start with a-z or 0-9.
Your search is getting me closer. The stats values() piece seems to build a collection of URLs for each IP, correct? The issue is that I'm still getting results that have multiple URLs in their collection, something like this:
ip        requests
1.1.1.1   GET /url_a/ HTTP/1.1
          GET /url_a/ad/ HTTP/1.1
          GET /url_a/do/ HTTP/1.1
          GET /url_a/ho/ HTTP/1.1
          GET /url_a/ju/ HTTP/1.1
          GET /url_a/of/ HTTP/1.1
1.1.1.2   GET /url_a/ HTTP/1.1
1.1.1.3   GET /url_a/ HTTP/1.1
          GET /url_a/di/ HTTP/1.1
1.1.1.4   GET /url_a/ HTTP/1.1
1.1.1.5   GET /url_a/ HTTP/1.1
          GET /url_a/al/ HTTP/1.1
          GET /url_a/ba/ HTTP/1.1
          GET /url_a/bu/ HTTP/1.1
          GET /url_a/gr/ HTTP/1.1
          GET /url_a/wh/ HTTP/1.1
1.1.1.6   GET /url_a/ HTTP/1.1
1.1.1.7   GET /url_a/ HTTP/1.0
          GET /url_a/bl/ HTTP/1.0
Out of those results I really only want 1.1.1.2, 1.1.1.4 and 1.1.1.6, since those are the only IPs whose sole request is GET /url_a/.
Aha, this helped me. And you're correct that the regex isn't needed in the code snippet you gave, but I did need it to do what I wanted. Here's the final form:
sourcetype=weblogs
request="GET /url_a/*"
| stats values(request) as requests by ip
| search requests="GET /url_a/ HTTP*"
| regex requests!="^GET /url_a/[0-9a-z]"
| sort ip
Thanks for your help!
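You might prefer something like
| search requests="GET /url_a/ HTTP*"
| where mvcount(requests) == 1
because regex costs a lot. mvcount() is an eval function, so it has to go in where rather than search. One caveat: values() keeps distinct strings, so an IP that requested /url_a/ over both HTTP/1.0 and HTTP/1.1 would have two values and be dropped by this filter.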
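If the multivalue comparison turns out to be awkward, here is a rough sketch of another way to get the same one-directional difference: flag each event, then count the flags per IP. It assumes request is extracted exactly as in the searches above. The field names is_root, is_sub, root_hits and sub_hits are made up for illustration. Note that it still evaluates a regular expression per event through match(), so it is not claimed to be cheaper, only independent of how many distinct request strings an IP has.

```spl
sourcetype=weblogs request="GET /url_a/*"
| eval is_root=if(match(request, "^GET /url_a/ HTTP"), 1, 0)
| eval is_sub=if(match(request, "^GET /url_a/[0-9a-z]"), 1, 0)
| stats sum(is_root) as root_hits sum(is_sub) as sub_hits by ip
| where root_hits > 0 AND sub_hits == 0
| sort ip
```

Against the sample output earlier in the thread, this should keep 1.1.1.2, 1.1.1.4 and 1.1.1.6, since every other IP has at least one event that sets is_sub to 1.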