Hello,
I am trying to use a lookup table to search against the URL field in the proxy logs. The use case is to find out whether any users have been accessing domains/URLs that are listed in the lookup file. I want to strip away all the extra characters and keep only the URL/domain that the search matched on.
This is what I have so far:
index=proxy sourcetype=proxy:syslog:proxy_web_policy [| inputlookup lookupfile_last_status | return 1000 $lookup_domain_name]
| rex field=proxy_url_field "(?<lookup_domain_name>(\w+\.)+\w+)"
| table name url status
Example:
www.google.com/complete/search?client=chrome-omni&gs_ri=chrome-ext-ansg&xssi=t&q=gist.github.com/mcaj-admin/18558a1ec6a782d2452f971e806230c6&oit=3&url=https://gist.github.com/mcaj-admin/18558a1ec6a782d2452f971e806230c6&pgcl=4&gs_rn=42&psi=VqC6Zso_KQ3pgOGL&sugkey=AIzaSyBOti4mM-6x9WDnZIjIeyEU21OpBXqWBgw
It is matching on "gist.github.com/mcaj-admin/18558a1ec6a782d2452f971e806230c6".
How can I strip away everything except the part it is matching on? I am trying to figure out how to combine a regex with the lookup.
Thanks.
Try the search below. Remove the text [#This will extract domain from url] before running it; it is only an annotation and not part of the search.
index=proxy sourcetype=proxy:syslog:proxy_web_policy
| stats count as event_count by url
| rex field=url "^(\w+:\/\/)?(?<domain>[^\/]+)" #This will extract domain from url
| append [|inputlookup lookupfile_last_status | stats count as lookup_count by lookup_domain_name | rename lookup_domain_name as domain ]
| stats values(*) as * by domain
| where isnotnull(event_count) AND isnotnull(lookup_count)
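
To see what the rex in the search above actually captures, here is a minimal Python sketch of the same two steps: extract the domain from each URL, then keep only the domains that also appear in the lookup. The function names and the sample data are made up for illustration, and Python spells the named group as (?P<domain>...) rather than (?<domain>...).

```python
import re

# Python spelling of the rex pattern: optional scheme, then everything up to the first "/"
DOMAIN_RE = re.compile(r"^(\w+://)?(?P<domain>[^/]+)")

def extract_domain(url):
    """Return the host part of a URL, or None if nothing matches."""
    match = DOMAIN_RE.match(url)
    return match.group("domain") if match else None

def matched_domains(urls, lookup_domains):
    """Keep only domains present both in the events and in the lookup."""
    seen = {extract_domain(u) for u in urls}
    return sorted(seen & set(lookup_domains))

# Hypothetical sample data, not taken from real logs
sample_urls = [
    "https://gist.github.com/mcaj-admin/18558a1ec6a782d2452f971e806230c6",
    "www.google.com/complete/search?client=chrome-omni",
]
sample_lookup = ["gist.github.com", "example.org"]

print(matched_domains(sample_urls, sample_lookup))
```

Note that this pattern only captures the leading host of the URL. For the example URL in the question, where gist.github.com appears only inside the query string, it would capture www.google.com, so that event would not match a lookup entry of gist.github.com.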