The first query I run is
index=sec_proxy_web sourcetype="bluecoat:proxysg:access:syslog" | top 10 url
I have web proxy log and a field url but in url it contains tcp, http and some ad blocker sites which i want to remove by entering those site.
for eg :
1. I need to remove the site edge-chat.facebook.com,
2. I just need to remove tcp or http or https if its present in that url
I tried few rex but i was not able to do it.
can you please help me how to get an output without this characters.
Hi
Try this, in the below, you can filter out unwanted url in where
| makeresults
| eval url="tcp://edge-chat.facebook.com/##tcp://www.google.com/##tcp://wynk.in/"
| makemv delim="##" url
| mvexpand url
| where NOT url like("%edge-chat%")
| rex field=url ":\/\/(?P<site>\S+)\/"
This works but i cannot use it for realtime and for a particular user in drilldown later.
Exclude the site you do not wish to include in the results by setting "NOT url=tcp://edge-chat.facebook.com" and remove tcp:// by stripping it with replace "| replace tcp://* WITH * IN url"
index=sec_proxy_web sourcetype="bluecoat:proxysg:access:syslog" NOT url=tcp://edge-chat.facebook.com |replace tcp://* WITH * IN url
To filter a specific url from the results, just add a | where url!="tcp://edge-chat.facebook.com/"
There are multiple solutions for stripping tcp, http, https, etc...
example 1 using eval replace: | eval url=replace(url, "tcp://", "")
example 2 using rex to extract then drop old field and rename new (can't just extract and overwrite):
| rex field=url "([chpt]://)(?.*)"
| fields - url
| rename urla AS url
example 3 in case you want to reformat specific url only, combine multiple eval functions:
| eval url=if(url="tcp://edge-chat.facebook.com/", replace(url, "tcp://", ""), url)
EG 1 and 3 works fine
But eg2 i get
error : Error in 'rex' command: Encountered the following error while compiling the regex '([chpt]://)(?.*)': Regex: unrecognized character after (? or (?-.
I wanted to write a rex where it can ignore http https and tcp the above expression looks correct but i dont know why splunk throws above error.
Couldnt understand this "rex to extract then drop old field and rename new (can't just extract and overwrite):"
Hi I used where and replace command.
but it takes more time to run the query.
Is there any other way I can make this query still efficient:
index=sec_proxy_web user_work_country="in" user_bunit="rbei/bsd1" sourcetype="bluecoat:proxysg:access:syslog"
|where category!="Content Delivery Networks" | where category!="Chat (IM)/SMS"
|where category!="Web Ads/Analytics"
| where NOT url like("%edge-chat%")
|where NOT url like("%mtalk.google.com%")
|where NOT url like("%accounts.google%")
|where NOT url like("%doubleclick%")
|where NOT url like("%googlevideo.com%")
|where NOT url like("%clients%")
|where NOT url like("%googlesyndication%")
|where NOT url like("%adservice%")
|where NOT url like("%gstatic%")
|where NOT url like("%api%")
|where NOT url like("%gravatar%")
|where NOT url like("%youboranqs01%")
|replace tcp://* WITH * IN url
|replace http://* WITH * IN url
|replace https://* WITH * IN url
| stats count by url | sort -count
sorry, the board formatting removed the field extraction tag from my regex.
| rex field=url "([chpt]://)(?<urla>.*)" | fields - url | rename urla AS url
I was saying you already have the field named 'url' so you cannot simply specify the same field name in the rex command to overwrite the existing values. That's why I gave it name 'urla' and then dropped the old one and then renamed the new field to the 'url'
I forget that you use % instead of * for a where. I would assume all those wildcards are what's increasing the execution time.
It may be helpful to mention that |where
does not accept wildcards. If you needed to wildcard then change to |search url!=*edge-chat.facebook.com
instead. And you will want to put that before your top 10 command