Knowledge Management

How to extract a value from a field by ignoring some of the characters?

vigneshit
New Member

The first query I run is
index=sec_proxy_web sourcetype="bluecoat:proxysg:access:syslog" | top 10 url

I have web proxy log and a field url but in url it contains tcp, http and some ad blocker sites which i want to remove by entering those site.
for eg :
1. I need to remove the site edge-chat.facebook.com,
2. I just need to remove tcp or http or https if its present in that url

I tried few rex but i was not able to do it.
can you please help me how to get an output without this characters.

alt text

Tags (1)
0 Karma

vnravikumar
Champion

Hi

Try this, in the below, you can filter out unwanted url in where

| makeresults 
| eval url="tcp://edge-chat.facebook.com/##tcp://www.google.com/##tcp://wynk.in/" 
| makemv delim="##" url 
| mvexpand url 
| where NOT url like("%edge-chat%") 
| rex field=url ":\/\/(?P<site>\S+)\/"
0 Karma

vigneshit
New Member

This works but i cannot use it for realtime and for a particular user in drilldown later.

0 Karma

mydog8it
Builder

Exclude the site you do not wish to include in the results by setting "NOT url=tcp://edge-chat.facebook.com" and remove tcp:// by stripping it with replace "| replace tcp://* WITH * IN url"

index=sec_proxy_web sourcetype="bluecoat:proxysg:access:syslog" NOT url=tcp://edge-chat.facebook.com |replace tcp://* WITH * IN url
0 Karma

anthonymelita
Contributor

To filter a specific url from the results, just add a | where url!="tcp://edge-chat.facebook.com/"
There are multiple solutions for stripping tcp, http, https, etc...
example 1 using eval replace: | eval url=replace(url, "tcp://", "")

example 2 using rex to extract then drop old field and rename new (can't just extract and overwrite):
| rex field=url "([chpt]://)(?.*)"
| fields - url
| rename urla AS url

example 3 in case you want to reformat specific url only, combine multiple eval functions:
| eval url=if(url="tcp://edge-chat.facebook.com/", replace(url, "tcp://", ""), url)

0 Karma

vigneshit
New Member

EG 1 and 3 works fine
But eg2 i get
error : Error in 'rex' command: Encountered the following error while compiling the regex '([chpt]://)(?.*)': Regex: unrecognized character after (? or (?-.

I wanted to write a rex where it can ignore http https and tcp the above expression looks correct but i dont know why splunk throws above error.

Couldnt understand this "rex to extract then drop old field and rename new (can't just extract and overwrite):"

0 Karma

vigneshit
New Member

Hi I used where and replace command.
but it takes more time to run the query.
Is there any other way I can make this query still efficient:
index=sec_proxy_web user_work_country="in" user_bunit="rbei/bsd1" sourcetype="bluecoat:proxysg:access:syslog"
|where category!="Content Delivery Networks" | where category!="Chat (IM)/SMS"
|where category!="Web Ads/Analytics"
| where NOT url like("%edge-chat%")
|where NOT url like("%mtalk.google.com%")
|where NOT url like("%accounts.google%")
|where NOT url like("%doubleclick%")
|where NOT url like("%googlevideo.com%")
|where NOT url like("%clients%")
|where NOT url like("%googlesyndication%")
|where NOT url like("%adservice%")
|where NOT url like("%gstatic%")
|where NOT url like("%api%")
|where NOT url like("%gravatar%")
|where NOT url like("%youboranqs01%")
|replace tcp://* WITH * IN url
|replace http://* WITH * IN url
|replace https://* WITH * IN url
| stats count by url | sort -count

0 Karma

anthonymelita
Contributor

sorry, the board formatting removed the field extraction tag from my regex.

| rex field=url "([chpt]://)(?<urla>.*)" | fields - url | rename urla AS url

I was saying you already have the field named 'url' so you cannot simply specify the same field name in the rex command to overwrite the existing values. That's why I gave it name 'urla' and then dropped the old one and then renamed the new field to the 'url'

I forget that you use % instead of * for a where. I would assume all those wildcards are what's increasing the execution time.

0 Karma

anthonymelita
Contributor

It may be helpful to mention that |where does not accept wildcards. If you needed to wildcard then change to |search url!=*edge-chat.facebook.com instead. And you will want to put that before your top 10 command

0 Karma
Get Updates on the Splunk Community!

What's new in Splunk Cloud Platform 9.1.2312?

Hi Splunky people! We are excited to share the newest updates in Splunk Cloud Platform 9.1.2312! Analysts can ...

What’s New in Splunk Security Essentials 3.8.0?

Splunk Security Essentials (SSE) is an app that can amplify the power of your existing Splunk Cloud Platform, ...

Let’s Get You Certified – Vegas-Style at .conf24

Are you ready to level up your Splunk game? Then, let’s get you certified live at .conf24 – our annual user ...