Splunk Search

Get count from multiple URLs based on required properties

arjun_krishna
Explorer

My logs in index="abc_uyt" contain the following content, with URLs in four different sets:

RuntimeException having https://google.yahoo.com/web/kiran/cart/groups/ronaldo/getbilledvspaid/v1
RuntimeException having https://google.yahoo.com/web/kiran/cart/groups/ronaldo/paymenthistory/v1
RuntimeException having https://google.yahoo.com/web/kiran/cart/groups/ronaldo/requesthistory/v1
RuntimeException having https://google.yahoo.com/web/kiran/cart/groups/ronaldo/runninghistory/v1

RuntimeException having https://google.yahoo.com/web/kiran/cart/groups/watson/invoicedetail/v1
RuntimeException having https://google.yahoo.com/web/kiran/cart/groups/watson/invoicesummary/v1
RuntimeException having https://google.yahoo.com/web/kiran/cart/groups/watson/gettingValue/v1
RuntimeException having https://google.yahoo.com/web/kiran/cart/groups/watson/historyValue/v1

RuntimeException having https://microsoft.word.com/ringert/rkj3/obama/funatwork
RuntimeException having https://microsoft.word.com/ringert/rkj3/obama/runathome

RuntimeException having https://cisco-services.raj.com/ytr-services/gilchrist/vision
RuntimeException having https://cisco-services.raj.com/ytr-services/gilchrist/health

I want to get a count for each of ronaldo, watson, obama, and gilchrist, shown in tabular form like this:
ronaldo - 25
watson - 22
obama - 36
gilchrist - 21

Could anyone please help? I have tried rex, sed, and count, but I get unexpected counts.

somesoni2
Revered Legend

If the URL domains are fixed, you can try something like this:

index="abc_uyt"
| rex field=UrlFieldName "https:\/\/(google([^\/]+\/){5}|microsoft([^\/]+\/){3}|cisco([^\/]+\/){2})(?<name>[^\/]+)"
| stats count by name

See the regex working with your sample data here: https://regex101.com/r/t8coTo/1
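For anyone who wants to check the pattern outside regex101, here is a minimal Python sketch (not Splunk code) that applies the same alternation to the sample URLs from the question and tallies the captured names, roughly what stats count by name does. Assumptions: Python's re module writes a named group as (?P<name>...) rather than (?<name>...), the slashes need no escaping in Python, and NAME_RE, SAMPLE_URLS, and count_by_name are hypothetical names used only for illustration.

```python
import re
from collections import Counter

# Same idea as the rex above: after "https://", match the start of a known
# domain, skip a fixed number of "/"-terminated segments for that domain,
# then capture the next path segment as "name".
# Python spells a named group (?P<name>...); Splunk's rex accepts (?<name>...).
NAME_RE = re.compile(
    r"https://(google([^/]+/){5}|microsoft([^/]+/){3}|cisco([^/]+/){2})(?P<name>[^/]+)"
)

# Hypothetical test data: the sample URLs from the question.
SAMPLE_URLS = [
    "https://google.yahoo.com/web/kiran/cart/groups/ronaldo/getbilledvspaid/v1",
    "https://google.yahoo.com/web/kiran/cart/groups/ronaldo/paymenthistory/v1",
    "https://google.yahoo.com/web/kiran/cart/groups/ronaldo/requesthistory/v1",
    "https://google.yahoo.com/web/kiran/cart/groups/ronaldo/runninghistory/v1",
    "https://google.yahoo.com/web/kiran/cart/groups/watson/invoicedetail/v1",
    "https://google.yahoo.com/web/kiran/cart/groups/watson/invoicesummary/v1",
    "https://google.yahoo.com/web/kiran/cart/groups/watson/gettingValue/v1",
    "https://google.yahoo.com/web/kiran/cart/groups/watson/historyValue/v1",
    "https://microsoft.word.com/ringert/rkj3/obama/funatwork",
    "https://microsoft.word.com/ringert/rkj3/obama/runathome",
    "https://cisco-services.raj.com/ytr-services/gilchrist/vision",
    "https://cisco-services.raj.com/ytr-services/gilchrist/health",
]


def count_by_name(events):
    """Count captured names; events with no match are skipped, as stats skips null names."""
    counts = Counter()
    for event in events:
        match = NAME_RE.search(event)
        if match:
            counts[match.group("name")] += 1
    return counts


print(dict(count_by_name(SAMPLE_URLS)))
# {'ronaldo': 4, 'watson': 4, 'obama': 2, 'gilchrist': 2}
```

These counts reflect only the twelve sample lines, not the 25/22/36/21 figures in the question, which come from the poster's real data.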

arjun_krishna
Explorer

Could you please consider the new log samples I posted (log1 to log6)? The syntax is almost correct, but I am not getting the name-based count.

somesoni2
Revered Legend

The regex above works with your new data samples as well: https://regex101.com/r/BCtKTw/1

In my query, I'm assuming there is a URL field that contains these logs, or the URL portion of them. If there is no such field and you're searching through the whole log entry (the _raw field), just remove field=UrlFieldName from the query above; rex reads _raw by default.
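Because the pattern is not anchored to the start or end of the text, it also finds the name inside a complete event, which is why running rex against _raw works. A minimal Python sketch, treating the log1 to log6 lines posted in this thread as hypothetical _raw values (the pattern is again translated to Python's (?P<name>...) syntax; RAW_EVENTS is an illustrative name):

```python
import re
from collections import Counter

# The accepted rex pattern in Python syntax. It has no ^ or $ anchor,
# so re.search can find the URL anywhere inside the event text.
NAME_RE = re.compile(
    r"https://(google([^/]+/){5}|microsoft([^/]+/){3}|cisco([^/]+/){2})(?P<name>[^/]+)"
)

# Hypothetical _raw values: the log1..log6 lines from this thread.
RAW_EVENTS = [
    "Caused by: java.RuntimeException having https://google.yahoo.com/web/kiran/cart/groups/ronaldo/getbilledvspaid/v1: Read timed out",
    "Caused by: java.RuntimeException having https://google.yahoo.com/web/kiran/cart/groups/ronaldo/saysfs/v1: Read timed out",
    "Caused by: java.RuntimeException having https://google.yahoo.com/web/kiran/cart/groups/watson/invoicesummary/v1: Read timed out",
    "Caused by: java.RuntimeException having https://google.yahoo.com/web/kiran/cart/groups/watson/iuaxaddd/v1: Read timed out",
    "KHGM PDF invoice service at endpoint: https://microsoft.word.com/ringert/rkj3/obama/funatwork",
    "and setting service endpoint URL: https://cisco-services.raj.com/ytr-services/gilchrist/health",
]

# Keep the captured name from every event that matches.
names = [m.group("name") for m in map(NAME_RE.search, RAW_EVENTS) if m]
print(names)
# ['ronaldo', 'ronaldo', 'watson', 'watson', 'obama', 'gilchrist']
```

The trailing ": Read timed out" and the varying prefixes do not matter, because the name capture stops at the next "/".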

arjun_krishna
Explorer

Thanks, it worked.

elliotproebstel
Champion

If the URLs will always end with either /something OR /something/v1 (where the "v1" will literally always be "v1" and not anything else), then this should work:

| rex field=_raw "(?<name>\w+)\/\w+(\/v1)?$"
| stats count by name
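To see what the $ anchor implies, here is a Python sketch of the same pattern (written with (?P<name>...) for Python; end_name is a hypothetical helper). It captures the second-to-last path segment, or the third-to-last when the URL ends in /v1, on lines that end with the URL, and finds nothing when text follows the URL, as in the log1 line posted later in this thread, which ends with ": Read timed out".

```python
import re

# elliotproebstel's pattern in Python syntax: a word, "/", another word,
# an optional literal "/v1", then the end of the string.
END_RE = re.compile(r"(?P<name>\w+)/\w+(/v1)?$")


def end_name(event):
    """Return the captured name, or None when the event does not end with such a path."""
    match = END_RE.search(event)
    return match.group("name") if match else None


print(end_name("RuntimeException having https://google.yahoo.com/web/kiran/cart/groups/ronaldo/getbilledvspaid/v1"))  # ronaldo
print(end_name("RuntimeException having https://microsoft.word.com/ringert/rkj3/obama/funatwork"))  # obama
print(end_name("Caused by: java.RuntimeException having https://google.yahoo.com/web/kiran/cart/groups/ronaldo/getbilledvspaid/v1: Read timed out"))  # None
```

So this approach fits events where the URL is the last thing on the line; with trailing text, the end anchor has to be dropped or replaced.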

elliotproebstel
Champion

Alternatively, if you have a finite list of names you're looking for, you could create a wildcard lookup containing those names. Here's a good answer that explains how to do that:
https://answers.splunk.com/answers/52580/can-we-use-wildcard-characters-in-a-lookup-table.html
I'll assume you load those in such that you wind up with something like this:

user, username
*ronaldo*, ronaldo
*watson*, watson
*obama*, obama
*gilchrist*, gilchrist

Once you have the names loaded into your wildcard lookup, you would do something like this:

your base search where the URLs are in a field called URL
| lookup your_wildcard_lookup user AS URL OUTPUT username
| stats count by username
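The matching step can be approximated in Python to see which row a URL hits. This is only an illustration under assumptions: fnmatchcase stands in for Splunk's wildcard matching (it also treats ? and [...] as special characters, so it is an approximation), the first matching row wins, and LOOKUP_ROWS, lookup_username, and count_by_username are hypothetical names.

```python
from collections import Counter
from fnmatch import fnmatchcase

# Hypothetical rows mirroring the wildcard lookup table above: (user pattern, username).
LOOKUP_ROWS = [
    ("*ronaldo*", "ronaldo"),
    ("*watson*", "watson"),
    ("*obama*", "obama"),
    ("*gilchrist*", "gilchrist"),
]


def lookup_username(url):
    """Return the username of the first row whose pattern matches the whole URL, else None."""
    for pattern, username in LOOKUP_ROWS:
        if fnmatchcase(url, pattern):
            return username
    return None


def count_by_username(urls):
    """Rough analogue of lookup ... | stats count by username; unmatched URLs are skipped."""
    return Counter(name for name in map(lookup_username, urls) if name is not None)


urls = [
    "https://google.yahoo.com/web/kiran/cart/groups/ronaldo/getbilledvspaid/v1",
    "https://google.yahoo.com/web/kiran/cart/groups/watson/invoicedetail/v1",
    "https://microsoft.word.com/ringert/rkj3/obama/funatwork",
    "https://cisco-services.raj.com/ytr-services/gilchrist/health",
    "https://example.com/unrelated",  # hypothetical non-matching URL
]
print(dict(count_by_username(urls)))
# {'ronaldo': 1, 'watson': 1, 'obama': 1, 'gilchrist': 1}
```

A substring pattern like *obama* matches the name anywhere in the URL, so a URL that merely contains one of these words in another position would be counted too; the regex answers avoid that by relying on position.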

arjun_krishna
Explorer

The logs come in sets like the ones below (log1 to log6):
log1: Caused by: java.RuntimeException having https://google.yahoo.com/web/kiran/cart/groups/ronaldo/getbilledvspaid/v1: Read timed out

log2: Caused by: java.RuntimeException having https://google.yahoo.com/web/kiran/cart/groups/ronaldo/saysfs/v1: Read timed out

log3: Caused by: java.RuntimeException having https://google.yahoo.com/web/kiran/cart/groups/watson/invoicesummary/v1: Read timed out

log4: Caused by: java.RuntimeException having https://google.yahoo.com/web/kiran/cart/groups/watson/iuaxaddd/v1: Read timed out

log5: KHGM PDF invoice service at endpoint: https://microsoft.word.com/ringert/rkj3/obama/funatwork

log6: and setting service endpoint URL: https://cisco-services.raj.com/ytr-services/gilchrist/health

somesoni2
Revered Legend

For the first set, will the URLs always end with v1?

arjun_krishna
Explorer

No, they don't always end with v1; instead, I have to rely on the URL domains.

richgalloway
SplunkTrust

The field you are looking for seems to be in different places in the URLs. What determines where the field is located?
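Judging from the samples in this thread, the position depends only on the domain, and the accepted regex encodes that implicitly through its {5}, {3}, and {2} repeat counts. The same rule can be written as an explicit table of path positions. A minimal Python sketch, offered as an alternative illustration rather than a Splunk search; SEGMENT_BY_HOST and extract_name are hypothetical, and the indexes are read off the sample URLs only:

```python
from urllib.parse import urlsplit

# Hypothetical mapping read off the sample URLs in this thread: for each host,
# the 0-based index of the path segment that holds the name.
SEGMENT_BY_HOST = {
    "google.yahoo.com": 4,        # /web/kiran/cart/groups/<name>/...
    "microsoft.word.com": 2,      # /ringert/rkj3/<name>/...
    "cisco-services.raj.com": 1,  # /ytr-services/<name>/...
}


def extract_name(url):
    """Return the path segment at the host's configured position, or None if unknown."""
    parts = urlsplit(url)
    index = SEGMENT_BY_HOST.get(parts.hostname)
    if index is None:
        return None
    segments = [segment for segment in parts.path.split("/") if segment]
    return segments[index] if index < len(segments) else None


print(extract_name("https://google.yahoo.com/web/kiran/cart/groups/watson/invoicesummary/v1"))  # watson
print(extract_name("https://cisco-services.raj.com/ytr-services/gilchrist/health"))  # gilchrist
```

Unlike the regex, which only checks how the host starts, this matches exact host names, so another subdomain would need its own entry.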
