Splunk Search

Get count from multiple urls based on required properties

arjun_krishna
Explorer

My logs in index="abc_uyt" contain URLs from four different sets, as shown below:

RuntimeException having https://google.yahoo.com/web/kiran/cart/groups/ronaldo/getbilledvspaid/v1
RuntimeException having https://google.yahoo.com/web/kiran/cart/groups/ronaldo/paymenthistory/v1
RuntimeException having https://google.yahoo.com/web/kiran/cart/groups/ronaldo/requesthistory/v1
RuntimeException having https://google.yahoo.com/web/kiran/cart/groups/ronaldo/runninghistory/v1

RuntimeException having https://google.yahoo.com/web/kiran/cart/groups/watson/invoicedetail/v1
RuntimeException having https://google.yahoo.com/web/kiran/cart/groups/watson/invoicesummary/v1
RuntimeException having https://google.yahoo.com/web/kiran/cart/groups/watson/gettingValue/v1
RuntimeException having https://google.yahoo.com/web/kiran/cart/groups/watson/historyValue/v1

RuntimeException having https://microsoft.word.com/ringert/rkj3/obama/funatwork
RuntimeException having https://microsoft.word.com/ringert/rkj3/obama/runathome

RuntimeException having https://cisco-services.raj.com/ytr-services/gilchrist/vision
RuntimeException having https://cisco-services.raj.com/ytr-services/gilchrist/health

I want to get the count of events for each of ronaldo, watson, obama, and gilchrist, in tabular form like this:
ronaldo - 25
watson - 22
obama - 36
gilchrist - 21

Could anyone please assist? I have tried rex, sed, and count, but I am getting unexpected counts.

1 Solution

somesoni2
Revered Legend

If the URL domains are fixed, try something like this:

index="abc_uyt"
| rex field=UrlFieldName "https:\/\/(google([^\/]+\/){5}|microsoft([^\/]+\/){3}|cisco([^\/]+\/){2})(?<name>[^\/]+)"
| stats count by name

See the regex working with your sample data here: https://regex101.com/r/t8coTo/1
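For anyone who wants to check the pattern outside Splunk, here is a minimal Python sketch of the same idea. It is not part of the original answer: the sample list is a subset of the URLs from the question, `count_by_name` is a hypothetical helper, and the only change to the regex is that Python spells named groups `(?P<name>...)` where rex uses `(?<name>...)`.

```python
import re
from collections import Counter

# Same logic as the rex above: skip a fixed number of path segments
# per domain, then capture the next segment as "name".
# Python writes named groups as (?P<name>...); Splunk's rex uses (?<name>...).
PATTERN = re.compile(
    r"https://(?:google(?:[^/]+/){5}|microsoft(?:[^/]+/){3}|cisco(?:[^/]+/){2})"
    r"(?P<name>[^/]+)"
)

# A subset of the sample lines from the question.
SAMPLES = [
    "RuntimeException having https://google.yahoo.com/web/kiran/cart/groups/ronaldo/getbilledvspaid/v1",
    "RuntimeException having https://google.yahoo.com/web/kiran/cart/groups/watson/invoicedetail/v1",
    "RuntimeException having https://microsoft.word.com/ringert/rkj3/obama/funatwork",
    "RuntimeException having https://cisco-services.raj.com/ytr-services/gilchrist/vision",
    "RuntimeException having https://cisco-services.raj.com/ytr-services/gilchrist/health",
]


def count_by_name(lines):
    """Rough stand-in for `| rex ... | stats count by name` (hypothetical helper)."""
    counts = Counter()
    for line in lines:
        match = PATTERN.search(line)
        if match:  # lines without a match contribute nothing to the counts
            counts[match.group("name")] += 1
    return counts


counts = count_by_name(SAMPLES)
print(dict(counts))
```

Because the pattern keys on the domain prefix and a fixed segment count, it does not care what follows the captured segment, so trailing text after the URL should not affect the result.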

arjun_krishna
Explorer

Can you please consider the scenario above? The syntax is almost correct, but I am not getting the name-based count.

somesoni2
Revered Legend

The regex above works with your new data samples as well: https://regex101.com/r/BCtKTw/1

In my query, I'm assuming there is a URL field that contains these logs or the URL portion of them. If there is no such field and you're searching through your whole log entry or the _raw field, just remove field=UrlFieldName from the query above.

arjun_krishna
Explorer

Thanks, it worked.

elliotproebstel
Champion

If the URLs will always end with either /something OR /something/v1 (where the "v1" will literally always be "v1" and not anything else), then this should work:

| rex field=_raw "(?<name>\w+)\/\w+(\/v1)?$"
| stats count by name
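One thing to keep in mind with this pattern is the `$` anchor: it only matches when the URL is the very last thing in the searched text. The Python sketch below illustrates that under a few assumptions: `extract_name` is a hypothetical helper, the named group uses Python's `(?P<name>...)` spelling, and the third input imitates a raw log line where ": Read timed out" follows the URL.

```python
import re

# Same pattern as the rex above, with Python's named-group spelling.
PATTERN = re.compile(r"(?P<name>\w+)/\w+(/v1)?$")


def extract_name(text):
    """Return the captured name, or None when the pattern does not match."""
    match = PATTERN.search(text)
    return match.group("name") if match else None


bare_url = "https://google.yahoo.com/web/kiran/cart/groups/ronaldo/getbilledvspaid/v1"
no_version = "https://microsoft.word.com/ringert/rkj3/obama/funatwork"
# Assumed raw-event shape: extra text after the URL.
with_suffix = bare_url + ": Read timed out"

print(extract_name(bare_url))     # URL ends the text, /v1 present
print(extract_name(no_version))   # URL ends the text, no /v1
print(extract_name(with_suffix))  # trailing text defeats the $ anchor
```

If the events carry text after the URL, it may be safer to run the rex against a field that holds only the URL rather than against _raw.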

elliotproebstel
Champion

Alternatively, if you have a finite list of names you're looking for, you could create a wildcard lookup containing those names. Here's a good answer that explains how to do that:
https://answers.splunk.com/answers/52580/can-we-use-wildcard-characters-in-a-lookup-table.html
I'll assume you load those in such that you wind up with something like this:

user, username
*ronaldo*, ronaldo
*watson*, watson
*obama*, obama
*gilchrist*, gilchrist

Once you have the names loaded into your wildcard lookup, you would do something like this:

your base search where the URLs are in a field called URL
| lookup your_wildcard_lookup user AS URL OUTPUT username
| stats count by username
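To make the matching behaviour concrete, here is a rough Python emulation of what the wildcard lookup does. It is only a sketch: in Splunk the wildcard behaviour comes from the lookup definition (the `match_type = WILDCARD(user)` setting), whereas this code uses `fnmatchcase` from the standard library, takes the first matching row (a simplification), and uses hypothetical names such as `LOOKUP_ROWS` and `lookup_username`.

```python
from collections import Counter
from fnmatch import fnmatchcase

# Mirrors the lookup table above: (user pattern, username).
LOOKUP_ROWS = [
    ("*ronaldo*", "ronaldo"),
    ("*watson*", "watson"),
    ("*obama*", "obama"),
    ("*gilchrist*", "gilchrist"),
]


def lookup_username(url):
    """Return the username of the first row whose pattern matches the URL."""
    for pattern, username in LOOKUP_ROWS:
        if fnmatchcase(url, pattern):
            return username
    return None  # no row matched, so the event gets no username


urls = [
    "https://google.yahoo.com/web/kiran/cart/groups/ronaldo/paymenthistory/v1",
    "https://google.yahoo.com/web/kiran/cart/groups/watson/invoicesummary/v1",
    "https://microsoft.word.com/ringert/rkj3/obama/runathome",
    "https://cisco-services.raj.com/ytr-services/gilchrist/health",
    "https://example.invalid/unrelated/path",  # made-up URL with no matching row
]

# Rough equivalent of `| stats count by username`: unmatched events are dropped.
counts = Counter(name for name in map(lookup_username, urls) if name is not None)
print(dict(counts))
```

The trade-off against the regex approaches is that the names live in a table that can be edited without touching the search, at the cost of matching the name anywhere in the URL rather than at a specific path position.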

arjun_krishna
Explorer

The logs come in sets like the ones below (log1 through log6):
log1: Caused by: java.RuntimeException having https://google.yahoo.com/web/kiran/cart/groups/ronaldo/getbilledvspaid/v1: Read timed out

log2: Caused by: java.RuntimeException having https://google.yahoo.com/web/kiran/cart/groups/ronaldo/saysfs/v1: Read timed out

log3: Caused by: java.RuntimeException having https://google.yahoo.com/web/kiran/cart/groups/watson/invoicesummary/v1: Read timed out

log4: Caused by: java.RuntimeException having https://google.yahoo.com/web/kiran/cart/groups/watson/iuaxaddd/v1: Read timed out

log5: KHGM PDF invoice service at endpoint: https://microsoft.word.com/ringert/rkj3/obama/funatwork

log6: and setting service endpoint URL: https://cisco-services.raj.com/ytr-services/gilchrist/health

somesoni2
Revered Legend

For the first set, will the URLs always end with v1?

arjun_krishna
Explorer

No, they do not always end with v1; instead, I have to depend on the URL domains.

richgalloway
SplunkTrust

The field you are looking for seems to be in different places in the URLs. What determines where the field is located?

---
If this reply helps you, Karma would be appreciated.