Splunk Search

Get count from multiple urls based on required properties

arjun_krishna
Explorer

My logs in index="abc_uyt" contain the following content, with four different sets of URLs:

RuntimeException having https://google.yahoo.com/web/kiran/cart/groups/ronaldo/getbilledvspaid/v1
RuntimeException having https://google.yahoo.com/web/kiran/cart/groups/ronaldo/paymenthistory/v1
RuntimeException having https://google.yahoo.com/web/kiran/cart/groups/ronaldo/requesthistory/v1
RuntimeException having https://google.yahoo.com/web/kiran/cart/groups/ronaldo/runninghistory/v1

RuntimeException having https://google.yahoo.com/web/kiran/cart/groups/watson/invoicedetail/v1
RuntimeException having https://google.yahoo.com/web/kiran/cart/groups/watson/invoicesummary/v1
RuntimeException having https://google.yahoo.com/web/kiran/cart/groups/watson/gettingValue/v1
RuntimeException having https://google.yahoo.com/web/kiran/cart/groups/watson/historyValue/v1

RuntimeException having https://microsoft.word.com/ringert/rkj3/obama/funatwork
RuntimeException having https://microsoft.word.com/ringert/rkj3/obama/runathome

RuntimeException having https://cisco-services.raj.com/ytr-services/gilchrist/vision
RuntimeException having https://cisco-services.raj.com/ytr-services/gilchrist/health

I want to get a count for each of ronaldo, watson, obama, and gilchrist, shown in tabular form like below:
ronaldo - 25
watson - 22
obama - 36
gilchrist - 21

Could anyone please assist? I have tried rex, sed, and count, but I am getting unexpected counts.

1 Solution

somesoni2
Revered Legend

If the URL domains are fixed, you can try something like this:

index="abc_uyt"
| rex field=UrlFieldName "https:\/\/(google([^\/]+\/){5}|microsoft([^\/]+\/){3}|cisco([^\/]+\/){2})(?<name>[^\/]+)"
| stats count by name

See the regex working with your sample data here: https://regex101.com/r/t8coTo/1
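
For anyone who wants to check the pattern outside Splunk, here is a minimal Python sketch (not part of the original answer). It assumes Python's re module, which spells named groups as (?P<name>...) rather than (?<name>...) and does not need the slashes escaped. The inner groups are written as non-capturing, which does not change what matches. The names count_by_name and SAMPLE_LINES are made up for illustration; the sample lines are the twelve from the question.

```python
import re
from collections import Counter

# Python spelling of the rex pattern: skip a fixed number of path
# segments per domain, then capture the next segment as "name".
PATTERN = re.compile(
    r"https://(?:google(?:[^/]+/){5}"
    r"|microsoft(?:[^/]+/){3}"
    r"|cisco(?:[^/]+/){2})(?P<name>[^/]+)"
)

GOOGLE = "RuntimeException having https://google.yahoo.com/web/kiran/cart/groups/"
MICROSOFT = "RuntimeException having https://microsoft.word.com/ringert/rkj3/"
CISCO = "RuntimeException having https://cisco-services.raj.com/ytr-services/"

# The twelve sample lines from the question.
SAMPLE_LINES = [
    GOOGLE + "ronaldo/getbilledvspaid/v1",
    GOOGLE + "ronaldo/paymenthistory/v1",
    GOOGLE + "ronaldo/requesthistory/v1",
    GOOGLE + "ronaldo/runninghistory/v1",
    GOOGLE + "watson/invoicedetail/v1",
    GOOGLE + "watson/invoicesummary/v1",
    GOOGLE + "watson/gettingValue/v1",
    GOOGLE + "watson/historyValue/v1",
    MICROSOFT + "obama/funatwork",
    MICROSOFT + "obama/runathome",
    CISCO + "gilchrist/vision",
    CISCO + "gilchrist/health",
]


def count_by_name(lines):
    """Rough equivalent of `| rex ... | stats count by name`."""
    counts = Counter()
    for line in lines:
        match = PATTERN.search(line)
        if match:  # lines without a matching URL are simply not counted
            counts[match.group("name")] += 1
    return counts


if __name__ == "__main__":
    for name, count in count_by_name(SAMPLE_LINES).items():
        print(f"{name} - {count}")
```

On the twelve sample lines this gives 4 for ronaldo, 4 for watson, 2 for obama, and 2 for gilchrist.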

arjun_krishna
Explorer

Could you please consider the scenario above? The syntax is almost correct, but I am not getting a count by name.

somesoni2
Revered Legend

The regex above works with your new data samples as well: https://regex101.com/r/BCtKTw/1

In my query, I'm assuming there is a URL field that contains these logs, or at least the URL portion of them. If there is no such field and you're searching through the whole log entry (the _raw field), just remove field=UrlFieldName from the query above.

arjun_krishna
Explorer

Thanks, it worked.

elliotproebstel
Champion

If the URLs will always end with either /something OR /something/v1 (where the "v1" will literally always be "v1" and not anything else), then this should work:

| rex field=_raw "(?<name>\w+)\/\w+(\/v1)?$"
| stats count by name
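
To make the end-of-line assumption concrete, here is a small Python sketch (my illustration, not something run in Splunk). It uses Python's (?P<name>...) spelling of the same pattern, and extract_name is a hypothetical helper. The pattern is anchored with $, so it only matches when the URL is the last thing on the line; a line with trailing text such as ": Read timed out" yields no name.

```python
import re

# Python spelling of the rex pattern: the segment before the last
# path segment (or before "<segment>/v1") at the very end of the line.
TAIL_PATTERN = re.compile(r"(?P<name>\w+)/\w+(/v1)?$")


def extract_name(line):
    """Return the captured name, or None when the line does not end in the URL."""
    match = TAIL_PATTERN.search(line)
    return match.group("name") if match else None


if __name__ == "__main__":
    lines = [
        "RuntimeException having https://google.yahoo.com/web/kiran/cart/groups/ronaldo/getbilledvspaid/v1",
        "RuntimeException having https://microsoft.word.com/ringert/rkj3/obama/funatwork",
        # Trailing text after the URL defeats the $ anchor:
        "Caused by: java.RuntimeException having https://google.yahoo.com/web/kiran/cart/groups/ronaldo/getbilledvspaid/v1: Read timed out",
    ]
    for line in lines:
        print(extract_name(line))
```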

elliotproebstel
Champion

Alternatively, if you have a finite list of names you're looking for, you could create a wildcard lookup containing those names. Here's a good answer that explains how to do that:
https://answers.splunk.com/answers/52580/can-we-use-wildcard-characters-in-a-lookup-table.html
I'll assume you load those in such that you wind up with something like this:

user, username
*ronaldo*, ronaldo
*watson*, watson
*obama*, obama
*gilchrist*, gilchrist

Once you have the names loaded into your wildcard lookup, you would do something like this:

your base search where the URLs are in a field called URL
| lookup your_wildcard_lookup user AS URL OUTPUT username
| stats count by username
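
As a rough analogue of what the wildcard lookup does, here is a Python sketch using the standard fnmatch module. It is only an illustration of the idea, not how Splunk implements lookups: LOOKUP_ROWS mirrors the table above, lookup_username and count_by_username are hypothetical helpers, and returning the first matching row is a simplification I chose for the sketch.

```python
from collections import Counter
from fnmatch import fnmatchcase

# Mirrors the lookup table above: (wildcard pattern, username).
LOOKUP_ROWS = [
    ("*ronaldo*", "ronaldo"),
    ("*watson*", "watson"),
    ("*obama*", "obama"),
    ("*gilchrist*", "gilchrist"),
]


def lookup_username(url):
    """Return the username of the first row whose pattern matches the URL."""
    for pattern, username in LOOKUP_ROWS:
        if fnmatchcase(url, pattern):
            return username
    return None  # no row matched


def count_by_username(urls):
    """Rough equivalent of `| lookup ... | stats count by username`."""
    names = (lookup_username(url) for url in urls)
    return Counter(name for name in names if name is not None)


if __name__ == "__main__":
    urls = [
        "https://google.yahoo.com/web/kiran/cart/groups/ronaldo/paymenthistory/v1",
        "https://google.yahoo.com/web/kiran/cart/groups/watson/invoicedetail/v1",
        "https://microsoft.word.com/ringert/rkj3/obama/funatwork",
        "https://cisco-services.raj.com/ytr-services/gilchrist/health",
        "https://cisco-services.raj.com/ytr-services/gilchrist/vision",
    ]
    for name, count in count_by_username(urls).items():
        print(f"{name} - {count}")
```

Unlike the regex approaches, this does not depend on where the name sits in the path, but it will also match the name anywhere else in the URL.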

arjun_krishna
Explorer

The logs come in sets like the ones below (log1 through log6):
log1: Caused by: java.RuntimeException having https://google.yahoo.com/web/kiran/cart/groups/ronaldo/getbilledvspaid/v1: Read timed out

log2: Caused by: java.RuntimeException having https://google.yahoo.com/web/kiran/cart/groups/ronaldo/saysfs/v1: Read timed out

log3: Caused by: java.RuntimeException having https://google.yahoo.com/web/kiran/cart/groups/watson/invoicesummary/v1: Read timed out

log4: Caused by: java.RuntimeException having https://google.yahoo.com/web/kiran/cart/groups/watson/iuaxaddd/v1: Read timed out

log5: KHGM PDF invoice service at endpoint: https://microsoft.word.com/ringert/rkj3/obama/funatwork

log6: and setting service endpoint URL: https://cisco-services.raj.com/ytr-services/gilchrist/health

somesoni2
Revered Legend

For the first set, will the URLs always end with v1?

arjun_krishna
Explorer

No, they do not always end with v1; instead, I have to depend on the URL domains.

richgalloway
SplunkTrust

The field you are looking for seems to be in different places in the URLs. What determines where the field is located?

---
If this reply helps you, Karma would be appreciated.