Splunk Search

How do I extract a field from a URL and group by that field?

maddy1011
Explorer

How do I group data and get a usage count per customer? My data has two columns, Time and Event. The event data contains a URL, and the customer name appears somewhere in that URL. How do I group by customer to get a count per customer?
It looks something like this, where customer_name is the value I want to group by:

Time Event
1/7/15 5:12:44.469 PM 7 Jan 2015 17:12:44,500 RequestLog INFO :end: /XYZ/api/assets/customer_name?

Model=iphone&language=ge&


vasanthmss
Motivator

Hi Maddy,

Since your URLs do not all follow the same pattern, you need to use two regular expressions:

 | rex field=s "(?<customer_name>[^\/]*)\?" | rex field=s "(?<customer_name>[^\/]*)\/localization"

Explanation:

 | rex field=s "(?<customer_name>[^\/]*)\?" 

The first expression grabs the customer name that comes right before the query string, e.g. /.../.../customer_name?query

| rex field=s "(?<customer_name>[^\/]*)\/localization"

The second expression grabs the customer name that comes right before /localization. (Add further expressions like this one if you find other patterns similar to localization.)

Sample searches:

|stats c | eval s="7 Jan 2015 17:12:44,500 RequestLog INFO :end: /XYZ/api/assets/customer_name? 
Model=iphone&language=ge&" | rex field=s "(?<customer_name>[^\/]*)\?" | rex field=s "(?<customer_name>[^\/]*)\/localization"

|stats c | eval s="2 Jan 2015 14:57:45,121 RequestLogFilter INFO :end: /XYZ/api//dassets/customer_name2?deviceModel=iphone&language=ge&pageSize=1000&screenSize=0640x1136&assetQuality=hq" | rex field=s "(?<customer_name>[^\/]*)\?" | rex field=s "(?<customer_name>[^\/]*)\/localization"

|stats c | eval s="2 Jan 2015 15:18:29,913 RequestLogFilter INFO :end: /XYZ/api//dasset/customer_name3/localization?language=ge&deviceModel=iphone&assetQuality=hq&assetVersion=160&screenSize=0640x1136&api=2" | rex field=s "(?<customer_name>[^\/]*)\?" | rex field=s "(?<customer_name>[^\/]*)\/localization"
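
For anyone who wants to check the two-expression approach outside Splunk, here is a minimal Python sketch of the same idea. It is only an approximation of what rex does: Python's re module writes named groups as (?P<name>...) rather than (?<name>...), and the helper name extract_customer is made up for this illustration. It assumes that when both expressions match, the later one replaces the earlier value, which is why the /localization expression goes second.

```python
import re

# Python spells named groups (?P<name>...); Splunk's rex accepts (?<name>...).
BEFORE_QUERY = re.compile(r"(?P<customer_name>[^/]*)\?")
BEFORE_LOCALIZATION = re.compile(r"(?P<customer_name>[^/]*)/localization")


def extract_customer(event):
    """Hypothetical helper: apply both patterns in order, later match wins."""
    name = None
    for pattern in (BEFORE_QUERY, BEFORE_LOCALIZATION):
        match = pattern.search(event)
        if match:
            name = match.group("customer_name")
    return name


# The three sample events quoted in this thread.
samples = [
    "7 Jan 2015 17:12:44,500 RequestLog INFO :end: /XYZ/api/assets/customer_name?\n"
    "Model=iphone&language=ge&",
    "2 Jan 2015 14:57:45,121 RequestLogFilter INFO :end: /XYZ/api//dassets/customer_name2"
    "?deviceModel=iphone&language=ge&pageSize=1000&screenSize=0640x1136&assetQuality=hq",
    "2 Jan 2015 15:18:29,913 RequestLogFilter INFO :end: /XYZ/api//dasset/customer_name3"
    "/localization?language=ge&deviceModel=iphone&assetQuality=hq&assetVersion=160"
    "&screenSize=0640x1136&api=2",
]

for sample in samples:
    print(extract_customer(sample))
```

On the third sample the first expression alone would pick up "localization"; the second expression is what corrects it to customer_name3.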

Hope this helps!

Cheers!!!

Thanks,
V


vasanthmss
Motivator

Did that help?

V

alemarzu
Motivator

Try this, maddy, and let me know if it works.

^(?:.*[\\\/])(?<customer_name>.*)(?:\?\sModel)

maddy1011
Explorer

This gives me an error:
Error in 'SearchParser': Missing a search command before '\'. Error at position '132' of search query

I'm not sure which "\" it's missing.


javiergn
Super Champion

Assuming your URLs look like the one you mentioned:

yoursearch
| rex field=_raw "\/(?<customer_name>[^\?\/]+)\?"
| stats count by customer_name

maddy1011
Explorer

This works, but it omits certain results. Can you help explain the expression "\/(?<customer_name>[^\?\/]+)\?"


javiergn
Super Champion

Sure. What the regex is doing:

Find a forward slash but don't capture it (it needs to be escaped): \/
Start a capturing group (parentheses, labelled customer_name): (?<customer_name>
Find one or more characters (the plus symbol) other than (^) a forward slash or a question mark (escaping needed again): [^\?\/]+
Then find a question mark, but do not capture it in your token (it sits outside the parentheses): \?
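
The same breakdown can be written as an annotated pattern. This is a sketch in Python rather than SPL, so the named group is spelled (?P<customer_name>...) instead of (?<customer_name>...), and the slash and question mark inside the character class are left unescaped because Python does not require it there. The event string is the first sample from this thread.

```python
import re

# re.VERBOSE lets each piece of the pattern carry its own comment.
PATTERN = re.compile(
    r"""
    /                      # a literal forward slash, matched but not captured
    (?P<customer_name>     # start of the named capturing group
        [^?/]+             # one or more characters that are neither '?' nor '/'
    )                      # end of the capturing group
    \?                     # a literal question mark, outside the group
    """,
    re.VERBOSE,
)

event = (
    "7 Jan 2015 17:12:44,500 RequestLog INFO :end: /XYZ/api/assets/customer_name?\n"
    "Model=iphone&language=ge&"
)

match = PATTERN.search(event)
print(match.group(0))                # whole match, including the slash and the '?'
print(match.group("customer_name"))  # only the captured part
```

The whole match is "/customer_name?", while the named group holds just "customer_name", which is the value the stats command would group by.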

If you give me an example that is not being captured I can help you with the regex.

You can also use regex101.com to test everything. It's a very intuitive page.

maddy1011
Explorer

So I had to dig in a little bit and figured out that the endpoint in the URL comes in different formats. The one with // is not being captured. I was trying to list all the unique endpoints, but I'm still struggling.

Here are some more examples.

7 Jan 2015 17:12:44,500 RequestLog INFO :end: /XYZ/api/assets/customer_name?
Model=iphone&language=ge&

and
2 Jan 2015 14:57:45,121 RequestLogFilter INFO :end: /XYZ/api//dassets/customer_name2?deviceModel=iphone&language=ge&pageSize=1000&screenSize=0640x1136&assetQuality=hq

This is the one not being captured:
2 Jan 2015 15:18:29,913 RequestLogFilter INFO :end: /XYZ/api//dasset/customer_name3/localization?language=ge&deviceModel=iphone&assetQuality=hq&assetVersion=160&screenSize=0640x1136&api=2
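
One way to see what the earlier expression does with each of these events is to run it outside Splunk. The sketch below uses Python's re module, assuming it behaves like rex for this simple pattern (named groups are written (?P<name>...) in Python). It suggests the third event is still matched, but the group ends up holding the path segment right before the question mark rather than the customer name.

```python
import re

# The expression from the earlier reply, in Python's named-group spelling.
PATTERN = re.compile(r"/(?P<customer_name>[^?/]+)\?")

# The three example events from this post.
events = [
    "7 Jan 2015 17:12:44,500 RequestLog INFO :end: /XYZ/api/assets/customer_name?\n"
    "Model=iphone&language=ge&",
    "2 Jan 2015 14:57:45,121 RequestLogFilter INFO :end: /XYZ/api//dassets/customer_name2"
    "?deviceModel=iphone&language=ge&pageSize=1000&screenSize=0640x1136&assetQuality=hq",
    "2 Jan 2015 15:18:29,913 RequestLogFilter INFO :end: /XYZ/api//dasset/customer_name3"
    "/localization?language=ge&deviceModel=iphone&assetQuality=hq&assetVersion=160"
    "&screenSize=0640x1136&api=2",
]

# Collect whatever the named group captures for each event.
extracted = [PATTERN.search(event).group("customer_name") for event in events]
print(extracted)  # ['customer_name', 'customer_name2', 'localization']
```

So the double slash does not seem to be the problem here; the extra /localization segment is.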


javiergn
Super Champion

OK, if you know for sure that your customer name is going to be after the third block then you can try the following too:

yoursearch
 | rex field=_raw ":end: (?:\/\/?[^\/]+){3}\/(?<customer_name>[^\?\/]+)"
 | stats count by customer_name

See the following link, which I used to test this regex:

https://regex101.com/r/tE9xQ9/1
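
If you prefer to check it locally, here is a rough Python equivalent of the search above. It is a sketch under a few assumptions: Python's (?P<name>...) stands in for (?<name>...), a Counter stands in for "stats count by customer_name", and the first sample event is repeated only so that one count is greater than 1.

```python
import re
from collections import Counter

# Skip three path segments after ":end: " (each may start with / or //),
# then capture the fourth segment up to the next '/' or '?'.
POSITIONAL = re.compile(r":end: (?://?[^/]+){3}/(?P<customer_name>[^?/]+)")

first_event = (
    "7 Jan 2015 17:12:44,500 RequestLog INFO :end: /XYZ/api/assets/customer_name?\n"
    "Model=iphone&language=ge&"
)
events = [
    first_event,
    first_event,  # repeated for illustration only, not real data
    "2 Jan 2015 14:57:45,121 RequestLogFilter INFO :end: /XYZ/api//dassets/customer_name2"
    "?deviceModel=iphone&language=ge&pageSize=1000&screenSize=0640x1136&assetQuality=hq",
    "2 Jan 2015 15:18:29,913 RequestLogFilter INFO :end: /XYZ/api//dasset/customer_name3"
    "/localization?language=ge&deviceModel=iphone&assetQuality=hq&assetVersion=160"
    "&screenSize=0640x1136&api=2",
]

# Rough stand-in for "| stats count by customer_name".
counts = Counter()
for event in events:
    match = POSITIONAL.search(event)
    if match:
        counts[match.group("customer_name")] += 1

for name, count in sorted(counts.items()):
    print(name, count)
```

Because the pattern counts segments from ":end: ", it only holds as long as the customer name really is the fourth segment in every URL format.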

Hope that helps.

Thanks,
J
