Solved: How to write a regex to group based on a particular field?

sp1711 · ‎06-01-2015

I am looking to see how many times a particular URI was hit on a daily basis, grouped by a field.
Say the URI is POST {base_url}/user/{user_id}/def/{def_id}/xyz

I have done the first part, counting how many times this URI is hit daily:

index="something" sourcetype=blah OR meh "def" | stats count by uri | bucket _time span=1d | timechart count

Now I want to group this by the different user_ids.

         user_id1   user_id2   user_id3
day1     10         20         2
day2     21         22         50
day3     20         30         10

I'm looking for this kind of output. Any ideas?

rsennett_splunk · ‎06-01-2015

Basically what you want to do is create a field that contains the userid so you can group by it...

POST\s+\/user\/(?<user>[^\/]+)

will create the user field for you (the field takes its name from the capturing group).

If you want to grab the whole thing (and maybe create a field for def_id), just use the slash to jump from segment to segment, so that there can be anything between them. Or... if what's between them is static... then use literals:

POST\s+\/user\/(?<userid>[^\/]+)\/[^\/]+\/(?<defid>[^\/]+)\/\S+

makes two fields should you need them.
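
For anyone who wants to try the two-field pattern outside Splunk, here is a minimal Python sketch. It is an illustration only, not part of the original answer: the sample request line is made up, and Python's re module spells named groups (?P<name>...) where Splunk's rex accepts (?<name>...).

```python
import re

# Same shape as the two-field pattern above, in Python's (?P<name>...) spelling.
# Slashes need no escaping in a Python pattern, so they are written plainly.
PATTERN = re.compile(r"POST\s+/user/(?P<userid>[^/]+)/[^/]+/(?P<defid>[^/]+)/\S+")


def extract_ids(line):
    """Return the named captures as a dict, or None if the line does not match."""
    match = PATTERN.search(line)
    return match.groupdict() if match else None


# Hypothetical request line, invented for illustration.
sample = "POST /user/u123/def/d456/xyz"
print(extract_ids(sample))
```

Each [^/]+ consumes one path segment and stops at the next slash, which is what lets the pattern hop from segment to segment.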

With Splunk... the answer is always "YES!". It just might require more regex than you're prepared for!

jacobwilkins · ‎06-02-2015

One thing to keep in mind is that extracting the field via a regex is a totally separate step from grouping an aggregated result.

index="something" sourcetype=blah OR meh "def"
| rex field=uri "POST\s+\/user\/(?<user_id>[^\/]+)"
| timechart span=1d count by user_id

That should do it.
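
To make the two separate steps concrete, here is a rough Python sketch of what the extract-then-group pipeline does conceptually. It is not how Splunk implements timechart; the event tuples, the function name daily_counts_by_user, and the "NULL" bucket label are assumptions made for illustration.

```python
import re
from collections import Counter, defaultdict

# Step 1: the extraction regex (Python spelling of the named group).
USER_RE = re.compile(r"POST\s+/user/(?P<user_id>[^/]+)")


def daily_counts_by_user(events):
    """events: iterable of (day, text) pairs. Returns {day: Counter of user_id}."""
    table = defaultdict(Counter)
    for day, text in events:
        match = USER_RE.search(text)
        # Assumption: events with no extracted user_id are counted under "NULL".
        user = match.group("user_id") if match else "NULL"
        # Step 2: the grouping, one count per (day, user_id) cell.
        table[day][user] += 1
    return table


# Hypothetical events, invented for illustration.
events = [
    ("2015-06-01", "POST /user/alice/def/1/xyz"),
    ("2015-06-01", "POST /user/bob/def/2/xyz"),
    ("2015-06-01", "POST /user/alice/def/3/xyz"),
    ("2015-06-02", "POST /user/bob/def/4/xyz"),
]
for day, counts in sorted(daily_counts_by_user(events).items()):
    print(day, dict(counts))
```

The rows correspond to days and the columns to user_id values, which is the layout asked for in the question.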

sp1711 · ‎06-02-2015

When I tried rex"\s+\/user\/(\?[^\/]+)" , it gives me the following error.

Error in 'rex' command: The regex '\s+\/user\/(\?[^\/]+)' does not extract anything. It should specify at least one named group. Format: (?<name>...).
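
The error is about the group having no name: (\?...) is an ordinary group whose content starts with a literal question mark. A small Python sketch can show the difference between the two forms. It is only an analogy, since Python spells named groups (?P<name>...) rather than (?<name>...), and the helper name named_groups is made up.

```python
import re


def named_groups(pattern):
    """Return the names of the named capture groups in a pattern, sorted."""
    return sorted(re.compile(pattern).groupindex)


# The pattern from the error message: one capture group, but it has no name.
print(named_groups(r"\s+\/user\/(\?[^\/]+)"))

# The same pattern with a named group, in Python's spelling.
print(named_groups(r"\s+\/user\/(?P<user_id>[^\/]+)"))
```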

jacobwilkins · ‎06-02-2015

Whoops. I copy-pasted the wrong rex into my post. I just edited it, so try that.

sp1711 · ‎06-02-2015

I tried that and I still get Error in 'SearchOperator:rex': Usage: regex [field=]

sp1711 · ‎06-02-2015

Okay, I got the query working now, but the output I get is weird.

         NULL
day1     23
day2     10
day3     25

This is what I get. Why is it taking user_id as NULL?

rsennett_splunk · ‎06-02-2015

Try it in regex101.com to make sure you are capturing what you think you are capturing.
Also... double-check it by adding the filter uri=/user/* to the start of your search.
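
The same kind of check can be done in a few lines of Python. The thread does not show what sp1711's events actually look like, so the three sample strings below are assumptions; they only illustrate how a pattern can fail to match depending on which text it is run against, which is one way to end up with no user_id on any event.

```python
import re

# Python spelling of the user_id pattern from this thread.
USER_RE = re.compile(r"POST\s+/user/(?P<user_id>[^/]+)")


def capture_user(text):
    """Return the captured user_id, or None when the pattern does not match."""
    match = USER_RE.search(text)
    return match.group("user_id") if match else None


# Hypothetical values, invented for illustration.
raw_event = "2015-06-02 10:00:00 POST /user/u123/def/d456/xyz 200"
uri_field = "/user/u123/def/d456/xyz"            # a field holding only the path
with_base = "POST /api/v1/user/u123/def/d456/xyz"  # something sits before /user/

print(capture_user(raw_event))  # matches: the text contains "POST /user/..."
print(capture_user(uri_field))  # no match: there is no "POST" in this text
print(capture_user(with_base))  # no match: "/user/" does not directly follow "POST "
```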

sp1711 · ‎06-02-2015

How do I use this? Do I just pass the query as a regex and group by the same field? Something like

"the search" | rex field=new_raw "POST\s+\/user\/(?[^\/]+)"

Will this work?

rsennett_splunk · ‎06-02-2015

No... field= names the field that rex searches in, which by default is _raw.
As jacobwilkins showed you, you could, if you like, tell Splunk to look in the uri field...
The new field is established in the capturing group (?<fieldname>[^\/]+), where fieldname stands for whatever you want the field to be called.

After the field is named, you identify WHAT to capture... which in this case translates to "everything that is not a slash". The forum markup is on the fritz and tends to remove the angle-bracketed name from the capturing group...

check it out here: https://regex101.com/r/zB0aV1/1

You can see on the right hand side, everything that the regex is doing, step by step.

Best thing for you to do, given that it seems you are quite new to Splunk, is to use the "Field Extractor" and use the regex pattern to extract the field as a search time field extraction.

You could also let Splunk do the extraction for you.

When looking at your events (enter everything up to the first pipe and run it), you might also add uri=/user/* to make things easier, just to be sure you get enough examples of what you want to pull.
Note that the first column in the events grid is a > (greater-than) symbol; the column is topped with an "i". Click that.
Click "Event Actions" and then "Extract Fields".

Then you can use the field extraction wizard to let Splunk do the work. The regex Splunk comes up with may be a bit more cryptic than the one I'm using because it doesn't really have any context to work with.

rsennett_splunk · ‎06-02-2015

(Markup is removing some characters, so click the link below to see the actual regex. The new field name comes after the question mark.)
So the code should read
open left paren
question mark
less than sign
name of field
greater than sign
left square bracket
caret
escape
forward slash
right square bracket
plus sign
right paren

Put together, with fieldname standing in for the name of the field: (?<fieldname>[^\/]+)

https://regex101.com/r/zB0aV1/1
