Splunk Search

How to extract portion of the string using Regex

aditsss
Motivator

Hi Everyone,

Can anyone help me out with this?

I have a field named Request_URL with a value like https://xyz/api/groups/230df08c/registry.

I want to extract the "230df08c" portion from every Request_URL.

Can someone guide me on the regular expression for this in Splunk?

Thanks in advance.

 


to4kawa
Ultra Champion

| rex field=Request_URL "groups\/(?<id>[^\/]+)"


aditsss
Motivator

Hi,

I tried this, but it is not working. I am getting the error below:

Error in 'rex' command: The regex 'Request_URL' does not extract anything. It should specify at least one named group. Format: (?<name>...).

to4kawa
Ultra Champion

The space in field=(space)Request_URL was my mistake. Please fix it.

aditsss
Motivator

Hi, I tried it with spaces, but I am still getting the same kind of error:

rex field =   Request_URL "groups\/(?<id>[^\/]+)"

Error in 'rex' command: The regex 'field' does not extract anything. It should specify at least one named group. Format: (?<name>...).

to4kawa
Ultra Champion

Remove the spaces around the equals sign, so that it reads field=Request_URL.
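For reference, here is a minimal sketch of the working form that can be pasted into a search bar. The makeresults and eval lines only fabricate one test event from the sample URL in the question, and the final table command is added for display; neither is part of the original answer. Judging from the two error messages above, rex appears to treat the first stray token as the regex whenever field=Request_URL is not written as one unbroken argument.

```spl
| makeresults
| eval Request_URL="https://xyz/api/groups/230df08c/registry"
| rex field=Request_URL "groups\/(?<id>[^\/]+)"
| table Request_URL id
```

With this sample URL, id should come back as 230df08c.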


aditsss
Motivator

Thanks a lot!!!! It works.


aditsss
Motivator

Hi to4kawa,

I need your help one more time.

This time the Request_URL field has values like these:

https://xyz/api/connections/c1d30603ddf0

https://yte/api/flow/groups/314e8fead333/controller-services

https://tyu/api/services/968d06b5666b

https://hju/api/processors/b5f990b529f4/run-status

I want to extract the c1d30603ddf0, 314e8fead333, 968d06b5666b and b5f990b529f4 portions.

This time the Request_URL format varies. Can you guide me on how to do this?

aditsss
Motivator

Hi, I am not allowed to tamper with the data.

Is there any other approach I can follow?

Please guide me.

ITWhisperer
SplunkTrust

Looks like you are trying to extract a hexadecimal string - try this:

| rex field=Request_URL "\/(?<id>[0-9a-f]+)($|\/)"

aditsss
Motivator

Hi,

I am not able to see any records with this:

| rex field=Request_URL "\/(?<id>[0-9a-f]+)($|\/)"

Previously, after using this

| rex field=Request_URL "([^\r\n\/]*\/){4,5}(connections|groups|services|processors)\/(?<id>[^\r\n\/]+)"

I was able to fetch the ID from Request_URL values that contain 4 or 5 slashes, like these:

https://xyz/api/connections/c1d30603ddf0

https://hju/api/processors/b5f990b529f4/run-status

But I also have Request_URL values that contain 3, 6, 7 or 8 slashes, like these:

https://apz/api/queues/61c458568edb/flowfiles/content/regisrtry

https://tyu/policies/read/groups/4e25daf4d5d6/var

https://com/6547890e/

So basically I want a complete version of the regex below that handles 3, 4, 5, 6, 7 or 8 slashes:

rex field=Request_URL "([^\r\n\/]*\/){4,5}(connections|groups|services|processors)\/(?<id>[^\r\n\/]+)"

Please help me out with this.

ITWhisperer
SplunkTrust

@aditsss I am not sure what it is you are trying to do. Given the example URLs you have provided, the rex expression will extract the ids. Is this not what you are after?

| makeresults
| eval Request_URL="https://xyz/api/connections/c1d30603ddf0|https://hju/api/processors/b5f990b529f4/run-status|https://apz/api/queues/61c458568edb/flowfiles/content/regisrtry|https://tyu/policies/read/groups/4e25daf4d5d6/var|https://com/6547890e/"
| makemv delim="|" Request_URL
| mvexpand Request_URL
| rex field=Request_URL "\/(?<id>[0-9a-f]+)($|\/)"
| fields - _time

 

Request_URL                                                        id
https://xyz/api/connections/c1d30603ddf0                           c1d30603ddf0
https://hju/api/processors/b5f990b529f4/run-status                 b5f990b529f4
https://apz/api/queues/61c458568edb/flowfiles/content/regisrtry    61c458568edb
https://tyu/policies/read/groups/4e25daf4d5d6/var                  4e25daf4d5d6
https://com/6547890e/                                              6547890e

aditsss
Motivator

Hi ITWhisperer, to4kawa,

My requirement is somewhat like this.

I have a field called Request_URL (there are 50+ different Request_URL values).

Some sample Request_URL values are:

https://abc/api/flow/groups/7d0c111a-0173-1000-ffff-ffffb9f9694c

https://uip/api/groups/3fe13d52-d326-15a1-acef-ed3395edd973/variable-registry

https://yui/api/flowfile-queues/05ee3b30-d5e1-1977-9aa9-61c458568edb/flowfiles/content

https://hjk/api/connections/0a88df6f-0174-1000-0000-0000577a28e9

https://com/022adcc6-8001-3d7a-b291-3d0831458357/

I want to extract the IDs from Request_URL, i.e. 7d0c111a-0173-1000-ffff-ffffb9f9694c, 3fe13d52-d326-15a1-acef-ed3395edd973, etc. The ID pattern is the same in every Request_URL.

What exact regex can I use, given that the URL patterns differ?

I used the regex below, but it only returns the Request_URL values with 4 or 5 slashes:

rex field=Request_URL "([^\r\n\/]*\/){4,5}(connections|groups|services|processors)\/(?<id>[^\r\n\/]+)"

Can someone provide a complete regex for this?

to4kawa
Ultra Champion

Your id is 7d0c111a-0173-1000-ffff-ffffb9f9694c, which has this shape:
\w{8}-\w{4}-\w{4}-\w{4}-\w{12}

| rex max_match=0 field=Request_URL "(?<id>\w{8}-\w{4}-\w{4}-\w{4}-\w{12})"
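
One caveat on \w: it matches any letter, digit or underscore, not only hexadecimal digits. If the IDs are always hexadecimal, as in the samples posted so far, a stricter variant could look like the sketch below. The makeresults and eval lines just build one test event from a sample URL in this thread, and the table command is only there for display. Because of max_match=0, id would become a multivalue field if a URL ever contained more than one such ID.

```spl
| makeresults
| eval Request_URL="https://abc/api/flow/groups/7d0c111a-0173-1000-ffff-ffffb9f9694c"
| rex max_match=0 field=Request_URL "(?<id>[0-9a-fA-F]{8}-[0-9a-fA-F]{4}-[0-9a-fA-F]{4}-[0-9a-fA-F]{4}-[0-9a-fA-F]{12})"
| table Request_URL id
```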

yeahnah
Motivator

Hi @aditsss 

With any pattern-matching regex, it is vital that the examples cover all the possible pattern combinations.  It is also important that you make some effort to understand what is being provided by the Splunk community.  Have you tried looking at regex tutorials yet, and do you understand the regex patterns?

Both @ITWhisperer & @to4kawa have now provided good answers based on the samples you had provided so far.

ITWhisperer's needs a slight adjustment to deal with the dash (-) character in the new examples you provided, so the matching character set now becomes [0-9a-fA-F\-]+.   I've also added capitals A-F to the set (just in case things change).  Note that the dash is backslash-escaped (\-) because it otherwise has special meaning as a range definer in a character set.  So...

 

 

| rex field=Request_URL "\/(?<id>[0-9a-fA-F\-]{8,})($|\/)"

 

 

So this regex capture group will match any combination of hexadecimal characters and dashes that has a leading forward slash (/) and ends with a trailing forward slash or the end of the line ($).  It will also match if there are no dashes in the id.  It does not care where in the URL string this combination occurs.  I've also added a length quantifier, {8,}, which means the id must be at least 8 characters long to match. That should help prevent false positive matches.

to4kawa's answer is also good but not as generic: your Request_URL IDs must have the exact pattern that the regex is looking for.  Maybe this is the case with your data but, based on the changing requirements so far, I'm not too sure.

I've attached a screenshot from the https://regex101.com/ site.  Log in (it's free) and have a play with your data set.  It's a great place to learn more about regex and upskill yourself.  Knowing how to use regex is a great skill to have in the IT industry.
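
To try the pattern inside Splunk as well, a throwaway search along these lines may help. It is only a sketch: the makeresults, eval, makemv and mvexpand lines fabricate five test events from URLs already posted in this thread, and the table command is for display. The first three URLs should produce an id (one with dashes, one without, one followed by a trailing slash). The last two contain no path segment made up entirely of hexadecimal characters and dashes, so id should stay empty for them.

```spl
| makeresults
| eval Request_URL="https://abc/api/flow/groups/7d0c111a-0173-1000-ffff-ffffb9f9694c|https://hju/api/processors/b5f990b529f4/run-status|https://com/6547890e/|https://liu/api/flow/groups/root|https://abc/api/flow/prioritizers"
| makemv delim="|" Request_URL
| mvexpand Request_URL
| rex field=Request_URL "\/(?<id>[0-9a-fA-F\-]{8,})($|\/)"
| table Request_URL id
```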

aditsss
Motivator

Hi Everyone,

Thank you for your help. I am able to extract the IDs from the Request_URL field using either of the regex patterns below, and to put them in a separate column called id.

rex field=Request_URL "\/(?<id>[0-9a-fA-F\-]{8,})($|\/)"

rex field=Request_URL "(?<id>\w{8}-\w{4}-\w{4}-\w{4}-\w{12})"

I just need one more bit of help. There are some Request_URL values that do not include an ID, like:

https://abc/api/flow/prioritizers

https://poi/api/flow/controller-service-types

https://liu/api/flow/groups/root

https://com/content-viewer/
https://tyu/update-attribute-ui-1.11.1.3.5.0.0-90/configure

 

After using either regex (rex field=Request_URL "(?<id>\w{8}-\w{4}-\w{4}-\w{4}-\w{12})" or rex field=Request_URL "\/(?<id>[0-9a-fA-F\-]{8,})($|\/)") in my Splunk query, not all of my data is returned: the Request_URL values without IDs are missing.

I want those to be displayed as well, with the id column blank for such Request_URL values.

Is there any way to do that? Please guide me.

yeahnah
Motivator

Hi @aditsss 

Please provide the whole Splunk search query you are currently using and a sample of the raw data/events stored in Splunk (please remove or mask any possible customer or PII data).  Without this, there is no way to really assist, as there could be many reasons why the records are not displayed.  Though I suspect you are close now and it will be something simple to identify and fix in your search query.

aditsss
Motivator

Hi yeahnah,

 

Below is my search query

index=idex4 sourcetype=xyz source="/a/b/c/d-log" (Type ="*") (Name_Id ="*") (Request_URL ="*")
| convert timeformat="%Y-%m-%d" ctime(_time) AS Date
| rex field=Request_URL "\/(?<id>[0-9a-fA-F\-]{8,})($|\/)"
|stats count by Date Name_Id Type Request_URL id
|sort - Name_Id

OR


index=idex4 sourcetype=xyz source="/a/b/c/d-log" (Type ="*") (Name_Id ="*") (Request_URL ="*")
| convert timeformat="%Y-%m-%d" ctime(_time) AS Date
| rex field=Request_URL "(?<id>\w{8}-\w{4}-\w{4}-\w{4}-\w{12})"
|stats count by Date Name_Id Type Request_URL id
|sort - Name_Id

With the regex, only the records whose Request_URL includes an ID are displayed. The remaining records (basically, those whose Request_URL does not include an ID) are not displayed:

https://abc/api/flow/prioritizers

https://poi/api/flow/controller-service-types

https://liu/api/flow/groups/root

https://com/content-viewer/
https://tyu/update-attribute-ui-1.11.1.3.5.0.0-90/configure

I want to display all records, including those whose Request_URL does not include an ID. The id column should be blank for them, but at least all records should be displayed.

Thanks in advance.

ITWhisperer
SplunkTrust

Hi @aditsss 

It is because you are using id in the by clause of your stats command. Rows where id is null (because the rex did not match anything) are ignored by stats. The way to solve this is with fillnull:

| makeresults
| eval Request_URL="https://xyz/api/connections/c1d30603ddf0|https://hju/api/processors/b5f990b529f4/run-status|https://apz/api/queues/61c458568edb/flowfiles/content/regisrtry|https://tyu/policies/read/groups/4e25daf4d5d6/var|https://com/6547890e/|https://abc/api/flow/groups/7d0c111a-0173-1000-ffff-ffffb9f9694c|https://uip/api/groups/3fe13d52-d326-15a1-acef-ed3395edd973/variable-registry|https://yui/api/flowfile-queues/05ee3b30-d5e1-1977-9aa9-61c458568edb/flowfiles/content|https://hjk/api/connections/0a88df6f-0174-1000-0000-0000577a28e9|https://com/022adcc6-8001-3d7a-b291-3d0831458357/|https://abc/api/flow/prioritizers|https://poi/api/flow/controller-service-types|https://liu/api/flow/groups/root|https://com/content-viewer/|https://tyu/update-attribute-ui-1.11.1.3.5.0.0-90/configure"
| makemv delim="|" Request_URL
| mvexpand Request_URL
| rex field=Request_URL "\/(?<id>[0-9a-fA-F-]{8,})($|\/)"
| fillnull value="" id
| fields - _time
| stats values(Request_URL) as Request_URL, count by id

(Btw, you don't have to escape the final hyphen, although there is no harm in doing so. It just needs to be the last character inside the character class.)

id                                    Request_URL                                                                             count
                                      https://abc/api/flow/prioritizers                                                       5
                                      https://com/content-viewer/
                                      https://liu/api/flow/groups/root
                                      https://poi/api/flow/controller-service-types
                                      https://tyu/update-attribute-ui-1.11.1.3.5.0.0-90/configure
022adcc6-8001-3d7a-b291-3d0831458357  https://com/022adcc6-8001-3d7a-b291-3d0831458357/                                       1
05ee3b30-d5e1-1977-9aa9-61c458568edb  https://yui/api/flowfile-queues/05ee3b30-d5e1-1977-9aa9-61c458568edb/flowfiles/content  1
0a88df6f-0174-1000-0000-0000577a28e9  https://hjk/api/connections/0a88df6f-0174-1000-0000-0000577a28e9                        1
3fe13d52-d326-15a1-acef-ed3395edd973  https://uip/api/groups/3fe13d52-d326-15a1-acef-ed3395edd973/variable-registry           1
4e25daf4d5d6                          https://tyu/policies/read/groups/4e25daf4d5d6/var                                       1
61c458568edb                          https://apz/api/queues/61c458568edb/flowfiles/content/regisrtry                         1
6547890e                              https://com/6547890e/                                                                   1
7d0c111a-0173-1000-ffff-ffffb9f9694c  https://abc/api/flow/groups/7d0c111a-0173-1000-ffff-ffffb9f9694c                        1
b5f990b529f4                          https://hju/api/processors/b5f990b529f4/run-status                                      1
c1d30603ddf0                          https://xyz/api/connections/c1d30603ddf0                                                1
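
Applied to the search posted earlier in this thread, the same fix might look like the following sketch. The index, sourcetype, source and field names are copied from that post. The only change is the added fillnull line, placed after rex and before stats so that events without an id carry an empty string instead of a null into the by clause.

```spl
index=idex4 sourcetype=xyz source="/a/b/c/d-log" (Type ="*") (Name_Id ="*") (Request_URL ="*")
| convert timeformat="%Y-%m-%d" ctime(_time) AS Date
| rex field=Request_URL "\/(?<id>[0-9a-fA-F-]{8,})($|\/)"
| fillnull value="" id
| stats count by Date Name_Id Type Request_URL id
| sort - Name_Id
```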

yeahnah
Motivator

Hi @aditsss 

Depending on your needs, another approach is to group by Request_URL instead. That ensures every Request_URL is listed, regardless of whether its id is null.

...
| rex field=Request_URL "\/(?<id>[0-9a-fA-F-]{8,})($|\/)"
| stats values(id) as id count by Request_URL
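
A self-contained way to try this variant is sketched below. The makeresults, eval, makemv and mvexpand lines only fabricate three test events from URLs posted earlier in the thread. Every Request_URL should come back as its own row, with id left empty where the pattern finds nothing.

```spl
| makeresults
| eval Request_URL="https://hjk/api/connections/0a88df6f-0174-1000-0000-0000577a28e9|https://liu/api/flow/groups/root|https://com/content-viewer/"
| makemv delim="|" Request_URL
| mvexpand Request_URL
| rex field=Request_URL "\/(?<id>[0-9a-fA-F-]{8,})($|\/)"
| stats values(id) as id count by Request_URL
```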

 


aditsss
Motivator

Hi Everyone, thank you so much for your help. This is exactly what I was looking for.

Thanks ITWhisperer, yeahnah and to4kawa for all the answers you provided.

I am very new to Splunk. Thank you so much for all your guidance.
