Splunk Search

How to extract the last part of all the combined URLs

kiru2992
Path Finder

Hello Everyone!

I have a field (FieldA) that contains multiple URLs concatenated together. I would like a new field (FieldB) that lists the last part of each URL.

FieldA: https://...../...../..../123994https://.../....../....../....../123441 https://.../....../....../....../133456

FieldB: 123994 123441 133456

Currently I am using the query below for extraction, but it returns only '133456' rather than the list of all the values.

Query:

| rex field=FieldA "\/(?<FieldB>\w+)$"

Can you please help me with an expression that gives the desired output, or is there a better way of doing this?

1 Solution

ITWhisperer
SplunkTrust

Try this:

| rex max_match=0 field=FieldA "\/(?<FieldB>[^\/]*)(https:|\n|$)"
| mvcombine delim=" " FieldB


to4kawa
Ultra Champion

The field extraction of the log is wrong to begin with.
It's faster to extract it from the raw logs.


richgalloway
SplunkTrust

Try this

| rex max_match=0 field=FieldA "\/(?<FieldB>\w+)$"
| mvexpand FieldB
---
If this reply helps you, Karma would be appreciated.

kiru2992
Path Finder

Hello @richgalloway 

I am sorry, but I am still getting only the last number.


richgalloway
SplunkTrust

That's probably because of the $ anchor in the regex, which lets the pattern match only at the end of the field. Try removing it. You may then find that the regex matches other parts of the URLs, since '\w+' matches any run of word characters after a slash. If so, it will be necessary to modify the regex to match only the ends of the URLs.
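
The effect of the anchor can be reproduced outside Splunk with a minimal Python sketch (not SPL). It assumes Python's `re` module handles these simple patterns the same way as the PCRE engine behind `rex`, and it uses Python's `(?P<name>...)` group syntax. The `.example` hosts and the short path segments are made up for illustration; only the trailing numbers come from the question.

```python
import re

# Hypothetical stand-in for FieldA: three URLs, one run-on, one after a newline.
field_a = (
    "https://a.example/x/y/123994"
    "https://b.example/p/q/123441\n"
    "https://c.example/r/s/133456"
)


def extract_all(pattern, text):
    """Return every FieldB capture, similar in spirit to rex max_match=0."""
    return [m.group("FieldB") for m in re.finditer(pattern, text)]


# With the $ anchor, only the segment at the very end of the string can match.
anchored = extract_all(r"/(?P<FieldB>\w+)$", field_a)

# Without the anchor, every run of word characters after a slash matches.
unanchored = extract_all(r"/(?P<FieldB>\w+)", field_a)

print(anchored)
print(unanchored)
```

Under these assumptions the anchored pattern yields only `133456`, while the unanchored one also returns host and path fragments, plus a run-on value such as `123994https`.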


kiru2992
Path Finder

Hello @richgalloway ,

As you mentioned, '\w+' picks up other parts of the URLs once the '$' is removed. Every part of each URL is individually mapped to FieldB, resulting in duplicate entries.

Can you please let me know how to get only the ends of the URLs in a single row?

ITWhisperer
SplunkTrust

How are the URLs delimited in FieldA? It looks like there is a space in some instances but not in others. Try:

| rex max_match=0 field=FieldA "\/(?<FieldB>[^\/]*)( |$)"
| mvexpand FieldB

If there is no space between the URLs in some instances, use:

| rex max_match=0 field=FieldA "\/(?<FieldB>[^\/]*)(https:| |$)"
| mvexpand FieldB

kiru2992
Path Finder

Hello @ITWhisperer ,

The first snippet gives only the last part of the last URL.

The second snippet gives only the last part of the first URL.

Can you please let me know how to get the list of last parts of all the URLs?

kiru2992
Path Finder

Hello @ITWhisperer ,

I forgot to mention: the URLs are separated by a newline ('\n'), not a space.

ITWhisperer
SplunkTrust

Try this:

| rex max_match=0 field=FieldA "\/(?<FieldB>[^\/]*)(https:|\n|$)"
| mvcombine delim=" " FieldB
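
The pattern above can be checked outside Splunk with a small Python sketch (not SPL). It assumes Python's `re` module matches this pattern the same way as the PCRE engine behind `rex`, and the `.example` URLs are hypothetical. One thing the sketch suggests: because `[^\/]*` also matches a newline, a value that sits just before a newline may be captured with that newline attached. Excluding whitespace from the class (`[^\/\s]*`) is one possible tweak; that variant is an assumption, not part of the answer above.

```python
import re

# Hypothetical FieldA value: URLs separated by nothing or by a newline.
field_a = (
    "https://a.example/x/y/123994"
    "https://b.example/p/q/123441\n"
    "https://c.example/r/s/133456"
)

# Pattern from the answer, rewritten with Python's (?P<name>...) group syntax.
as_posted = re.compile(r"\/(?P<FieldB>[^\/]*)(https:|\n|$)")

# Assumed variant: also exclude whitespace from the captured segment.
no_whitespace = re.compile(r"\/(?P<FieldB>[^\/\s]*)(https:|\n|$)")


def extract_all(regex, text):
    """Collect every FieldB capture, roughly what max_match=0 does."""
    return [m.group("FieldB") for m in regex.finditer(text)]


posted_values = extract_all(as_posted, field_a)
clean_values = extract_all(no_whitespace, field_a)

print(posted_values)
print(clean_values)
print(" ".join(clean_values))
```

In this sketch both patterns find all three trailing values; the only difference is the newline that stays attached to `123441` with the pattern as posted.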

kiru2992
Path Finder

Hello @ITWhisperer ,

Thank you!! It worked like a charm. :)


kiru2992
Path Finder

Hello @ITWhisperer ,

Now I would like a separate row for each extracted value, but I am not able to split the extracted 'FieldB'. Can you please help me with this?

ITWhisperer
SplunkTrust

If you want each FieldB value in its own event, use mvexpand instead of mvcombine:

| rex max_match=0 field=FieldA "\/(?<FieldB>[^\/]*)(https:|\n|$)"
| mvexpand FieldB
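
As a rough mental model of what mvexpand does to a multivalue field, here is a short Python sketch. The `mvexpand` function below is a hypothetical helper written for illustration, not a Splunk API, and the `host` field and its value are made up.

```python
def mvexpand(events, field):
    """Yield one copy of each event per value of a multivalue field."""
    for event in events:
        values = event.get(field)
        if isinstance(values, list):
            for value in values:
                # Copy the other fields and keep a single value per row.
                yield {**event, field: value}
        else:
            # Single-valued or missing field: pass the event through unchanged.
            yield dict(event)


# One event whose FieldB holds the three extracted values.
events = [{"host": "web01", "FieldB": ["123994", "123441", "133456"]}]
rows = list(mvexpand(events, "FieldB"))

for row in rows:
    print(row)
```

Each extracted value ends up in its own row, with the remaining fields repeated on every row.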

 


kiru2992
Path Finder

Hello @ITWhisperer ,

Thank you. It worked. :)


richgalloway
SplunkTrust

If your problem is resolved, then please click the "Accept as Solution" button to help future readers.
