Re: How to extract a filepath after certain folder...

anooshac · ‎09-06-2023

Hi Everyone,

I have to extract a file path from a path.

The path will be in the format C:\a\b\c\abc\xyz\abc.h.

I want to skip the first 4 folders. That is, in this example I want to extract \abc\xyz\abc.h.

How can I do it using regex?

anooshac · ‎09-06-2023

hi @gcusello ,

I am not able to get the path \xyz\abc.h using this regex.

gcusello · ‎09-06-2023

Hi @anooshac,

when there are multiple backslashes there is an issue, so please try:

| rex field=your_field "^\w:\\\\w+\\\\w+\\\\w+\\\\w+(?<filename>.*)"

ciao.

Giuseppe

anooshac · ‎09-06-2023

Hi @gcusello, I am still not able to extract it.

gcusello · ‎09-06-2023

Hi @anooshac,

the second regex is correct, as you can check at https://regex101.com/r/kpyTLl/2.

Splunk handles backslashes differently from regex101.com, so you can try:

| rex field=your_field "^\w*:\\\\\w*\\\\\w*\\\\\w*\\\\\w*\\\\(?<filename>.*)"

as you can check using the following search:

| makeresults 
| eval my_field="C:\a\b\c\abc\xyz\abc.h"
| rex field=my_field "^\w*:\\\\\w*\\\\\w*\\\\\w*\\\\\w*\\\\(?<filename>.*)"

Ciao.

Giuseppe

anooshac · ‎09-06-2023

Hi @gcusello ,

I tested it and it is working fine. The paths in my data vary from one another, though. I may have data like the path below. Will it still work in that case?

C:\a\b\c\abc.pqr.a1.b1.jkl\xyz\abc.h

PickleRick · ‎09-07-2023

OK. Assuming that:

1. You always have a drive letter at the beginning

2. You don't have "empty parts" (consecutive backslashes, which are syntactically valid in a file path but are typically not returned as the path to an existing file)

3. You want to extract the part after the first four components

The regex to do so would look like this:

[a-zA-Z]:\\([^\\]+\\){4}(?<remainder>.*)

The "remainder" capture group will capture the path after the first four directories.

Of course, if you want to do it with the "rex" command in Splunk, you need to escape all the backslashes, which makes it something like this:

| rex  "[a-zA-Z]:\\\\([^\\\\]+\\\\){4}(?<remainder>.*)"
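If you want to sanity-check the pattern logic outside Splunk, here is a minimal Python sketch. It assumes the plain-regex form with a single literal backslash after the drive letter, and it uses Python's `(?P<name>...)` spelling for the named group. The helper name `remainder_after_four` and the third sample path are made up for illustration, and this says nothing about how rex itself handles escaping.

```python
import re

# Assumption: plain-regex form, one literal backslash after the drive letter,
# then exactly four "folder\" components, then the remainder.
# Python spells named groups (?P<name>...) rather than (?<name>...).
PATTERN = re.compile(r"[a-zA-Z]:\\([^\\]+\\){4}(?P<remainder>.*)")

def remainder_after_four(path):
    """Return the part of the path after the first four folders, or None."""
    match = PATTERN.search(path)
    return match.group("remainder") if match else None

samples = [
    r"C:\a\b\c\abc\xyz\abc.h",
    r"C:\a\b\c\abc.pqr.a1.b1.jkl\xyz\abc.h",
    r"C:\a\b\abc.h",  # hypothetical path with fewer than four folders
]
for sample in samples:
    print(sample, "->", remainder_after_four(sample))
```

Because `[^\\]+` accepts any character except a backslash, folder names containing dots are handled the same way as plain ones. A path with fewer than four folders simply does not match.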

gcusello · ‎09-06-2023

Hi @anooshac,

let me understand, you could have different log formats: "C:\a\b\c\abc\xyz\abc.h" or "C:\a\b\c\abc.pqr.a1.b1.jkl\xyz\abc.h", is that correct?

In this case, you could try:

| rex field=your_field "^\w*:\\\\[^\\\]*\\\\\w*\\\\[^\\\]*\\\\[^\\\]*\\\\(?<filename>.*)"

which you can test using this search:

| makeresults
| eval your_field="C:\a\b\c\abc\xyz\abc.h"
| append [ | makeresults | eval your_field="C:\a\b\c\abc.pqr.a1.b1.jkl\xyz\abc.h" ]
| rex field=your_field "^\w*:\\\\[^\\\]*\\\\\w*\\\\[^\\\]*\\\\[^\\\]*\\\\(?<filename>.*)"
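To see why the folder-name part of the pattern matters, here is a rough Python sketch comparing the two approaches on both sample paths. It assumes that each `\\\\` in the rex strings stands for one literal backslash, and it rewrites the patterns as plain regexes with Python's `(?P<name>...)` group syntax. The names `WORD_ONLY`, `ANY_FOLDER` and `extract` are made up for illustration.

```python
import re

# Assumed plain-regex equivalents of the two rex patterns in this thread.
# WORD_ONLY: every folder name must consist of \w characters only.
WORD_ONLY = re.compile(r"^\w*:\\\w*\\\w*\\\w*\\\w*\\(?P<filename>.*)")
# ANY_FOLDER: most folder names may contain anything except a backslash.
ANY_FOLDER = re.compile(r"^\w*:\\[^\\]*\\\w*\\[^\\]*\\[^\\]*\\(?P<filename>.*)")

def extract(pattern, path):
    """Return the captured filename part, or None when the pattern fails."""
    match = pattern.search(path)
    return match.group("filename") if match else None

plain = r"C:\a\b\c\abc\xyz\abc.h"
dotted = r"C:\a\b\c\abc.pqr.a1.b1.jkl\xyz\abc.h"

for path in (plain, dotted):
    print(path)
    print("  word-only :", extract(WORD_ONLY, path))
    print("  any-folder:", extract(ANY_FOLDER, path))
```

The word-only pattern fails on the dotted folder name because `\w` does not match a dot, while the negated character class accepts it.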

Ciao.

Giuseppe

gcusello · ‎09-06-2023

Hi @anooshac,

I suppose that you have this path in a field, so you could use something like this:

| rex field=your_field "^(?<path>\w:\\\w+\\\w+\\\w+\\\w+)"

that you can test at https://regex101.com/r/kpyTLl/1

regex101.com and Splunk may handle backslashes differently, so if the above regex doesn't work, please try this:

| rex field=your_field "^(?<path>\w:\\\\w+\\\\w+\\\\w+\\\\w+)"

Ciao.

Giuseppe

anooshac · ‎09-06-2023

Hi @gcusello, thanks for the response.

I don't want to extract the first 4 folders. I want to skip them and extract the rest of the path. I am finding it hard to write the regex. How can I do this?

gcusello · ‎09-06-2023

Hi @anooshac,

it's the same thing:

| rex field=your_field "^\w:\\\w+\\\w+\\\w+\\\w+(?<filename>.*)"

Ciao.

Giuseppe

How to extract a filepath after certain folders using regex?
