Extract parent folder and subfolder path from Windows and *nix formats

ramuzzini
Path Finder

I need some assistance creating a query that captures the parent folder and the first child folder from a print output log that contains both Windows and Linux folder paths. Sample data follows; the part of each path I want in a capture group is the root plus the first folder (for example \\cpn-fs.local\data\ or c:\program files\).

_time, username, computer, printer, source_dir, status

2024-09-24 15:32, auser, cmp_auser, print01_main1, \\cpn-fs.local\data\program\..., Printed
2024-09-24 13:57, buser, cmp_buser, print01_offic1, c:\program files\documents\..., Printed
2024-09-24 12:13, cuser, cmp_cuser, print01_offic2, \\cpn-fs.local\data\transfer\..., In queue
2024-09-24 09:26, buser, cmp_buser, print01_offic1, F:\transfers\program\..., Printed
2024-09-24 09:26, buser, cmp_buser, print01_front1, \\cpn-fs.local\transfer\program\..., Printed
2024-09-24 07:19, auser, cmp_auser, print01_main1, \\cpn-fs.local\data\program\...., In queue

I currently list these folders explicitly in my initial search, but I want to handle this with a rex command instead, so that I can add an eval command to tell whether a job was printed from a local folder or from a server folder. My current query is:

index=printLog source_dir IN ("\\\\cpn-fs.local\data\*", "\\\\cpn-fs.local\transfer\*", "c:\\program files\\*", "F:\\transfer\\*") status="Printed"
| table status, _time, username, computer, printer, source_dir

I tried the following rex, but it returned no results:
     | rex field=source_dir "(?i)<FolderPath>(?i[A-Z][a-z]\:|\\\\{1})[^\\\\]+)\\\\[^\\\\]+\\\\)"

For my second attempt, I used Splunk's field extractor to generate the two regexes below, one for each path type. I know I need to combine them with the alternation ("|", i.e. "OR") operator so that both the Windows and the Linux paths are matched, but I get an error when I try to combine them.

Regex generated from windows:  c:\program files 
^[^ \n]* \w+,,,(?P<FolderPath>\w+:\\\w+)

Regex generated from linux: \\cpn-fs.local\data
^[^ \n]* \w+,,,(?P<FolderPath>\\\\\w+\-\w+\d+\.\w+\.\w+\\\w+)

To start, I am looking for output like the sample below, with "source_dir" replaced by the "FolderPath" field created by the rex:

_time, username, computer, printer, FolderPath, file, status

2024-09-24 15:32, auser, cmp_auser, print01_main1, \\cpn-fs.local\data\, Printed
2024-09-24 13:57, buser, cmp_buser, print01_offic1, c:\program files\, Printed

Thanks for any help given.


ITWhisperer
SplunkTrust
| makeresults format=csv data="_time,     username,      computer,      printer,      source_dir,      status
2024-09-24 15:32 ,   auser, cmp_auser,  print01_main1,   \\\\cpn-fs.local\data\program\...,          Printed
2024-09-24 13:57 ,   buser, cmp_buser,  print01_offic1,   c:\program files\documents\...,            Printed
2024-09-24 12:13 ,   cuser, cmp_cuser,  print01_offic2,   \\\\cpn-fs.local\data\transfer\...,            In queue
2024-09-24 09:26,    buser, cmp_buser,  print01_offic1,   F:\transfers\program\...,                           Printed
2024-09-24 09:26,    buser, cmp_buser,  print01_front1,   \\\\cpn-fs.local\transfer\program\...,  Printed
2024-09-24 07:19,    auser, cmp_auser,   print01_main1,   \\\\cpn-fs.local\data\program\....,         In queue"
| rex field=source_dir "(?P<FolderPath>(\\\\\\\\[^\\\\]+|\w:)\\\\[^\\\\]+\\\\)"
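
To help read the pattern piece by piece, here is a minimal Python sketch that applies what should be the equivalent regex outside Splunk. It assumes that, inside the quoted rex string, each literal backslash is written as four backslashes (one level of escaping for the search string, one for the regex), so the version below uses only the regex-level escaping. The names FOLDER_PATH and extract, and the "server"/"local" labels, are hypothetical and only for illustration.

```python
import re

# Equivalent of the rex pattern with the SPL string escaping removed.
FOLDER_PATH = re.compile(r"""
    (?P<FolderPath>        # named group, becomes the FolderPath field
        (
            \\\\[^\\]+     # server root: two backslashes, then the host name
          |                # OR
            \w:            # drive root: one word character and a colon
        )
        \\                 # the backslash after the root
        [^\\]+             # first folder: everything up to the next backslash
        \\                 # the closing backslash
    )
""", re.VERBOSE)


def extract(source_dir):
    """Return (FolderPath, origin), or (None, None) when nothing matches."""
    m = FOLDER_PATH.search(source_dir)
    if m is None:
        return None, None
    folder = m.group("FolderPath")
    # Paths that start with two backslashes come from a server share.
    origin = "server" if folder.startswith("\\\\") else "local"
    return folder, origin


samples = [
    r"\\cpn-fs.local\data\program\...",
    r"c:\program files\documents\...",
    r"F:\transfers\program\...",
]
for path in samples:
    print(extract(path))
```

Because the drive letter and colon sit inside the same alternation as the server name, both count as the "root", and exactly one folder is captured after it. In SPL, the local/server check could presumably be done after the rex with something like | eval PrintSource=if(match(FolderPath, "^\w:"), "local", "server"), where PrintSource is a hypothetical field name.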

ramuzzini
Path Finder

I appreciate the help. This works in part: for the server paths I get the expected output.

For the drive paths, however, I get c:\program files\documents\ or F:\transfers\program\ rather than c:\program files\ or F:\transfers\. I want the drive letter to be treated as the root folder; I should have described it as the root location. I have also been going through rex/regex videos online, but I am still learning and trying to decipher each part of the regular expression and how the parts break up to capture each piece of the file path. Can you explain this a bit, or point me to a tutorial that would help me understand it better? Much appreciated.

ramuzzini
Path Finder

Thanks for the help.  Much appreciated.

ITWhisperer
SplunkTrust

Try something like this:

| makeresults format=csv data="_time,     username,      computer,      printer,      source_dir,      status
2024-09-24 15:32 ,   auser, cmp_auser,  print01_main1,   \\\\cpn-fs.local\data\program\...,          Printed
2024-09-24 13:57 ,   buser, cmp_buser,  print01_offic1,   c:\program files\documents\...,            Printed
2024-09-24 12:13 ,   cuser, cmp_cuser,  print01_offic2,   \\\\cpn-fs.local\data\transfer\...,            In queue
2024-09-24 09:26,    buser, cmp_buser,  print01_offic1,   F:\transfers\program\...,                           Printed
2024-09-24 09:26,    buser, cmp_buser,  print01_front1,   \\\\cpn-fs.local\transfer\program\...,  Printed
2024-09-24 07:19,    auser, cmp_auser,   print01_main1,   \\\\cpn-fs.local\data\program\....,         In queue"
| rex field=source_dir "(?P<FolderPath>(\\\\\\\\|\w:\\\\)[^\\\\]+\\\\\w+)"

By the way, these are not really Linux paths: Linux uses forward slashes ("/").
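
For anyone comparing patterns, here is a small Python sketch that runs this pattern next to the variant that puts the drive letter and colon in the same alternation as the server name. Both are written with regex-level escaping only (assuming each literal backslash is quadrupled inside a quoted rex string); the names THIS_PATTERN, ROOT_VARIANT and folder are hypothetical.

```python
import re

# The pattern from this post: the alternation covers only the leading
# "\\" or "x:\", so a drive path consumes two folder names after it.
THIS_PATTERN = re.compile(r"(?P<FolderPath>(\\\\|\w:\\)[^\\]+\\\w+)")

# Variant where the server name or the drive letter is the root,
# followed by exactly one folder and a closing backslash.
ROOT_VARIANT = re.compile(r"(?P<FolderPath>(\\\\[^\\]+|\w:)\\[^\\]+\\)")


def folder(pattern, source_dir):
    """Return the FolderPath capture, or None when nothing matches."""
    m = pattern.search(source_dir)
    return m.group("FolderPath") if m else None


samples = [
    r"\\cpn-fs.local\data\program\...",
    r"c:\program files\documents\...",
    r"F:\transfers\program\...",
]
for path in samples:
    print(folder(THIS_PATTERN, path), "|", folder(ROOT_VARIANT, path))
```

With the pattern from this post, the server path stops after the first share folder, but the drive paths run one level deeper (for example c:\program files\documents). The variant stops at c:\program files\ and F:\transfers\ because the drive letter itself is treated as the root.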
