Extract parent folder and subfolder path from Windows and *nix format

ramuzzini
Path Finder

I need some help creating a query that captures the parent folder and the first child folder from a print output log containing both Windows and Linux folder paths. Sample data is below; the folder paths I am trying to get in a capture group are in bold.

_time,     username,      computer,      printer,      source_dir,      status

2024-09-24 15:32 ,   auser, cmp_auser,  print01_main1,   \\cpn-fs.local\data\program\...,          Printed
2024-09-24 13:57 ,   buser, cmp_buser,  print01_offic1,   c:\program files\documents\...,            Printed
2024-09-24 12:13 ,   cuser, cmp_cuser,  print01_offic2,   \\cpn-fs.local\data\transfer\...,            In queue
2024-09-24 09:26,    buser, cmp_buser,  print01_offic1,   F:\transfers\program\...,                           Printed
2024-09-24 09:26,    buser, cmp_buser,  print01_front1,   \\cpn-fs.local\transfer\program\...,  Printed
2024-09-24 07:19,    auser, cmp_auser,   print01_main1,   \\cpn-fs.local\data\program\....,         In queue

I am currently using a Splunk query where I call these folders in my initial search, but I want to control this using a rex command so I can add an eval command to see if they were printed locally or from a server folder.  Current query is:

index=printLog source_dir IN ("\\\\cpn-fs.local\data\*", "\\\\cpn-fs.local\transfer\*", "c:\\program files\\*", "F:\\transfer\\*") status="Printed"
| table status, _time, username, computer, printer, source_dir

I tried the following rex, but it returned no results:
     | rex field=source_dir "(?i)<FolderPath>(?i[A-Z][a-z]\:|\\\\{1})[^\\\\]+)\\\\[^\\\\]+\\\\)"

In my second attempt, I used Splunk's field extractor to generate the two regexes below, one per path type. I know I need to join them with the "|" (OR) operator to cover both the Windows and Linux paths, but I get an error when I try to combine them.

Regex generated from the Windows path c:\program files:
^[^ \n]* \w+,,,(?P<FolderPath>\w+:\\\w+)

Regex generated from the Linux path \\cpn-fs.local\data:
^[^ \n]* \w+,,,(?P<FolderPath>\\\\\w+\-\w+\d+\.\w+\.\w+\\\w+)

To start, I am looking for output like the sample below, with "source_dir" replaced by the "FolderPath" field that rex creates:

_time,     username,      computer,      printer,      FolderPath,      file,    status

2024-09-24 15:32 ,   auser, cmp_auser,  print01_main1,   \\cpn-fs.local\data\,    Printed
2024-09-24 13:57 ,   buser, cmp_buser,  print01_offic1,   c:\program files\,            Printed


Thanks for any help given.


ramuzzini
Path Finder

Appreciate the help. This works in part: for the server path, I get the proper output.

For the drive path, however, I get c:\program files\documents\ or F:\transfers\program\ rather than c:\program files\ or F:\transfers\. I want the output to treat the drive letter as the root folder; I should have worded it as the root location. Also, I have been reviewing rex/regex videos online, but I am still learning how to decipher each part of the regular expression and how the parts fit together to capture each piece of the file path. Can you explain this a bit, or point me to a tutorial that would help me understand it better? Much appreciated.


ITWhisperer
SplunkTrust
| makeresults format=csv data="_time,     username,      computer,      printer,      source_dir,      status
2024-09-24 15:32 ,   auser, cmp_auser,  print01_main1,   \\\\cpn-fs.local\data\program\...,          Printed
2024-09-24 13:57 ,   buser, cmp_buser,  print01_offic1,   c:\program files\documents\...,            Printed
2024-09-24 12:13 ,   cuser, cmp_cuser,  print01_offic2,   \\\\cpn-fs.local\data\transfer\...,            In queue
2024-09-24 09:26,    buser, cmp_buser,  print01_offic1,   F:\transfers\program\...,                           Printed
2024-09-24 09:26,    buser, cmp_buser,  print01_front1,   \\\\cpn-fs.local\transfer\program\...,  Printed
2024-09-24 07:19,    auser, cmp_auser,   print01_main1,   \\\\cpn-fs.local\data\program\....,         In queue"
| rex field=source_dir "(?P<FolderPath>(\\\\\\\\[^\\\\]+|\w:)\\\\[^\\\\]+\\\\)"
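To help with reading the pattern piece by piece, here is a rough sketch of the same regex in Python rather than SPL, laid out one part per line with comments. It assumes, as I understand it, that the quoted rex string is unescaped once before the regex engine sees it, which is why one literal backslash takes four backslashes in the rex but only two here. The names FOLDER_PATH, folder_path and location are made up for illustration, and the server/local check is only one possible way to do the classification asked about in the question.

```python
import re

# Same pattern as the rex above, written as a Python raw string.
# In verbose mode, whitespace and "#" comments inside the pattern are ignored.
FOLDER_PATH = re.compile(
    r"""
    (?P<FolderPath>       # named capture group, like the rex field name
        (?:
            \\\\[^\\]+    # UNC root: two backslashes, then the server name
          |               # OR
            \w:           # drive root: one word character and a colon
        )
        \\[^\\]+\\        # backslash, first folder name, closing backslash
    )
    """,
    re.VERBOSE,
)


def folder_path(source_dir):
    """Return the root plus first folder, or None if nothing matches."""
    match = FOLDER_PATH.search(source_dir)
    return match.group("FolderPath") if match else None


def location(source_dir):
    """Hypothetical helper: 'server' for UNC paths, 'local' for drive paths."""
    root = folder_path(source_dir)
    if root is None:
        return "unknown"
    return "server" if root.startswith("\\\\") else "local"


if __name__ == "__main__":
    samples = [
        r"\\cpn-fs.local\data\program\...",
        r"c:\program files\documents\...",
        r"F:\transfers\program\...",
    ]
    for path in samples:
        print(path, "->", folder_path(path), "/", location(path))
```

The key piece is [^\\]+, which means "one or more characters that are not a backslash", so each use of it stops at the next folder separator. Because the drive alternative is only \w: with no folder attached, the drive letter counts as the root and exactly one folder follows it.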

ramuzzini
Path Finder

Thanks for the help.  Much appreciated.


ITWhisperer
SplunkTrust

Try something like this

| makeresults format=csv data="_time,     username,      computer,      printer,      source_dir,      status
2024-09-24 15:32 ,   auser, cmp_auser,  print01_main1,   \\\\cpn-fs.local\data\program\...,          Printed
2024-09-24 13:57 ,   buser, cmp_buser,  print01_offic1,   c:\program files\documents\...,            Printed
2024-09-24 12:13 ,   cuser, cmp_cuser,  print01_offic2,   \\\\cpn-fs.local\data\transfer\...,            In queue
2024-09-24 09:26,    buser, cmp_buser,  print01_offic1,   F:\transfers\program\...,                           Printed
2024-09-24 09:26,    buser, cmp_buser,  print01_front1,   \\\\cpn-fs.local\transfer\program\...,  Printed
2024-09-24 07:19,    auser, cmp_auser,   print01_main1,   \\\\cpn-fs.local\data\program\....,         In queue"
| rex field=source_dir "(?P<FolderPath>(\\\\\\\\|\w:\\\\)[^\\\\]+\\\\\w+)"

By the way, these are not really Linux paths, as Linux uses forward slashes ("/"); the \\server\folder form is a Windows UNC path.
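For anyone comparing results, the pattern in this post treats the two path types slightly differently. The sketch below is a Python translation, not SPL, and the names PATTERN and extract are made up for illustration. The alternation matches either the two leading backslashes or the drive letter together with its backslash, so the next segment is the server name for a UNC path but already the first folder for a drive path, and the final \\\w+ then adds one more segment in both cases.

```python
import re

# The rex pattern from this post as a Python raw string
# (two backslashes here correspond to four in the quoted rex string).
PATTERN = re.compile(r"(?P<FolderPath>(\\\\|\w:\\)[^\\]+\\\w+)")


def extract(source_dir):
    """Return the captured FolderPath, or None if nothing matches."""
    match = PATTERN.search(source_dir)
    return match.group("FolderPath") if match else None


if __name__ == "__main__":
    samples = [
        r"\\cpn-fs.local\data\program\...",
        r"c:\program files\documents\...",
        r"F:\transfers\program\...",
    ]
    for path in samples:
        print(path, "->", extract(path))
```

With this pattern a UNC path yields the server plus one folder, while a drive path yields the drive plus two folders.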
