
Extract parent folder and subfolder path from Windows and *nix formats

ramuzzini
Path Finder

I need some help building a query that captures the parent (root) folder and its first child folder from a print output log that contains both Windows and Linux folder paths. The sample data is below. The part of each path I want in the capture group is the root location plus the first folder, for example \\cpn-fs.local\data\ or c:\program files\.

_time,     username,      computer,      printer,      source_dir,      status

2024-09-24 15:32 ,   auser, cmp_auser,  print01_main1,   \\cpn-fs.local\data\program\...,          Printed
2024-09-24 13:57 ,   buser, cmp_buser,  print01_offic1,   c:\program files\documents\...,            Printed
2024-09-24 12:13 ,   cuser, cmp_cuser,  print01_offic2,   \\cpn-fs.local\data\transfer\...,            In queue
2024-09-24 09:26,    buser, cmp_buser,  print01_offic1,   F:\transfers\program\...,                           Printed
2024-09-24 09:26,    buser, cmp_buser,  print01_front1,   \\cpn-fs.local\transfer\program\...,  Printed
2024-09-24 07:19,    auser, cmp_auser,   print01_main1,   \\cpn-fs.local\data\program\....,         In queue

My current Splunk query names these folders in the initial search. I would like to extract them with a rex command instead, so I can add an eval command that shows whether each job was printed from a local drive or from a server folder. My current query is:

index=printLog source_dir IN ("\\\\cpn-fs.local\data\*", "\\\\cpn-fs.local\transfer\*", "c:\\program files\\*", "F:\\transfers\\*") status="Printed"
| table status, _time, username, computer, printer, source_dir
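The local-versus-server test described above depends on how the path starts. A server share path begins with two backslashes, while a local path begins with a drive letter and a colon.

Below is a minimal Python sketch of that rule. It only illustrates the logic outside Splunk. The function name `classify_location` and the labels "server", "local" and "unknown" are hypothetical. In SPL, the same prefix test would go into an eval, for example with the match() function.

```python
import re

# Hypothetical helper illustrating the local-vs-server rule.
# A UNC (server share) path starts with two backslashes: \\cpn-fs.local\data\...
UNC_PREFIX = re.compile(r"^\\\\")
# A local path starts with a drive letter, a colon and a backslash: c:\program files\...
DRIVE_PREFIX = re.compile(r"^[A-Za-z]:\\")


def classify_location(path: str) -> str:
    """Return "server", "local" or "unknown" for a Windows-style path."""
    path = path.strip()  # the sample rows are padded with spaces
    if UNC_PREFIX.match(path):
        return "server"
    if DRIVE_PREFIX.match(path):
        return "local"
    return "unknown"


for sample in (
    r"\\cpn-fs.local\data\program\...",
    r"c:\program files\documents\...",
    r"F:\transfers\program\...",
):
    print(f"{sample:<40} {classify_location(sample)}")
```

Anchoring both checks with `^` means only the start of the path decides the label. A backslash pair further along the path cannot flip the result.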

I tried the following rex, but it returned nothing:
     | rex field=source_dir "(?i)<FolderPath>(?i[A-Z][a-z]\:|\\\\{1})[^\\\\]+)\\\\[^\\\\]+\\\\)"
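Looking at the rex above piece by piece explains why nothing comes back. Assume Splunk turns each \\ in the quoted rex string into a single \ before the regex engine sees it. The pattern then has four problems:

- <FolderPath> is not preceded by (?P or (?, so it is matched as literal text rather than naming a group.
- (?i[A-Z]... is not valid inline-flag syntax.
- The parentheses are unbalanced: the pattern closes four groups but opens only two.
- [A-Z][a-z]\: would require two letters before the colon, so it could not match a single drive letter such as c:.

The Python sketch below only approximates PCRE. It shows the pattern failing to compile and one possible repair. The helper name `compiles` is hypothetical.

```python
import re


def compiles(pattern: str) -> bool:
    """Hypothetical helper: report whether Python's re module accepts a pattern."""
    try:
        re.compile(pattern)
    except re.error as exc:
        print(f"does not compile: {exc}")
        return False
    return True


# The attempted rex after SPL string unescaping (assumption: "\\" -> "\").
attempt = r"(?i)<FolderPath>(?i[A-Z][a-z]\:|\\{1})[^\\]+)\\[^\\]+\\)"
print(compiles(attempt))  # False

# One possible repair: a real named group, a single drive letter,
# and balanced parentheses around the two kinds of root.
repaired = r"(?i)(?P<FolderPath>(?:[a-z]:|\\\\[^\\]+)\\[^\\]+\\)"
print(compiles(repaired))  # True

for path in (r"c:\program files\documents\...", r"\\cpn-fs.local\data\program\..."):
    print(re.search(repaired, path).group("FolderPath"))
```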

For my second attempt, I used Splunk's field extractor to generate the two regexes below, one for the Windows path and one for the Linux path. I know I need to join them with a pipe ("|", the regex "OR" operator) to match either kind of path, but I get an error when I try to combine them.

Regex generated from the Windows path (c:\program files):
^[^ \n]* \w+,,,(?P<FolderPath>\w+:\\\w+)

Regex generated from the Linux path (\\cpn-fs.local\data):
^[^ \n]* \w+,,,(?P<FolderPath>\\\\\w+\-\w+\d+\.\w+\.\w+\\\w+)
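Joining the two extractor regexes with a bare | most likely fails because each side defines its own FolderPath group. A pattern cannot normally define the same group name twice: Python rejects it, and PCRE does too unless duplicate names are explicitly allowed. The usual fix is one named group with the alternation inside it.

The Python sketch below keeps only the FolderPath parts of the two generated regexes. It drops the ^[^ \n]* \w+,,, prefix because that depends on the raw event layout, which the post does not show.

```python
import re

# FolderPath bodies from the two field-extractor regexes in the post.
windows_part = r"\w+:\\\w+"
server_part = r"\\\\\w+\-\w+\d+\.\w+\.\w+\\\w+"

# Naive OR: both sides carry their own FolderPath group.
naive = rf"(?P<FolderPath>{windows_part})|(?P<FolderPath>{server_part})"
naive_ok = True
try:
    re.compile(naive)
except re.error as exc:
    naive_ok = False
    print(f"naive OR fails: {exc}")

# Working OR: a single FolderPath group with the alternation inside it.
combined = re.compile(rf"(?P<FolderPath>{windows_part}|{server_part})")

m = combined.search(r"c:\program files\documents\...")
print(m.group("FolderPath"))  # c:\program  (\w+ stops at the space)
print(combined.search(r"\\cpn-fs.local\data\program\..."))  # None
```

Even once combined, the generated pieces are narrow:

- The Windows half stops at c:\program, because \w does not match the space in "program files".
- The server half requires a digit after "fs" (the \d+), so it does not match the sample host cpn-fs.local.

The [^\\]+ classes in the accepted answer avoid both issues.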

As a first step, I am looking for output like the example below, where source_dir is replaced by the FolderPath field created by rex:

_time,     username,      computer,      printer,      FolderPath,      file,    status

2024-09-24 15:32 ,   auser, cmp_auser,  print01_main1,   \\cpn-fs.local\data\,    Printed
2024-09-24 13:57 ,   buser, cmp_buser,  print01_offic1,   c:\program files\,            Printed


Thanks for any help given.

ITWhisperer
SplunkTrust
| makeresults format=csv data="_time,     username,      computer,      printer,      source_dir,      status
2024-09-24 15:32 ,   auser, cmp_auser,  print01_main1,   \\\\cpn-fs.local\data\program\...,          Printed
2024-09-24 13:57 ,   buser, cmp_buser,  print01_offic1,   c:\program files\documents\...,            Printed
2024-09-24 12:13 ,   cuser, cmp_cuser,  print01_offic2,   \\\\cpn-fs.local\data\transfer\...,            In queue
2024-09-24 09:26,    buser, cmp_buser,  print01_offic1,   F:\transfers\program\...,                           Printed
2024-09-24 09:26,    buser, cmp_buser,  print01_front1,   \\\\cpn-fs.local\transfer\program\...,  Printed
2024-09-24 07:19,    auser, cmp_auser,   print01_main1,   \\\\cpn-fs.local\data\program\....,         In queue"
| rex field=source_dir "(?P<FolderPath>(\\\\\\\\[^\\\\]+|\w:)\\\\[^\\\\]+\\\\)"
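The backslashes in that rex are doubled twice. First, the regex needs \\ to match one literal backslash. Second, the rex argument is a quoted SPL string, which doubles them again.

The Python sketch below assumes Splunk collapses each \\ in the quoted string to a single \. That is also why the makeresults data writes \\\\cpn-fs.local. The sketch recovers the pattern the regex engine actually sees and runs it over the sample source_dir values. Python's re module is only a stand-in for PCRE here, but it treats these constructs (named groups, character classes, alternation) the same way.

```python
import re

# The rex argument exactly as typed in the SPL answer.
spl_string = r"(?P<FolderPath>(\\\\\\\\[^\\\\]+|\w:)\\\\[^\\\\]+\\\\)"

# Assumed SPL string layer: each "\\" becomes "\"; \w passes through unchanged.
regex_text = spl_string.replace("\\\\", "\\")
print(regex_text)  # (?P<FolderPath>(\\\\[^\\]+|\w:)\\[^\\]+\\)

folder_path = re.compile(regex_text)

source_dirs = [
    r"\\cpn-fs.local\data\program\...",
    r"c:\program files\documents\...",
    r"\\cpn-fs.local\data\transfer\...",
    r"F:\transfers\program\...",
    r"\\cpn-fs.local\transfer\program\...",
]
for source_dir in source_dirs:
    match = folder_path.search(source_dir)
    print(f"{source_dir:<40} -> {match.group('FolderPath') if match else None}")
```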

ramuzzini
Path Finder

Appreciate the help.  This is working in part.  For the server path, I am getting the proper output. 

However, for the drive paths I get results such as c:\program files\documents\ or F:\transfers\program\ instead of c:\program files\ or F:\transfers\. I want the output to treat the drive letter as the root folder; I should have called it the root location.

Also, I have watched some rex/regex videos online, but I am still learning to decipher each part of a regular expression and how it is broken up to capture each part of a file path. Could you explain this one a bit, or point me to a tutorial that would help me understand it better? Much appreciated.

ITWhisperer
SplunkTrust
| makeresults format=csv data="_time,     username,      computer,      printer,      source_dir,      status
2024-09-24 15:32 ,   auser, cmp_auser,  print01_main1,   \\\\cpn-fs.local\data\program\...,          Printed
2024-09-24 13:57 ,   buser, cmp_buser,  print01_offic1,   c:\program files\documents\...,            Printed
2024-09-24 12:13 ,   cuser, cmp_cuser,  print01_offic2,   \\\\cpn-fs.local\data\transfer\...,            In queue
2024-09-24 09:26,    buser, cmp_buser,  print01_offic1,   F:\transfers\program\...,                           Printed
2024-09-24 09:26,    buser, cmp_buser,  print01_front1,   \\\\cpn-fs.local\transfer\program\...,  Printed
2024-09-24 07:19,    auser, cmp_auser,   print01_main1,   \\\\cpn-fs.local\data\program\....,         In queue"
| rex field=source_dir "(?P<FolderPath>(\\\\\\\\[^\\\\]+|\w:)\\\\[^\\\\]+\\\\)"
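Since the follow-up asked how this regex breaks down, here is the same pattern with the SPL string escaping removed. It is rewritten with Python's re.VERBOSE flag so each piece can carry its own comment. Verbose mode ignores whitespace and # comments outside character classes, so this should match exactly what the compact form matches. It is a Python illustration, not SPL, and the name FOLDER_PATH is just for the example.

```python
import re

FOLDER_PATH = re.compile(
    r"""
    (?P<FolderPath>      # named group 1: becomes the FolderPath field
      (                  # group 2: the root location, either
        \\\\ [^\\]+      #   two backslashes plus a server name (\\cpn-fs.local)
      |                  #   or
        \w:              #   one word character and a colon (c: or F:)
      )
      \\                 # the backslash after the root
      [^\\]+             # the first folder: any run of non-backslash characters
      \\                 # the backslash that ends the first folder
    )
    """,
    re.VERBOSE,
)

for path in (r"\\cpn-fs.local\data\program\...", r"c:\program files\documents\..."):
    m = FOLDER_PATH.search(path)
    print("root:", m.group(2), "| FolderPath:", m.group("FolderPath"))
```

[^\\]+ cannot run past a backslash, and the pattern asks for exactly one folder after the root. So the leftmost match is always the root plus its first folder, which is why c:\program files\documents\... yields c:\program files\.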

ramuzzini
Path Finder

Thanks for the help.  Much appreciated.

ITWhisperer
SplunkTrust

Try something like this

| makeresults format=csv data="_time,     username,      computer,      printer,      source_dir,      status
2024-09-24 15:32 ,   auser, cmp_auser,  print01_main1,   \\\\cpn-fs.local\data\program\...,          Printed
2024-09-24 13:57 ,   buser, cmp_buser,  print01_offic1,   c:\program files\documents\...,            Printed
2024-09-24 12:13 ,   cuser, cmp_cuser,  print01_offic2,   \\\\cpn-fs.local\data\transfer\...,            In queue
2024-09-24 09:26,    buser, cmp_buser,  print01_offic1,   F:\transfers\program\...,                           Printed
2024-09-24 09:26,    buser, cmp_buser,  print01_front1,   \\\\cpn-fs.local\transfer\program\...,  Printed
2024-09-24 07:19,    auser, cmp_auser,   print01_main1,   \\\\cpn-fs.local\data\program\....,         In queue"
| rex field=source_dir "(?P<FolderPath>(\\\\\\\\|\w:\\\\)[^\\\\]+\\\\\w+)"
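This variant ends the match differently from the accepted rex earlier in the thread:

- For drive paths, the alternative \w:\\ swallows the backslash after the drive, so [^\\]+ lands on the first folder. The trailing \\\w+ then picks up the word characters of the next folder.
- For server paths, [^\\]+ consumes the host name instead, so the result stops at the first folder.

That is the drive-path behaviour described in the follow-up: one folder deeper than intended. The Python sketch below assumes each \\ in the SPL string becomes a single \ and shows the difference on the sample paths.

```python
import re

# Pattern from this reply after removing the SPL string escaping
# (assumption: each "\\" in the quoted rex string becomes "\").
variant = re.compile(r"(?P<FolderPath>(\\\\|\w:\\)[^\\]+\\\w+)")

for path in (
    r"\\cpn-fs.local\data\program\...",
    r"c:\program files\documents\...",
    r"F:\transfers\program\...",
):
    print(variant.search(path).group("FolderPath"))
# \\cpn-fs.local\data
# c:\program files\documents
# F:\transfers\program
```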

By the way, these are not really Linux paths, since Linux uses forward slashes ("/"). Paths that start with \\ are Windows UNC (network share) paths.
