Splunk Search

Problem with regular expression

Federica_92
Communicator

Hi everyone, I have created a regular expression query that matches one specific folder in a long list of pathnames and then cuts off everything after that folder:

   index=main | rex "\s\-\s\[(?<path_dd>.+)\specific_folder" | dedup path_dd | eval path="file:read:"+path_dd+"*" | sort by path| table path | outputlookup output.csv append=True

Next, I added an inputlookup of a table at the start of the query. This table always contains path names, with only one field per line: path

So I tried to edit the query like this:

 | inputlookup write_rules.csv | rex "(?<path_dd>.+)\/specific_folder" | table path_dd

But it's not working, can anyone help me?
Thank you

Example of the file:

/home/jenkins/qa-automation-smcconnell/Automation/Tomcats/tomcat7/work/Catalina/localhost/host-manager/loader

/home/jenkins/qa-automation-smcconnell/Automation/Tomcats/tomcat7/work/Catalina/localhost/examples/loader
/home/jenkins/qa-automation-smcconnell/Automation/Tomcats/tomcat7/work/Catalina/localhost/xwiki_oracle/loader
/home/jenkins/qa-automation-smcconnell/Automation/Tomcats/tomcat7/work/Catalina/localhost/xwiki_oracle/SESSIONS.ser
/home/jenkins/qa-automation-smcconnell/Automation/Tomcats/tomcat7/work/Catalina/localhost/docs/loader
/home/jenkins/qa-automation-smcconnell/Automation/Tomcats/tomcat7/work/Catalina/localhost/manager/loader
/home/jenkins/qa-automation-smcconnell/Automation/Tomcats/tomcat7/work/Catalina/localhost/_/loader
/home/jenkins/qa-automation-smcconnell/Automation/Tomcats/tomcat7/work/Catalina/localhost/xwiki_oracle/xwiki-temp/aether-repository/org/apache/maven/doxia/doxia-core/1.3/_maven.repositories
/home/jenkins/qa-automation-smcconnell/Automation/Tomcats/tomcat7/work/Catalina/localhost/xwiki_oracle/xwiki-temp/aether-repository/org/apache/maven/doxia/doxia-core/1.3/doxia-core-1.3.pom
/home/jenkins/qa-automation-smcconnell/Automation/Tomcats/tomcat7/work/Catalina/localhost/xwiki_oracle/xwiki-temp/aether-repository/org/apache/maven/doxia/doxia-core/1.3/doxia-core-1.3.pom.ahc26f05574a43e4fce
/home/jenkins/qa-automation-smcconnell/Automation/Tomcats/tomcat7/work/Catalina/localhost/xwiki_oracle/xwiki-temp/aether-repository/org/apache/maven/doxia/doxia-core/1.3/doxia-core-1.3.pom.sha1.ahca7be2b392cec49e7
/home/jenkins/qa-automation-smcconnell/Automation/Tomcats/tomcat7/work/Catalina/localhost/xwiki_oracle/xwiki-temp/aether-repository/org/apache/maven/doxia/doxia-core/1.3/doxia-core-1.3.pom.sha1

0 Karma
1 Solution

somesoni2
Revered Legend

Thanks. The sample entries are helpful, but I believe the problem is not in the regex.

When you run this, you have a Splunk built-in field called '_raw'. This is the default field that a rex statement works on.

index=main  | rex "\s\-\s\[(?<path_dd>.+)\specific_folder" | dedup path_dd | eval path="file:read:"+path_dd+"*" | sort by path| table path | outputlookup output.csv append=True

So the statement | rex "\s\-\s\[(?<path_dd>.+)\specific_folder" is the same as | rex field=_raw "\s\-\s\[(?<path_dd>.+)\specific_folder"

Whereas, when you run this (with inputlookup), there is no field named _raw. So here you have to specify the name of the field from which path_dd will be extracted.

| inputlookup write_rules.csv | rex "(?<path_dd>.+)\/specific_folder" | table path_dd

So, replace | rex "(?<path_dd>.+)\/specific_folder" with | rex field=fieldFromCSVFile "(?<path_dd>.+)\/specific_folder"
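As a minimal sketch of that change, assuming the lookup column is named path (as the question states) and keeping specific_folder as the placeholder used throughout the thread, the lookup version of the search could look like this:

```spl
| inputlookup write_rules.csv
| rex field=path "(?<path_dd>.+)\/specific_folder"
| table path_dd
```

If the column in write_rules.csv has a different name, that name goes in field= instead of path.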

0 Karma

paddygriffin
Path Finder

This revised regex should do the named capture from the sample string you provided:
\S(?<pathdd>.+)tomcat7

Notes on the modifications to the regex:
changed it to match a non-whitespace character (\S) and then capture everything before the literal value "tomcat7"
also, the capture group's name shouldn't contain a hyphen, so I changed it to pathdd
tested against your supplied input string at regex101.com

Match information
MATCH 1
pathdd [1-58] home/jenkins/qa-automation-smcconnell/Automation/Tomcats/

Hope this helps with the regex part of the question.
If you're working against input from an inputlookup command, I believe somesoni2 is correct: in the rex command you need to specify the name of the CSV field that you want to apply the regex to.
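To try this regex inside Splunk without touching the lookup, one option is a throwaway event built with makeresults. This is only a sketch: the field name path and the choice of sample line (one of the paths from the question) are assumptions made for illustration.

```spl
| makeresults
| eval path="/home/jenkins/qa-automation-smcconnell/Automation/Tomcats/tomcat7/work/Catalina/localhost/docs/loader"
| rex field=path "\S(?<pathdd>.+)tomcat7"
| table path pathdd
```

The leading \S consumes the first slash, so pathdd should come back as home/jenkins/qa-automation-smcconnell/Automation/Tomcats/, the same value regex101 reported above.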

0 Karma

somesoni2
Revered Legend

Can you post some sample entries from the write_rules.csv file (ones for which the search is not working)?

0 Karma

Federica_92
Communicator

The query doesn't produce any events, and the job inspector says that there are no matching fields.

0 Karma

somesoni2
Revered Legend

Is the lookup table write_rules.csv empty? What does it return if you run just this?

| inputlookup write_rules.csv 
0 Karma

Federica_92
Communicator

No, it's not empty.

0 Karma

somesoni2
Revered Legend

That is good. The remaining portion of the search looks for a specific pattern (regex), and it's not able to find that pattern, which causes the end result to be empty. To see whether the pattern is correct, please provide some sample entries from the write_rules.csv file (which should be added as a lookup table file).
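One way to check this directly is to count how many lookup rows the pattern matches at all. The sketch below assumes the column is named path and uses the thread's specific_folder placeholder; the field name matched is made up for the example.

```spl
| inputlookup write_rules.csv
| eval matched=if(match(path, "/specific_folder"), "yes", "no")
| stats count by matched
```

If every row lands under "no", either the field name or the pattern does not fit the data in the lookup.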

0 Karma

Federica_92
Communicator

I have added it to the question! :)

0 Karma
