Solved: Problem with regular expression

Federica_92 · ‎03-27-2015

Hi everyone, I have created a regular expression query that matches one specific folder in a long list of pathnames and then cuts off everything after that folder:

   index=main | rex "\s\-\s\[(?<path_dd>.+)\specific_folder" | dedup path_dd | eval path="file:read:"+path_dd+"*" | sort by path| table path | outputlookup output.csv append=True

Next, I added an inputlookup table at the start of the query. This table always contains path names, with only one field per line: path

So I tried to edit the query like this:

 | inputlookup write_rules.csv | rex "(?<path_dd>.+)\/specific_folder" | table path_dd

But it's not working. Can anyone help me?
Thank you

Example of the file:

/home/jenkins/qa-automation-smcconnell/Automation/Tomcats/tomcat7/work/Catalina/localhost/host-manager/loader

/home/jenkins/qa-automation-smcconnell/Automation/Tomcats/tomcat7/work/Catalina/localhost/examples/loader
/home/jenkins/qa-automation-smcconnell/Automation/Tomcats/tomcat7/work/Catalina/localhost/xwiki_oracle/loader
/home/jenkins/qa-automation-smcconnell/Automation/Tomcats/tomcat7/work/Catalina/localhost/xwiki_oracle/SESSIONS.ser
/home/jenkins/qa-automation-smcconnell/Automation/Tomcats/tomcat7/work/Catalina/localhost/docs/loader
/home/jenkins/qa-automation-smcconnell/Automation/Tomcats/tomcat7/work/Catalina/localhost/manager/loader
/home/jenkins/qa-automation-smcconnell/Automation/Tomcats/tomcat7/work/Catalina/localhost/_/loader
/home/jenkins/qa-automation-smcconnell/Automation/Tomcats/tomcat7/work/Catalina/localhost/xwiki_oracle/xwiki-temp/aether-repository/org/apache/maven/doxia/doxia-core/1.3/_maven.repositories
/home/jenkins/qa-automation-smcconnell/Automation/Tomcats/tomcat7/work/Catalina/localhost/xwiki_oracle/xwiki-temp/aether-repository/org/apache/maven/doxia/doxia-core/1.3/doxia-core-1.3.pom
/home/jenkins/qa-automation-smcconnell/Automation/Tomcats/tomcat7/work/Catalina/localhost/xwiki_oracle/xwiki-temp/aether-repository/org/apache/maven/doxia/doxia-core/1.3/doxia-core-1.3.pom.ahc26f05574a43e4fce
/home/jenkins/qa-automation-smcconnell/Automation/Tomcats/tomcat7/work/Catalina/localhost/xwiki_oracle/xwiki-temp/aether-repository/org/apache/maven/doxia/doxia-core/1.3/doxia-core-1.3.pom.sha1.ahca7be2b392cec49e7
/home/jenkins/qa-automation-smcconnell/Automation/Tomcats/tomcat7/work/Catalina/localhost/xwiki_oracle/xwiki-temp/aether-repository/org/apache/maven/doxia/doxia-core/1.3/doxia-core-1.3.pom.sha1

somesoni2 · ‎03-27-2015

Thanks, the sample entries are helpful, but I believe the problem is not in the regex.

When you run this, you have a Splunk built-in field called '_raw'. This is the default field that a rex statement works on.

index=main  | rex "\s\-\s\[(?<path_dd>.+)\specific_folder" | dedup path_dd | eval path="file:read:"+path_dd+"*" | sort by path| table path | outputlookup output.csv append=True

So the statement | rex "\s\-\s\[(?<path_dd>.+)\specific_folder" is the same as | rex field=_raw "\s\-\s\[(?<path_dd>.+)\specific_folder"

Whereas when you run this (with inputlookup), there is no field named _raw, so here you have to specify the field from which path_dd will be extracted.

| inputlookup write_rules.csv | rex "(?<path_dd>.+)\/specific_folder" | table path_dd

So, replace | rex "(?<path_dd>.+)\/specific_folder" with | rex field=fieldFromCSVFile "(?<path_dd>.+)\/specific_folder"
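
For reference, a minimal sketch of the corrected search, assuming the lookup's single column is named path (as described in the question) and keeping specific_folder as the placeholder used above. The where clause is an optional extra, not part of the original query, that drops rows where the pattern did not match:

```spl
| inputlookup write_rules.csv
| rex field=path "(?<path_dd>.+)\/specific_folder"
| where isnotnull(path_dd)
| dedup path_dd
| table path_dd
```

If the column in write_rules.csv has a different name, put that name after field= instead.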

paddygriffin · ‎03-27-2015

This revised regex should do the named capture from the sample string you provided:
\S(?<pathdd>.+)tomcat7

Notes on the modification to the regex:
changed to capture any non-whitespace character (\S) before the literal value "tomcat7"
also, the named capture's name shouldn't contain a hyphen, so I changed it to pathdd
tested against your supplied input string at regex101.com

Match information
MATCH 1
pathdd [1-58] home/jenkins/qa-automation-smcconnell/Automation/Tomcats/

Hope this helps with the regex part of the question.
If you're working against input from an inputlookup command, I believe somesoni2 is correct: in the rex command you need to specify the field name from the CSV that you want to apply the regex to.
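
To try the tomcat7 pattern inside Splunk rather than at regex101.com, a small sketch along these lines should work on versions that have the makeresults command. The path value is one of the sample lines from the question, and the field name path is only illustrative:

```spl
| makeresults
| eval path="/home/jenkins/qa-automation-smcconnell/Automation/Tomcats/tomcat7/work/Catalina/localhost/docs/loader"
| rex field=path "\S(?<pathdd>.+)tomcat7"
| table path pathdd
```

Note that \S consumes the leading slash, so pathdd comes back without it, which agrees with the regex101 match shown above.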

somesoni2 · ‎03-27-2015

Can you post some sample events from the write_rules.csv file (one which is not working)?

Federica_92 · ‎03-27-2015

The query doesn't produce any events, and the job inspector says that there are no matching fields.

somesoni2 · ‎03-27-2015

Is the lookup table write_rules.csv empty? What does it return if you just run this

| inputlookup write_rules.csv

Federica_92 · ‎03-27-2015

yes, it's not empty

somesoni2 · ‎03-27-2015

That is good. The remaining portion of the search looks for a specific pattern (regex) and is not able to find it, which causes the end result to be empty. To check whether the pattern is correct, please provide some sample entries from the write_rules.csv file (which should be added as a lookup table file).

Federica_92 · ‎03-27-2015

I have added it in the answer! : )
