Hi all,
I'm struggling these days with regular expressions and field extractions with events that contain multiple results.
We are trying to extract SVN logs and do some statistics with them.
In an SVN log we have a date, a userID, an action (D = Delete / A = Add / U = Update) with its associated file, and finally a comment.
Here are 3 examples:
Very simple, 1 file added:
Fri Jul 31 09:36:48 CEST 2015 --- X9896 --- A BUSINESSOBJECTS/TAGS/PROD_2564845/ --- Progetto BI1500624 - Mapping
Multiple files committed:
Wed Jul 29 11:05:03 CEST 2015 --- X9896 --- A BATCH/BRANCHES/PROD/MENS/DEW/EXECUTE_95_22 A BATCH/BRANCHES/PROD/MENS/DEW/assicurazione_nuove_fasce_riporti__95_22.ksh A BATCH/BRANCHES/PROD/MENS/DEW/Inssurance__95_22.sas U BATCH/BRANCHES/PROD/MENS/DEW/coeff_rip.ksh A BATCH/BRANCHES/PROD/MENS/DEW/coeff_cc_istituzionale_rip__95_22.ksh A BATCH/BRANCHES/PROD/MENS/DEW/coeff_cc_rip__95_22.sas A BATCH/BRANCHES/PROD/MENS/DEW/coeff_cc_rip__95_22__201105.sas A BATCH/BRANCHES/PROD/MENS/DEW/incassi_nuove_fasce_riporti__95_22.ksh A BATCH/BRANCHES/PROD/MENS/DEW/incassi_rip__95_22.sas A BATCH/BRANCHES/PROD/MENS/DEW/incorso_afterInstutition_rip__95_22.ksh A BATCH/BRANCHES/PROD/MENS/DEW/ino_After_rip__95_22.sas A BATCH/BRANCHES/PROD/MENS/DEW/incorsve_fasce_report__95_22.ksh A BATCH/BRANCHES/PROD/MENS/IAS/tnew_fasce__95_22.ksh A BATCH/BRANCHES/PROD/MENS/DEW/taxe_rip__95_22.sas A BATCH/BRANCHES/PROD/MENS/DEW/teorike_rip__95_22.ksh A BATCH/BRANCHES/PROD/MENS/DEW/teorico_rip__95_22.sas --- Addscript presentin ~mens/DEW
More complicated (there are now spaces in the file names):
Wed Jul 29 10:10:06 CEST 2015 --- G5461 --- D BUSINESSOBJECTS/TRUNK/Nero.sev D BUSINESSOBJECTS/TRUNK/Cadran.sev D BUSINESSOBJECTS/TRUNK/Controllo Metodologico.unv D BUSINESSOBJECTS/TRUNK/MaraCredit.sev D BUSINESSOBJECTS/TRUNK/DM_RISK.unv D BUSINESSOBJECTS/TRUNK/Rers.sev D BUSINESSOBJECTS/TRUNK/Search.sev D BUSINESSOBJECTS/TRUNK/Cars.sev D BUSINESSOBJECTS/TRUNK/uni_rec.unv --- Dismissione universi Nero, Cadran, Maracredit, Controllo Metodologico, DM_RISCHIO, Piani, Ricerca, Universo Recupero e Veicoli
To simplify my problem, I decided to first focus on extracting 3 fields (userID, Commit, Comment) with this regular expression:
sourcetype=svn source="script-svn_log" | rex max_match=0 "---(?<userID>.*)---(?<Commit>.*)---(?<Comment>.*)"
Now I would like to parse the Commit field again. I need to identify all the committed files together with their associated action (U, A or D).
First question: How can I run another regular expression on a specific field (in my case Commit)?
Second question: For each value found, is it possible to create dummy/fake events that would help me do statistics on my commits? Those dummy events have to carry the same fields as the parent event.
Don't hesitate to ask if you need further details on my problem.
Thank you guys for your time.
Have a nice day =D
Something like the below worked for me with your sample data.
Ans 1: Use field=<fieldname> with rex to run the regex against a specific field; by default it runs against _raw.
sourcetype=svn source="script-svn_log" | rex "---(?<userID>.*) --- (?<Commit>.*) --- (?<Comment>.*)" | rex max_match=0 field=Commit "(?<Action>\w) (?<FileName>(\w+\/)+((\w*\s*)*(\.\w+)*))"
Ans 2: For the dummy events, once you have the file names as a multivalued field, use mvexpand, as below, to split them into separate events; the values from the base event stay the same on each one. Since every file name needs its Action, there is an extra step that uses mvzip to keep Action and FileName together before expanding.
sourcetype=svn source="script-svn_log" | rex "---(?<userID>.*) --- (?<Commit>.*) --- (?<Comment>.*)" | rex max_match=0 field=Commit "(?<Action>\w) (?<FileName>(\w+\/)+((\w*\s*)*(\.\w+)*))" | eval temp=mvzip(Action,FileName,"#") | mvexpand temp | rex field=temp "(?<Action>.*)#(?<FileName>.*)" | fields - temp
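If you want to sanity-check the patterns outside Splunk first, here is a minimal Python sketch of the same two steps (regex on the Commit field, then one row per file carrying the parent's fields). A few assumptions to flag: Python's re module needs (?P<name>...) where rex takes (?<name>...); the file pattern is a variant, not the one above, and it ends each file name at the next " A ", " D " or " U " that is followed by a path, so that names without an extension (EXECUTE_95_22) and names with spaces both stay whole; expand() and the SAMPLE_* strings are illustrative names only, and the events are abridged from the examples in the question.

```python
import re
from collections import Counter

# Python's re writes named groups as (?P<name>...); Splunk's rex takes (?<name>...).
EVENT_RE = re.compile(r"---(?P<userID>.*) --- (?P<Commit>.*) --- (?P<Comment>.*)")

# Assumed variant of the file pattern: a file name runs up to the next
# " A ", " D " or " U " that is followed by a path segment, or to the end.
FILE_RE = re.compile(r"(?P<Action>[ADU]) (?P<FileName>.+?)(?= [ADU] \w+/|$)")

# Events abridged from the examples in the question.
SAMPLE_MULTI = (
    "Wed Jul 29 11:05:03 CEST 2015 --- X9896 --- "
    "A BATCH/BRANCHES/PROD/MENS/DEW/EXECUTE_95_22 "
    "A BATCH/BRANCHES/PROD/MENS/DEW/Inssurance__95_22.sas "
    "U BATCH/BRANCHES/PROD/MENS/DEW/coeff_rip.ksh "
    "--- Addscript presentin ~mens/DEW"
)
SAMPLE_SPACES = (
    "Wed Jul 29 10:10:06 CEST 2015 --- G5461 --- "
    "D BUSINESSOBJECTS/TRUNK/Nero.sev "
    "D BUSINESSOBJECTS/TRUNK/Controllo Metodologico.unv "
    "D BUSINESSOBJECTS/TRUNK/DM_RISK.unv "
    "--- Dismissione universi Nero, Cadran"
)


def expand(raw):
    """Return one dict per committed file, each carrying the parent's fields."""
    event = EVENT_RE.search(raw)
    if event is None:
        return []
    # The event pattern leaves a leading space on userID, so strip it here.
    parent = {"userID": event["userID"].strip(), "Comment": event["Comment"].strip()}
    return [
        {**parent, "Action": f["Action"], "FileName": f["FileName"]}
        for f in FILE_RE.finditer(event["Commit"])
    ]


rows = expand(SAMPLE_MULTI) + expand(SAMPLE_SPACES)

# Rough equivalent of "| stats count by userID, Action" on the expanded rows.
counts = Counter((r["userID"], r["Action"]) for r in rows)
for (user, action), n in sorted(counts.items()):
    print(user, action, n)
```

On these two abridged events this prints G5461 D 3, X9896 A 2 and X9896 U 1, one line each. A file name that itself contains something like " A dir/" would still be split in the wrong place, so treat the pattern as a starting point rather than a guarantee.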
To run a regex on a specific field, specify that field in the rex command:
sourcetype=svn source="script-svn_log" | rex max_match=0 "---(?<userID>.*) --- (?<Commit>.*) --- (?<Comment>.*)" | rex field=Commit "(?<Action>\w) (?<file>.*)"
You can create separate events using the mvexpand command. See Example 3 at http://docs.splunk.com/Documentation/Splunk/6.2.5/SearchReference/Mvexpand#Examples
Have you tried using something like https://regex101.com?
Are the fields always split by ---?