Splunk Search

Extract fields with multiple results or generate fake/dummy events

michwii
New Member

Hi all,

I've been struggling lately with regular expressions and field extractions on events that contain multiple results.

We are trying to parse SVN logs and compute some statistics on them.
In an SVN log we have a date, a userID, an action (D = Delete, A = Add, U = Update) with its associated file, and finally a comment.

Here are 3 examples:
Very simple, 1 file added:

Fri Jul 31 09:36:48 CEST 2015 --- X9896 --- A BUSINESSOBJECTS/TAGS/PROD_2564845/ --- Progetto BI1500624 - Mapping

Multiple files committed:

Wed Jul 29 11:05:03 CEST 2015 --- X9896 --- A BATCH/BRANCHES/PROD/MENS/DEW/EXECUTE_95_22 A BATCH/BRANCHES/PROD/MENS/DEW/assicurazione_nuove_fasce_riporti__95_22.ksh A BATCH/BRANCHES/PROD/MENS/DEW/Inssurance__95_22.sas U BATCH/BRANCHES/PROD/MENS/DEW/coeff_rip.ksh A BATCH/BRANCHES/PROD/MENS/DEW/coeff_cc_istituzionale_rip__95_22.ksh A BATCH/BRANCHES/PROD/MENS/DEW/coeff_cc_rip__95_22.sas A BATCH/BRANCHES/PROD/MENS/DEW/coeff_cc_rip__95_22__201105.sas A BATCH/BRANCHES/PROD/MENS/DEW/incassi_nuove_fasce_riporti__95_22.ksh A BATCH/BRANCHES/PROD/MENS/DEW/incassi_rip__95_22.sas A BATCH/BRANCHES/PROD/MENS/DEW/incorso_afterInstutition_rip__95_22.ksh A BATCH/BRANCHES/PROD/MENS/DEW/ino_After_rip__95_22.sas A BATCH/BRANCHES/PROD/MENS/DEW/incorsve_fasce_report__95_22.ksh A BATCH/BRANCHES/PROD/MENS/IAS/tnew_fasce__95_22.ksh A BATCH/BRANCHES/PROD/MENS/DEW/taxe_rip__95_22.sas A BATCH/BRANCHES/PROD/MENS/DEW/teorike_rip__95_22.ksh A BATCH/BRANCHES/PROD/MENS/DEW/teorico_rip__95_22.sas --- Addscript presentin ~mens/DEW

More complicated (the file names now contain spaces):

Wed Jul 29 10:10:06 CEST 2015 --- G5461 --- D BUSINESSOBJECTS/TRUNK/Nero.sev D BUSINESSOBJECTS/TRUNK/Cadran.sev D BUSINESSOBJECTS/TRUNK/Controllo Metodologico.unv D BUSINESSOBJECTS/TRUNK/MaraCredit.sev D BUSINESSOBJECTS/TRUNK/DM_RISK.unv D BUSINESSOBJECTS/TRUNK/Rers.sev D BUSINESSOBJECTS/TRUNK/Search.sev D BUSINESSOBJECTS/TRUNK/Cars.sev D BUSINESSOBJECTS/TRUNK/uni_rec.unv --- Dismissione universi Nero, Cadran, Maracredit, Controllo Metodologico, DM_RISCHIO, Piani, Ricerca, Universo Recupero e Veicoli

To simplify the problem, I decided to focus first on extracting 3 fields (userID, Commit, Comment) with this regular expression:

sourcetype=svn  source="script-svn_log" | rex max_match=0 "---(?<userID>.*)---(?<Commit>.*)---(?<Comment>.*)"

Now I would like to parse again the field Commit. I need to identify all the files committed with the action associated (U or A or D).

First question: How can I run another regular expression on a specific field (in my case Commit)?

Second question: For each value found, is it possible to create dummy/fake events that would help me compute statistics on my commits? Those dummy events must have the same fields as the parent event.

Don't hesitate to ask if you need further details on my problem.

Thank you guys for your time.

Have a nice day =D


somesoni2
Revered Legend

Something like the following worked for me with your sample data.

Ans 1: Pass field=<fieldname> to rex to run the regex against a specific field; by default rex runs against _raw.

sourcetype=svn  source="script-svn_log" | rex "---(?<userID>.*) --- (?<Commit>.*) --- (?<Comment>.*)" | rex max_match=0 field=Commit "(?<Action>\w) (?<FileName>(\w+\/)+((\w*\s*)*(\.\w+)*))"
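
For checking the two-stage idea outside Splunk, here is a minimal Python sketch, assuming the field layout shown in the question. The sample line is shortened from the third example. The second pattern is not the one from the search above: it is a hypothetical alternative that ends each entry where the next " A/D/U path/" begins. The reason is that `(\w*\s*)*` also matches spaces, so for a file name without an extension (such as EXECUTE_95_22) it can run on into the next entry. The names STAGE1, STAGE2 and parse are made up for illustration.

```python
import re

# Sample event shortened from the third example in the question.
line = ("Wed Jul 29 10:10:06 CEST 2015 --- G5461 --- "
        "D BUSINESSOBJECTS/TRUNK/Nero.sev "
        "D BUSINESSOBJECTS/TRUNK/Controllo Metodologico.unv "
        "D BUSINESSOBJECTS/TRUNK/uni_rec.unv "
        "--- Dismissione universi")

# Stage 1: same role as the first rex, which runs on the whole event (_raw).
STAGE1 = re.compile(r"--- (?P<userID>.*?) --- (?P<Commit>.*) --- (?P<Comment>.*)")

# Stage 2: same role as "rex max_match=0 field=Commit".
# Assumption (not from the thread): an entry ends where the next
# " <A|D|U> <path containing a slash>" starts, or at the end of the field.
STAGE2 = re.compile(r"(?P<Action>[ADU]) (?P<FileName>.+?)(?= [ADU] [^ ]+/|$)")

def parse(event):
    """Return userID, Commit, Comment plus a list of (Action, FileName) pairs."""
    m = STAGE1.search(event)
    if m is None:
        return None
    fields = m.groupdict()
    fields["files"] = [(f["Action"], f["FileName"])
                       for f in STAGE2.finditer(fields["Commit"])]
    return fields

result = parse(line)
for action, name in result["files"]:
    print(result["userID"], action, name)
```

The lookahead is a heuristic: a file name that itself contains something like " A dir/" would still be split in the wrong place.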

Ans 2: Dummy events: once you have the file names as a multivalued field, use mvexpand, as below, to split them into separate events while keeping the other field values from the base event. Because each file name needs its Action, an extra step with mvzip keeps Action and FileName paired before the expansion.

sourcetype=svn  source="script-svn_log" | rex "---(?<userID>.*) --- (?<Commit>.*) --- (?<Comment>.*)" | rex max_match=0 field=Commit "(?<Action>\w) (?<FileName>(\w+\/)+((\w*\s*)*(\.\w+)*))" | eval temp=mvzip(Action,FileName,"#") | mvexpand temp | rex field=temp "(?<Action>.*)#(?<FileName>.*)" | fields - temp
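
To make the pairing and expansion step concrete, here is a minimal Python sketch of the same logic, not of Splunk's implementation. It assumes Action and FileName arrive as parallel lists, as they would after the max_match=0 extraction. The event values are taken from the second example in the question, and the names expand, rows and per_action are made up for illustration.

```python
from collections import Counter

# A parent event after both rex steps: Action and FileName are parallel
# multivalue fields, every other field has a single value.
event = {
    "userID": "X9896",
    "Comment": "Addscript presentin ~mens/DEW",
    "Action": ["A", "A", "U"],
    "FileName": [
        "BATCH/BRANCHES/PROD/MENS/DEW/EXECUTE_95_22",
        "BATCH/BRANCHES/PROD/MENS/DEW/Inssurance__95_22.sas",
        "BATCH/BRANCHES/PROD/MENS/DEW/coeff_rip.ksh",
    ],
}

def expand(parent_event):
    """Return one record per (Action, FileName) pair, copying the parent's other fields."""
    parent = {k: v for k, v in parent_event.items()
              if k not in ("Action", "FileName")}
    # zip pairs the values by position (the role mvzip plays above);
    # emitting one record per pair is the role mvexpand plays.
    return [dict(parent, Action=a, FileName=f)
            for a, f in zip(parent_event["Action"], parent_event["FileName"])]

rows = expand(event)
per_action = Counter(row["Action"] for row in rows)
print(len(rows), dict(per_action))
```

In Splunk, the statistics step after the expansion could be something like `| stats count by userID, Action`.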

richgalloway
SplunkTrust

To run a regex on a specific field, specify that field in the rex command.

sourcetype=svn  source="script-svn_log" | rex max_match=0 "---(?<userID>.*) --- (?<Commit>.*) --- (?<Comment>.*)" | rex field=Commit "(?<Action>\w) (?<file>.*)"

You can create separate events using the mvexpand command. See Example 3 at http://docs.splunk.com/Documentation/Splunk/6.2.5/SearchReference/Mvexpand#Examples

---
If this reply helps you, Karma would be appreciated.

diogofgm
SplunkTrust

Have you tried using something like https://regex101.com?
Are the fields always split by --- ?

------------
Hope I was able to help you. If so, some karma would be appreciated.