Splunk Search

How to extract fields from a multiline header followed by structured data columns?

HLVarian
Path Finder

I have a sourcetype that is in CSV format and I'd like to extract fields from the multiline header that proceeds these files coming in. Each new line in the header begins with # and these lines are comma separated. This header is followed by the actual data fields which are semi-colon separated. I'm trying to figure out the best way to go about parsing this data. The data looks like this:

 #PlanID: '1.2.246.352.71.5.459020837699.2820.20131008220539', IrrSessionID: '1.2.246.352.82.6.5130518203565855886.177214397189262813860', FieldNum:1
 #BeamSizeID:'4.0', Status (1, 0, 0), CumMU: 109720.889806
 #Temp C/M(0.000000,0.000000), Pressure C/M(0.000000,0.000000)
 #VALUES;;;
 Field1;Field2;Field3;Field4;...;FieldN
 Data1;Data2;Data4;Data4;...;DataN

I'd like to get out the following fields and what they would be relate to in this example:

PlanID = 1.2.246.352.71.5.459020837699.2820.20131008220539
IrrSessionID = 1.2.246.352.82.6.5130518203565855886.177214397189262813860
FieldNum = 1
BeamSizeID = 4.0
Status = (1, 0, 0)
CumMu = 109720.889806
Temp C/M = (0.000000,0.000000)
Pressure C/M= (0.000000,0.000000)

Followed by:

Field1, Field2,..., etc.

Previously I just had Splunk strip away all the lines beginning with # using:

FIELD DELIMITER = ;
PREAMBLE REGEX = ^\#

(this works well for all my other sourcetypes)

I have also used a Python script to append the necessary fields to file and then I can keep using the aforementioned settings. However, we do not want to run a separate process on our files before getting them into Splunk.

So, what is the best way to get these fields in? Is there a way to manipulate my Python script and use it in Splunk on my incoming data? Or should I use some extensive RegEx in props.conf and transforms.conf to achieve this?

Many thanks!

0 Karma
1 Solution

HLVarian
Path Finder

It looks like the key to this one was to create eventtypes. One based on the header and the other the data. I then ran a search on the eventtype=IrrHeader using a regex. I then appended a search on eventtype=IrrData. From here I created my table.

This was the final search:

eventtype=IrrHeader 
| rex field=_raw "(?ms)^[^:\n]*:\s+'(?P<PlanID>[^']+)[^:\n]*:\s+'(?P<IrrSessionID>[^']+)[^:\n]*:(?P<FieldNum>\d+)\s+#\w+:'(?P<BeamSizeID>\d+\.\d+)',\s+\w+\s+(?P<Status>\(\d+,\s+\d+,\s+\d+\))[^:\n]*:\s+(?P<CumMU>\d+\.\d+)\s+#\w+\s+\w+\/\w+(?P<TempCM>\(\d+\.\d+,\d+\.\d+\)),\s+\w+\s+\w+\/\w+(?P<PressureCM>\(\d+\.\d+,\d+\.\d+\))" 
| append [search sourcetype=Irr2File eventtype=IrrData 
| rex field=_raw "^(?<LayerNr>[^\;*]*);(?<Energy>[^\;*]*);(?<DoseRate>[^\;*]*);(?<StartIrradiation>[^\;*]*);(?<Cumm_MU>[^\;*]*);(?<X>[^\;*]*);(?<Y>[^\;*]*);(?<MU>[^\;*]*);(?<DiagnosticDataValid>[^\;*]*);(?<X_Measured>[^\;*]*);(?<Y_Measured>[^\;*]*);(?<X_MeasuredRaw>[^\;*]*);(?<Y_MeasuredRaw>[^\;*]*);(?<Mu_MeasuredC>[^\;*]*);(?<Mu_MeasuredCt>[^\;*]*);(?<Mu_MeasuredM>[^\;*]*);(?<Mu_MeasuredMt>[^\;*]*);(?<minusXBeamWidth>[^\;*]*);(?<plusXBeamWidth>[^\;*]*);(?<minusYBeamWidth>[^\;*]*);(?<plusYBeamWidth>[^\;*]*);(?<currentX>[^\;*]*);(?<currentX_Measured>[^\;*]*);(?<currentY>[^\;*]*);(?<currentY_Measured>[^\;*]*);(?<doseRateC>[^\;*]*);(?<doseRateM>[^\;*]*);(?<irradiationTime>[^\;*]*);(?<numSamplesC>[^\;*]*);(?<numSamplesM>[^;].*)$"]
| table _time, LayerNr, Energy, DoseRate, StartIrradiation, Cumm_MU, X, Y, MU, DiagnosticDataValid, X_Measured, Y_Measured, X_MeasuredRaw, Y_MeasuredRaw, Mu_MeasuredC, Mu_MeasuredCt, Mu_MeasuredM, Mu_MeasuredMt, minusXBeamWidth, plusXBeamWidth, minusYBeamWidth, plusYBeamWidth, currentX, currentX_Measured, currentY, currentY_Measured, doseRateC, doseRateM, irradiationTime, numSamplesC, numSamplesM, PlanID, IrrSessionID, FieldNum, BeamSize, Status, CumMU, TempCM, PressureCM

View solution in original post

0 Karma

HLVarian
Path Finder

It looks like the key to this one was to create eventtypes. One based on the header and the other the data. I then ran a search on the eventtype=IrrHeader using a regex. I then appended a search on eventtype=IrrData. From here I created my table.

This was the final search:

eventtype=IrrHeader 
| rex field=_raw "(?ms)^[^:\n]*:\s+'(?P<PlanID>[^']+)[^:\n]*:\s+'(?P<IrrSessionID>[^']+)[^:\n]*:(?P<FieldNum>\d+)\s+#\w+:'(?P<BeamSizeID>\d+\.\d+)',\s+\w+\s+(?P<Status>\(\d+,\s+\d+,\s+\d+\))[^:\n]*:\s+(?P<CumMU>\d+\.\d+)\s+#\w+\s+\w+\/\w+(?P<TempCM>\(\d+\.\d+,\d+\.\d+\)),\s+\w+\s+\w+\/\w+(?P<PressureCM>\(\d+\.\d+,\d+\.\d+\))" 
| append [search sourcetype=Irr2File eventtype=IrrData 
| rex field=_raw "^(?<LayerNr>[^\;*]*);(?<Energy>[^\;*]*);(?<DoseRate>[^\;*]*);(?<StartIrradiation>[^\;*]*);(?<Cumm_MU>[^\;*]*);(?<X>[^\;*]*);(?<Y>[^\;*]*);(?<MU>[^\;*]*);(?<DiagnosticDataValid>[^\;*]*);(?<X_Measured>[^\;*]*);(?<Y_Measured>[^\;*]*);(?<X_MeasuredRaw>[^\;*]*);(?<Y_MeasuredRaw>[^\;*]*);(?<Mu_MeasuredC>[^\;*]*);(?<Mu_MeasuredCt>[^\;*]*);(?<Mu_MeasuredM>[^\;*]*);(?<Mu_MeasuredMt>[^\;*]*);(?<minusXBeamWidth>[^\;*]*);(?<plusXBeamWidth>[^\;*]*);(?<minusYBeamWidth>[^\;*]*);(?<plusYBeamWidth>[^\;*]*);(?<currentX>[^\;*]*);(?<currentX_Measured>[^\;*]*);(?<currentY>[^\;*]*);(?<currentY_Measured>[^\;*]*);(?<doseRateC>[^\;*]*);(?<doseRateM>[^\;*]*);(?<irradiationTime>[^\;*]*);(?<numSamplesC>[^\;*]*);(?<numSamplesM>[^;].*)$"]
| table _time, LayerNr, Energy, DoseRate, StartIrradiation, Cumm_MU, X, Y, MU, DiagnosticDataValid, X_Measured, Y_Measured, X_MeasuredRaw, Y_MeasuredRaw, Mu_MeasuredC, Mu_MeasuredCt, Mu_MeasuredM, Mu_MeasuredMt, minusXBeamWidth, plusXBeamWidth, minusYBeamWidth, plusYBeamWidth, currentX, currentX_Measured, currentY, currentY_Measured, doseRateC, doseRateM, irradiationTime, numSamplesC, numSamplesM, PlanID, IrrSessionID, FieldNum, BeamSize, Status, CumMU, TempCM, PressureCM
0 Karma

HLVarian
Path Finder

I don't have enough Karma points to provide attachments yet. However this is a sample of the data. Data comes in on a CSV file and looks just like this if opened in text:

 #PlanID: '1.2.246.352.71.5.546855675324.46054.20150723173433', IrrSessionID: '1.2.246.352.82.6.4916826480507715613.151884354515765536450', FieldNum:1
#BeamSizeID:'4.0', Status (1, 0, 0), CumMU: 26940.399953
#Temp C/M(0.000000,0.000000), Pressure C/M(0.000000,0.000000)
#VALUES;;;
LayerNr;Energy;DoseRate;Start-Irradiation;Cumm MU;X;Y;MU;DiagnosticDataValid;X_Measured;Y_Measured;X_MeasuredRaw;Y_MeasuredRaw;Mu_MeasuredC; Mu_MeasuredCt;Mu_MeasuredM;Mu_MeasuredMt;minusXBeamWidth;plusXBeamWidth;minusYBeamWidth;plusYBeamWidth;currentX;currentX_Measured;currentY;currentY_Measured;doseRateC;doseRateM;irradiationTime;numSamplesC;numSamplesM
1;180.139000;304359.803029;AUG 03 2015 16:36:43 GMT;46.599998;-65.599998;-49.200001;46.599998;1;88716.000000;132111.000000;0.000000;0.000000;46.398749;4643.000000;46.342000;4487.000000;5253;5445;4431;3869;-148038;-148074;-75555;-75703;304359.803029;308679.237710;0.008752;100525;100528
2;176.639000;261391.348040;AUG 03 2015 16:36:53 GMT;262.600000;57.400002;-65.599998;49.000000;1;166233.000000;121459.000000;0.000000;0.000000;49.026979;4906.000000;49.523042;4795.000000;5272;5479;3842;4047;128182;126383;-99672;-99767;264168.273430;269533.829245;0.010886;11063;11063
2;176.639000;261391.348040;AUG 03 2015 16:36:53 GMT;262.600000;-57.400002;-41.000000;46.400001;1;93767.000000;138361.000000;0.000000;0.000000;46.398749;4643.000000;46.620857;4514.000000;5409;5458;3862;4404;-128187;-61199;-62274;-63445;260916.063907;265030.589047;0.010261;202;202
2;176.639000;261391.348040;AUG 03 2015 16:36:53 GMT;262.600000;-65.599998;-57.400002;45.799999;1;88777.000000;126961.000000;0.000000;0.000000;45.799153;4583.000000;45.897893;4444.000000;5298;5481;4166;4455;-146450;-141625;-87204;-83594;261488.078635;264819.896647;0.010151;71;71

Hope this helps others out.

0 Karma

HLVarian
Path Finder

I've had success with this RegEx in search, now I just need to figure out how to format it for props.conf and transforms.conf

 rex field=_raw "(?m)^\#PlanID: '(?<PlanID>[.?\d]+)\', IrrSessionID: '(?<IrrSessionID>[.?\d]+)\', FieldNum:(?<FieldNum>[.?\d].*)$
^\#BeamSizeID:'(?<BeamSizeID>[.?\d]+)\', Status (?<Status>\([^(]*\))\, CumMU: (?<CumMU>[.?\d]+.*)$
^\#Temp C\/M(?<TempCM>\([^(]*\))\, Pressure C\/M(?<PressureCM>\([^(]*\))$
^\#VALUES;;;$
^LayerNr;Energy;DoseRate;Start-Irradiation;Cumm MU;X;Y;MU;DiagnosticDataValid;X_Measured;Y_Measured;X_MeasuredRaw;Y_MeasuredRaw;Mu_MeasuredC; Mu_MeasuredCt;Mu_MeasuredM;Mu_MeasuredMt;minusXBeamWidth;plusXBeamWidth;minusYBeamWidth;plusYBeamWidth;currentX;currentX_Measured;currentY;currentY_Measured;doseRateC;doseRateM;irradiationTime;numSamplesC;numSamplesM$
^(?<LayerNr>[^\;*]*);(?<Energy>[^\;*]*);(?<DoseRate>[^\;*]*);(?<StartIrradiation>[^\;*]*);(?<Cumm_MU>[^\;*]*);(?<X>[^\;*]*);(?<Y>[^\;*]*);(?<MU>[^\;*]*);(?<DiagnosticDataValid>[^\;*]*);(?<X_Measured>[^\;*]*);(?<Y_Measured>[^\;*]*);(?<X_MeasuredRaw>[^\;*]*);(?<Y_MeasuredRaw>[^\;*]*);(?<Mu_MeasuredC>[^\;*]*);(?<Mu_MeasuredCt>[^\;*]*);(?<Mu_MeasuredM>[^\;*]*);(?<Mu_MeasuredMt>[^\;*]*);(?<minusXBeamWidth>[^\;*]*);(?<plusXBeamWidth>[^\;*]*);(?<minusYBeamWidth>[^\;*]*);(?<plusYBeamWidth>[^\;*]*);(?<currentX>[^\;*]*);(?<currentX_Measured>[^\;*]*);(?<currentY>[^\;*]*);(?<currentY_Measured>[^\;*]*);(?<doseRateC>[^\;*]*);(?<doseRateM>[^\;*]*);(?<irradiationTime>[^\;*]*);(?<numSamplesC>[^\;*]*);(?<numSamplesM>[^;].*)$"  
| table _time, LayerNr,Energy,DoseRate,StartIrradiation,Cumm_MU,X,Y,MU,DiagnosticDataValid,X_Measured,Y_Measured,X_MeasuredRaw,Y_MeasuredRaw,Mu_MeasuredC,Mu_MeasuredCt,Mu_MeasuredM,Mu_MeasuredMt,minusXBeamWidth,plusXBeamWidth,minusYBeamWidth,plusYBeamWidth,currentX,currentX_Measured,currentY,currentY_Measured,doseRateC,doseRateM,irradiationTime,numSamplesC,numSamplesM,PlanID,IrrSessionID,FieldNum,BeamSize,Status,CumMU,TempCM,PressureCM
0 Karma

HLVarian
Path Finder

Correction, I HAD success with this search. But it now fails.

0 Karma

stephanefotso
Motivator

Hello! Let's get a sample of your data please.

SGF
0 Karma

HLVarian
Path Finder

I, FINALLY, figured it out. Thanks for trying.

0 Karma

HLVarian
Path Finder

I don't have enough Karma points to post an attachment, but see the "answer" I posted below when i was trying to add an attachment for a snapshot of what the data looks like.

0 Karma
Get Updates on the Splunk Community!

Extending Observability Content to Splunk Cloud

Register to join us !   In this Extending Observability Content to Splunk Cloud Tech Talk, you'll see how to ...

What's new in Splunk Cloud Platform 9.1.2312?

Hi Splunky people! We are excited to share the newest updates in Splunk Cloud Platform 9.1.2312! Analysts can ...

What’s New in Splunk Security Essentials 3.8.0?

Splunk Security Essentials (SSE) is an app that can amplify the power of your existing Splunk Cloud Platform, ...