Splunk Search

How to extract fields from a multiline header followed by structured data columns?

HLVarian
Path Finder

I have a sourcetype that is in CSV format and I'd like to extract fields from the multiline header that proceeds these files coming in. Each new line in the header begins with # and these lines are comma separated. This header is followed by the actual data fields which are semi-colon separated. I'm trying to figure out the best way to go about parsing this data. The data looks like this:

 #PlanID: '1.2.246.352.71.5.459020837699.2820.20131008220539', IrrSessionID: '1.2.246.352.82.6.5130518203565855886.177214397189262813860', FieldNum:1
 #BeamSizeID:'4.0', Status (1, 0, 0), CumMU: 109720.889806
 #Temp C/M(0.000000,0.000000), Pressure C/M(0.000000,0.000000)
 #VALUES;;;
 Field1;Field2;Field3;Field4;...;FieldN
 Data1;Data2;Data4;Data4;...;DataN

I'd like to get out the following fields and what they would be relate to in this example:

PlanID = 1.2.246.352.71.5.459020837699.2820.20131008220539
IrrSessionID = 1.2.246.352.82.6.5130518203565855886.177214397189262813860
FieldNum = 1
BeamSizeID = 4.0
Status = (1, 0, 0)
CumMu = 109720.889806
Temp C/M = (0.000000,0.000000)
Pressure C/M= (0.000000,0.000000)

Followed by:

Field1, Field2,..., etc.

Previously I just had Splunk strip away all the lines beginning with # using:

FIELD DELIMITER = ;
PREAMBLE REGEX = ^\#

(this works well for all my other sourcetypes)

I have also used a Python script to append the necessary fields to file and then I can keep using the aforementioned settings. However, we do not want to run a separate process on our files before getting them into Splunk.

So, what is the best way to get these fields in? Is there a way to manipulate my Python script and use it in Splunk on my incoming data? Or should I use some extensive RegEx in props.conf and transforms.conf to achieve this?

Many thanks!

0 Karma
1 Solution

HLVarian
Path Finder

It looks like the key to this one was to create eventtypes. One based on the header and the other the data. I then ran a search on the eventtype=IrrHeader using a regex. I then appended a search on eventtype=IrrData. From here I created my table.

This was the final search:

eventtype=IrrHeader 
| rex field=_raw "(?ms)^[^:\n]*:\s+'(?P<PlanID>[^']+)[^:\n]*:\s+'(?P<IrrSessionID>[^']+)[^:\n]*:(?P<FieldNum>\d+)\s+#\w+:'(?P<BeamSizeID>\d+\.\d+)',\s+\w+\s+(?P<Status>\(\d+,\s+\d+,\s+\d+\))[^:\n]*:\s+(?P<CumMU>\d+\.\d+)\s+#\w+\s+\w+\/\w+(?P<TempCM>\(\d+\.\d+,\d+\.\d+\)),\s+\w+\s+\w+\/\w+(?P<PressureCM>\(\d+\.\d+,\d+\.\d+\))" 
| append [search sourcetype=Irr2File eventtype=IrrData 
| rex field=_raw "^(?<LayerNr>[^\;*]*);(?<Energy>[^\;*]*);(?<DoseRate>[^\;*]*);(?<StartIrradiation>[^\;*]*);(?<Cumm_MU>[^\;*]*);(?<X>[^\;*]*);(?<Y>[^\;*]*);(?<MU>[^\;*]*);(?<DiagnosticDataValid>[^\;*]*);(?<X_Measured>[^\;*]*);(?<Y_Measured>[^\;*]*);(?<X_MeasuredRaw>[^\;*]*);(?<Y_MeasuredRaw>[^\;*]*);(?<Mu_MeasuredC>[^\;*]*);(?<Mu_MeasuredCt>[^\;*]*);(?<Mu_MeasuredM>[^\;*]*);(?<Mu_MeasuredMt>[^\;*]*);(?<minusXBeamWidth>[^\;*]*);(?<plusXBeamWidth>[^\;*]*);(?<minusYBeamWidth>[^\;*]*);(?<plusYBeamWidth>[^\;*]*);(?<currentX>[^\;*]*);(?<currentX_Measured>[^\;*]*);(?<currentY>[^\;*]*);(?<currentY_Measured>[^\;*]*);(?<doseRateC>[^\;*]*);(?<doseRateM>[^\;*]*);(?<irradiationTime>[^\;*]*);(?<numSamplesC>[^\;*]*);(?<numSamplesM>[^;].*)$"]
| table _time, LayerNr, Energy, DoseRate, StartIrradiation, Cumm_MU, X, Y, MU, DiagnosticDataValid, X_Measured, Y_Measured, X_MeasuredRaw, Y_MeasuredRaw, Mu_MeasuredC, Mu_MeasuredCt, Mu_MeasuredM, Mu_MeasuredMt, minusXBeamWidth, plusXBeamWidth, minusYBeamWidth, plusYBeamWidth, currentX, currentX_Measured, currentY, currentY_Measured, doseRateC, doseRateM, irradiationTime, numSamplesC, numSamplesM, PlanID, IrrSessionID, FieldNum, BeamSize, Status, CumMU, TempCM, PressureCM

View solution in original post

0 Karma

HLVarian
Path Finder

It looks like the key to this one was to create eventtypes. One based on the header and the other the data. I then ran a search on the eventtype=IrrHeader using a regex. I then appended a search on eventtype=IrrData. From here I created my table.

This was the final search:

eventtype=IrrHeader 
| rex field=_raw "(?ms)^[^:\n]*:\s+'(?P<PlanID>[^']+)[^:\n]*:\s+'(?P<IrrSessionID>[^']+)[^:\n]*:(?P<FieldNum>\d+)\s+#\w+:'(?P<BeamSizeID>\d+\.\d+)',\s+\w+\s+(?P<Status>\(\d+,\s+\d+,\s+\d+\))[^:\n]*:\s+(?P<CumMU>\d+\.\d+)\s+#\w+\s+\w+\/\w+(?P<TempCM>\(\d+\.\d+,\d+\.\d+\)),\s+\w+\s+\w+\/\w+(?P<PressureCM>\(\d+\.\d+,\d+\.\d+\))" 
| append [search sourcetype=Irr2File eventtype=IrrData 
| rex field=_raw "^(?<LayerNr>[^\;*]*);(?<Energy>[^\;*]*);(?<DoseRate>[^\;*]*);(?<StartIrradiation>[^\;*]*);(?<Cumm_MU>[^\;*]*);(?<X>[^\;*]*);(?<Y>[^\;*]*);(?<MU>[^\;*]*);(?<DiagnosticDataValid>[^\;*]*);(?<X_Measured>[^\;*]*);(?<Y_Measured>[^\;*]*);(?<X_MeasuredRaw>[^\;*]*);(?<Y_MeasuredRaw>[^\;*]*);(?<Mu_MeasuredC>[^\;*]*);(?<Mu_MeasuredCt>[^\;*]*);(?<Mu_MeasuredM>[^\;*]*);(?<Mu_MeasuredMt>[^\;*]*);(?<minusXBeamWidth>[^\;*]*);(?<plusXBeamWidth>[^\;*]*);(?<minusYBeamWidth>[^\;*]*);(?<plusYBeamWidth>[^\;*]*);(?<currentX>[^\;*]*);(?<currentX_Measured>[^\;*]*);(?<currentY>[^\;*]*);(?<currentY_Measured>[^\;*]*);(?<doseRateC>[^\;*]*);(?<doseRateM>[^\;*]*);(?<irradiationTime>[^\;*]*);(?<numSamplesC>[^\;*]*);(?<numSamplesM>[^;].*)$"]
| table _time, LayerNr, Energy, DoseRate, StartIrradiation, Cumm_MU, X, Y, MU, DiagnosticDataValid, X_Measured, Y_Measured, X_MeasuredRaw, Y_MeasuredRaw, Mu_MeasuredC, Mu_MeasuredCt, Mu_MeasuredM, Mu_MeasuredMt, minusXBeamWidth, plusXBeamWidth, minusYBeamWidth, plusYBeamWidth, currentX, currentX_Measured, currentY, currentY_Measured, doseRateC, doseRateM, irradiationTime, numSamplesC, numSamplesM, PlanID, IrrSessionID, FieldNum, BeamSize, Status, CumMU, TempCM, PressureCM
0 Karma

HLVarian
Path Finder

I don't have enough Karma points to provide attachments yet. However this is a sample of the data. Data comes in on a CSV file and looks just like this if opened in text:

 #PlanID: '1.2.246.352.71.5.546855675324.46054.20150723173433', IrrSessionID: '1.2.246.352.82.6.4916826480507715613.151884354515765536450', FieldNum:1
#BeamSizeID:'4.0', Status (1, 0, 0), CumMU: 26940.399953
#Temp C/M(0.000000,0.000000), Pressure C/M(0.000000,0.000000)
#VALUES;;;
LayerNr;Energy;DoseRate;Start-Irradiation;Cumm MU;X;Y;MU;DiagnosticDataValid;X_Measured;Y_Measured;X_MeasuredRaw;Y_MeasuredRaw;Mu_MeasuredC; Mu_MeasuredCt;Mu_MeasuredM;Mu_MeasuredMt;minusXBeamWidth;plusXBeamWidth;minusYBeamWidth;plusYBeamWidth;currentX;currentX_Measured;currentY;currentY_Measured;doseRateC;doseRateM;irradiationTime;numSamplesC;numSamplesM
1;180.139000;304359.803029;AUG 03 2015 16:36:43 GMT;46.599998;-65.599998;-49.200001;46.599998;1;88716.000000;132111.000000;0.000000;0.000000;46.398749;4643.000000;46.342000;4487.000000;5253;5445;4431;3869;-148038;-148074;-75555;-75703;304359.803029;308679.237710;0.008752;100525;100528
2;176.639000;261391.348040;AUG 03 2015 16:36:53 GMT;262.600000;57.400002;-65.599998;49.000000;1;166233.000000;121459.000000;0.000000;0.000000;49.026979;4906.000000;49.523042;4795.000000;5272;5479;3842;4047;128182;126383;-99672;-99767;264168.273430;269533.829245;0.010886;11063;11063
2;176.639000;261391.348040;AUG 03 2015 16:36:53 GMT;262.600000;-57.400002;-41.000000;46.400001;1;93767.000000;138361.000000;0.000000;0.000000;46.398749;4643.000000;46.620857;4514.000000;5409;5458;3862;4404;-128187;-61199;-62274;-63445;260916.063907;265030.589047;0.010261;202;202
2;176.639000;261391.348040;AUG 03 2015 16:36:53 GMT;262.600000;-65.599998;-57.400002;45.799999;1;88777.000000;126961.000000;0.000000;0.000000;45.799153;4583.000000;45.897893;4444.000000;5298;5481;4166;4455;-146450;-141625;-87204;-83594;261488.078635;264819.896647;0.010151;71;71

Hope this helps others out.

0 Karma

HLVarian
Path Finder

I've had success with this RegEx in search, now I just need to figure out how to format it for props.conf and transforms.conf

 rex field=_raw "(?m)^\#PlanID: '(?<PlanID>[.?\d]+)\', IrrSessionID: '(?<IrrSessionID>[.?\d]+)\', FieldNum:(?<FieldNum>[.?\d].*)$
^\#BeamSizeID:'(?<BeamSizeID>[.?\d]+)\', Status (?<Status>\([^(]*\))\, CumMU: (?<CumMU>[.?\d]+.*)$
^\#Temp C\/M(?<TempCM>\([^(]*\))\, Pressure C\/M(?<PressureCM>\([^(]*\))$
^\#VALUES;;;$
^LayerNr;Energy;DoseRate;Start-Irradiation;Cumm MU;X;Y;MU;DiagnosticDataValid;X_Measured;Y_Measured;X_MeasuredRaw;Y_MeasuredRaw;Mu_MeasuredC; Mu_MeasuredCt;Mu_MeasuredM;Mu_MeasuredMt;minusXBeamWidth;plusXBeamWidth;minusYBeamWidth;plusYBeamWidth;currentX;currentX_Measured;currentY;currentY_Measured;doseRateC;doseRateM;irradiationTime;numSamplesC;numSamplesM$
^(?<LayerNr>[^\;*]*);(?<Energy>[^\;*]*);(?<DoseRate>[^\;*]*);(?<StartIrradiation>[^\;*]*);(?<Cumm_MU>[^\;*]*);(?<X>[^\;*]*);(?<Y>[^\;*]*);(?<MU>[^\;*]*);(?<DiagnosticDataValid>[^\;*]*);(?<X_Measured>[^\;*]*);(?<Y_Measured>[^\;*]*);(?<X_MeasuredRaw>[^\;*]*);(?<Y_MeasuredRaw>[^\;*]*);(?<Mu_MeasuredC>[^\;*]*);(?<Mu_MeasuredCt>[^\;*]*);(?<Mu_MeasuredM>[^\;*]*);(?<Mu_MeasuredMt>[^\;*]*);(?<minusXBeamWidth>[^\;*]*);(?<plusXBeamWidth>[^\;*]*);(?<minusYBeamWidth>[^\;*]*);(?<plusYBeamWidth>[^\;*]*);(?<currentX>[^\;*]*);(?<currentX_Measured>[^\;*]*);(?<currentY>[^\;*]*);(?<currentY_Measured>[^\;*]*);(?<doseRateC>[^\;*]*);(?<doseRateM>[^\;*]*);(?<irradiationTime>[^\;*]*);(?<numSamplesC>[^\;*]*);(?<numSamplesM>[^;].*)$"  
| table _time, LayerNr,Energy,DoseRate,StartIrradiation,Cumm_MU,X,Y,MU,DiagnosticDataValid,X_Measured,Y_Measured,X_MeasuredRaw,Y_MeasuredRaw,Mu_MeasuredC,Mu_MeasuredCt,Mu_MeasuredM,Mu_MeasuredMt,minusXBeamWidth,plusXBeamWidth,minusYBeamWidth,plusYBeamWidth,currentX,currentX_Measured,currentY,currentY_Measured,doseRateC,doseRateM,irradiationTime,numSamplesC,numSamplesM,PlanID,IrrSessionID,FieldNum,BeamSize,Status,CumMU,TempCM,PressureCM
0 Karma

HLVarian
Path Finder

Correction, I HAD success with this search. But it now fails.

0 Karma

stephanefotso
Motivator

Hello! Let's get a sample of your data please.

SGF
0 Karma

HLVarian
Path Finder

I, FINALLY, figured it out. Thanks for trying.

0 Karma

HLVarian
Path Finder

I don't have enough Karma points to post an attachment, but see the "answer" I posted below when i was trying to add an attachment for a snapshot of what the data looks like.

0 Karma
Got questions? Get answers!

Join the Splunk Community Slack to learn, troubleshoot, and make connections with fellow Splunk practitioners in real time!

Meet up IRL or virtually!

Join Splunk User Groups to connect and learn in-person by region or remotely by topic or industry.

Get Updates on the Splunk Community!

Announcing Modern Navigation: A New Era of Splunk User Experience

We are excited to introduce the Modern Navigation feature in the Splunk Platform, available to both cloud and ...

Modernize your Splunk Apps – Introducing Python 3.13 in Splunk

We are excited to announce that the upcoming releases of Splunk Enterprise 10.2.x and Splunk Cloud Platform ...

Step into “Hunt the Insider: An Splunk ES Premier Mystery” to catch a cybercriminal ...

After a whole week of being on call, you fell asleep on your keyboard, and you hit a sequence of buttons that ...