I have a sourcetype that is in CSV format and I'd like to extract fields from the multiline header that proceeds these files coming in. Each new line in the header begins with #
and these lines are comma separated. This header is followed by the actual data fields which are semi-colon separated. I'm trying to figure out the best way to go about parsing this data. The data looks like this:
#PlanID: '1.2.246.352.71.5.459020837699.2820.20131008220539', IrrSessionID: '1.2.246.352.82.6.5130518203565855886.177214397189262813860', FieldNum:1
#BeamSizeID:'4.0', Status (1, 0, 0), CumMU: 109720.889806
#Temp C/M(0.000000,0.000000), Pressure C/M(0.000000,0.000000)
#VALUES;;;
Field1;Field2;Field3;Field4;...;FieldN
Data1;Data2;Data4;Data4;...;DataN
I'd like to get out the following fields and what they would be relate to in this example:
PlanID = 1.2.246.352.71.5.459020837699.2820.20131008220539
IrrSessionID = 1.2.246.352.82.6.5130518203565855886.177214397189262813860
FieldNum = 1
BeamSizeID = 4.0
Status = (1, 0, 0)
CumMu = 109720.889806
Temp C/M = (0.000000,0.000000)
Pressure C/M= (0.000000,0.000000)
Followed by:
Field1, Field2,..., etc.
Previously I just had Splunk strip away all the lines beginning with # using:
FIELD DELIMITER = ;
PREAMBLE REGEX = ^\#
(this works well for all my other sourcetypes)
I have also used a Python script to append the necessary fields to file and then I can keep using the aforementioned settings. However, we do not want to run a separate process on our files before getting them into Splunk.
So, what is the best way to get these fields in? Is there a way to manipulate my Python script and use it in Splunk on my incoming data? Or should I use some extensive RegEx in props.conf
and transforms.conf
to achieve this?
Many thanks!
It looks like the key to this one was to create eventtypes. One based on the header and the other the data. I then ran a search on the eventtype=IrrHeader using a regex. I then appended a search on eventtype=IrrData. From here I created my table.
This was the final search:
eventtype=IrrHeader
| rex field=_raw "(?ms)^[^:\n]*:\s+'(?P<PlanID>[^']+)[^:\n]*:\s+'(?P<IrrSessionID>[^']+)[^:\n]*:(?P<FieldNum>\d+)\s+#\w+:'(?P<BeamSizeID>\d+\.\d+)',\s+\w+\s+(?P<Status>\(\d+,\s+\d+,\s+\d+\))[^:\n]*:\s+(?P<CumMU>\d+\.\d+)\s+#\w+\s+\w+\/\w+(?P<TempCM>\(\d+\.\d+,\d+\.\d+\)),\s+\w+\s+\w+\/\w+(?P<PressureCM>\(\d+\.\d+,\d+\.\d+\))"
| append [search sourcetype=Irr2File eventtype=IrrData
| rex field=_raw "^(?<LayerNr>[^\;*]*);(?<Energy>[^\;*]*);(?<DoseRate>[^\;*]*);(?<StartIrradiation>[^\;*]*);(?<Cumm_MU>[^\;*]*);(?<X>[^\;*]*);(?<Y>[^\;*]*);(?<MU>[^\;*]*);(?<DiagnosticDataValid>[^\;*]*);(?<X_Measured>[^\;*]*);(?<Y_Measured>[^\;*]*);(?<X_MeasuredRaw>[^\;*]*);(?<Y_MeasuredRaw>[^\;*]*);(?<Mu_MeasuredC>[^\;*]*);(?<Mu_MeasuredCt>[^\;*]*);(?<Mu_MeasuredM>[^\;*]*);(?<Mu_MeasuredMt>[^\;*]*);(?<minusXBeamWidth>[^\;*]*);(?<plusXBeamWidth>[^\;*]*);(?<minusYBeamWidth>[^\;*]*);(?<plusYBeamWidth>[^\;*]*);(?<currentX>[^\;*]*);(?<currentX_Measured>[^\;*]*);(?<currentY>[^\;*]*);(?<currentY_Measured>[^\;*]*);(?<doseRateC>[^\;*]*);(?<doseRateM>[^\;*]*);(?<irradiationTime>[^\;*]*);(?<numSamplesC>[^\;*]*);(?<numSamplesM>[^;].*)$"]
| table _time, LayerNr, Energy, DoseRate, StartIrradiation, Cumm_MU, X, Y, MU, DiagnosticDataValid, X_Measured, Y_Measured, X_MeasuredRaw, Y_MeasuredRaw, Mu_MeasuredC, Mu_MeasuredCt, Mu_MeasuredM, Mu_MeasuredMt, minusXBeamWidth, plusXBeamWidth, minusYBeamWidth, plusYBeamWidth, currentX, currentX_Measured, currentY, currentY_Measured, doseRateC, doseRateM, irradiationTime, numSamplesC, numSamplesM, PlanID, IrrSessionID, FieldNum, BeamSize, Status, CumMU, TempCM, PressureCM
It looks like the key to this one was to create eventtypes. One based on the header and the other the data. I then ran a search on the eventtype=IrrHeader using a regex. I then appended a search on eventtype=IrrData. From here I created my table.
This was the final search:
eventtype=IrrHeader
| rex field=_raw "(?ms)^[^:\n]*:\s+'(?P<PlanID>[^']+)[^:\n]*:\s+'(?P<IrrSessionID>[^']+)[^:\n]*:(?P<FieldNum>\d+)\s+#\w+:'(?P<BeamSizeID>\d+\.\d+)',\s+\w+\s+(?P<Status>\(\d+,\s+\d+,\s+\d+\))[^:\n]*:\s+(?P<CumMU>\d+\.\d+)\s+#\w+\s+\w+\/\w+(?P<TempCM>\(\d+\.\d+,\d+\.\d+\)),\s+\w+\s+\w+\/\w+(?P<PressureCM>\(\d+\.\d+,\d+\.\d+\))"
| append [search sourcetype=Irr2File eventtype=IrrData
| rex field=_raw "^(?<LayerNr>[^\;*]*);(?<Energy>[^\;*]*);(?<DoseRate>[^\;*]*);(?<StartIrradiation>[^\;*]*);(?<Cumm_MU>[^\;*]*);(?<X>[^\;*]*);(?<Y>[^\;*]*);(?<MU>[^\;*]*);(?<DiagnosticDataValid>[^\;*]*);(?<X_Measured>[^\;*]*);(?<Y_Measured>[^\;*]*);(?<X_MeasuredRaw>[^\;*]*);(?<Y_MeasuredRaw>[^\;*]*);(?<Mu_MeasuredC>[^\;*]*);(?<Mu_MeasuredCt>[^\;*]*);(?<Mu_MeasuredM>[^\;*]*);(?<Mu_MeasuredMt>[^\;*]*);(?<minusXBeamWidth>[^\;*]*);(?<plusXBeamWidth>[^\;*]*);(?<minusYBeamWidth>[^\;*]*);(?<plusYBeamWidth>[^\;*]*);(?<currentX>[^\;*]*);(?<currentX_Measured>[^\;*]*);(?<currentY>[^\;*]*);(?<currentY_Measured>[^\;*]*);(?<doseRateC>[^\;*]*);(?<doseRateM>[^\;*]*);(?<irradiationTime>[^\;*]*);(?<numSamplesC>[^\;*]*);(?<numSamplesM>[^;].*)$"]
| table _time, LayerNr, Energy, DoseRate, StartIrradiation, Cumm_MU, X, Y, MU, DiagnosticDataValid, X_Measured, Y_Measured, X_MeasuredRaw, Y_MeasuredRaw, Mu_MeasuredC, Mu_MeasuredCt, Mu_MeasuredM, Mu_MeasuredMt, minusXBeamWidth, plusXBeamWidth, minusYBeamWidth, plusYBeamWidth, currentX, currentX_Measured, currentY, currentY_Measured, doseRateC, doseRateM, irradiationTime, numSamplesC, numSamplesM, PlanID, IrrSessionID, FieldNum, BeamSize, Status, CumMU, TempCM, PressureCM
I don't have enough Karma points to provide attachments yet. However this is a sample of the data. Data comes in on a CSV file and looks just like this if opened in text:
#PlanID: '1.2.246.352.71.5.546855675324.46054.20150723173433', IrrSessionID: '1.2.246.352.82.6.4916826480507715613.151884354515765536450', FieldNum:1
#BeamSizeID:'4.0', Status (1, 0, 0), CumMU: 26940.399953
#Temp C/M(0.000000,0.000000), Pressure C/M(0.000000,0.000000)
#VALUES;;;
LayerNr;Energy;DoseRate;Start-Irradiation;Cumm MU;X;Y;MU;DiagnosticDataValid;X_Measured;Y_Measured;X_MeasuredRaw;Y_MeasuredRaw;Mu_MeasuredC; Mu_MeasuredCt;Mu_MeasuredM;Mu_MeasuredMt;minusXBeamWidth;plusXBeamWidth;minusYBeamWidth;plusYBeamWidth;currentX;currentX_Measured;currentY;currentY_Measured;doseRateC;doseRateM;irradiationTime;numSamplesC;numSamplesM
1;180.139000;304359.803029;AUG 03 2015 16:36:43 GMT;46.599998;-65.599998;-49.200001;46.599998;1;88716.000000;132111.000000;0.000000;0.000000;46.398749;4643.000000;46.342000;4487.000000;5253;5445;4431;3869;-148038;-148074;-75555;-75703;304359.803029;308679.237710;0.008752;100525;100528
2;176.639000;261391.348040;AUG 03 2015 16:36:53 GMT;262.600000;57.400002;-65.599998;49.000000;1;166233.000000;121459.000000;0.000000;0.000000;49.026979;4906.000000;49.523042;4795.000000;5272;5479;3842;4047;128182;126383;-99672;-99767;264168.273430;269533.829245;0.010886;11063;11063
2;176.639000;261391.348040;AUG 03 2015 16:36:53 GMT;262.600000;-57.400002;-41.000000;46.400001;1;93767.000000;138361.000000;0.000000;0.000000;46.398749;4643.000000;46.620857;4514.000000;5409;5458;3862;4404;-128187;-61199;-62274;-63445;260916.063907;265030.589047;0.010261;202;202
2;176.639000;261391.348040;AUG 03 2015 16:36:53 GMT;262.600000;-65.599998;-57.400002;45.799999;1;88777.000000;126961.000000;0.000000;0.000000;45.799153;4583.000000;45.897893;4444.000000;5298;5481;4166;4455;-146450;-141625;-87204;-83594;261488.078635;264819.896647;0.010151;71;71
Hope this helps others out.
I've had success with this RegEx in search, now I just need to figure out how to format it for props.conf
and transforms.conf
rex field=_raw "(?m)^\#PlanID: '(?<PlanID>[.?\d]+)\', IrrSessionID: '(?<IrrSessionID>[.?\d]+)\', FieldNum:(?<FieldNum>[.?\d].*)$
^\#BeamSizeID:'(?<BeamSizeID>[.?\d]+)\', Status (?<Status>\([^(]*\))\, CumMU: (?<CumMU>[.?\d]+.*)$
^\#Temp C\/M(?<TempCM>\([^(]*\))\, Pressure C\/M(?<PressureCM>\([^(]*\))$
^\#VALUES;;;$
^LayerNr;Energy;DoseRate;Start-Irradiation;Cumm MU;X;Y;MU;DiagnosticDataValid;X_Measured;Y_Measured;X_MeasuredRaw;Y_MeasuredRaw;Mu_MeasuredC; Mu_MeasuredCt;Mu_MeasuredM;Mu_MeasuredMt;minusXBeamWidth;plusXBeamWidth;minusYBeamWidth;plusYBeamWidth;currentX;currentX_Measured;currentY;currentY_Measured;doseRateC;doseRateM;irradiationTime;numSamplesC;numSamplesM$
^(?<LayerNr>[^\;*]*);(?<Energy>[^\;*]*);(?<DoseRate>[^\;*]*);(?<StartIrradiation>[^\;*]*);(?<Cumm_MU>[^\;*]*);(?<X>[^\;*]*);(?<Y>[^\;*]*);(?<MU>[^\;*]*);(?<DiagnosticDataValid>[^\;*]*);(?<X_Measured>[^\;*]*);(?<Y_Measured>[^\;*]*);(?<X_MeasuredRaw>[^\;*]*);(?<Y_MeasuredRaw>[^\;*]*);(?<Mu_MeasuredC>[^\;*]*);(?<Mu_MeasuredCt>[^\;*]*);(?<Mu_MeasuredM>[^\;*]*);(?<Mu_MeasuredMt>[^\;*]*);(?<minusXBeamWidth>[^\;*]*);(?<plusXBeamWidth>[^\;*]*);(?<minusYBeamWidth>[^\;*]*);(?<plusYBeamWidth>[^\;*]*);(?<currentX>[^\;*]*);(?<currentX_Measured>[^\;*]*);(?<currentY>[^\;*]*);(?<currentY_Measured>[^\;*]*);(?<doseRateC>[^\;*]*);(?<doseRateM>[^\;*]*);(?<irradiationTime>[^\;*]*);(?<numSamplesC>[^\;*]*);(?<numSamplesM>[^;].*)$"
| table _time, LayerNr,Energy,DoseRate,StartIrradiation,Cumm_MU,X,Y,MU,DiagnosticDataValid,X_Measured,Y_Measured,X_MeasuredRaw,Y_MeasuredRaw,Mu_MeasuredC,Mu_MeasuredCt,Mu_MeasuredM,Mu_MeasuredMt,minusXBeamWidth,plusXBeamWidth,minusYBeamWidth,plusYBeamWidth,currentX,currentX_Measured,currentY,currentY_Measured,doseRateC,doseRateM,irradiationTime,numSamplesC,numSamplesM,PlanID,IrrSessionID,FieldNum,BeamSize,Status,CumMU,TempCM,PressureCM
Correction, I HAD success with this search. But it now fails.
Hello! Let's get a sample of your data please.
I, FINALLY, figured it out. Thanks for trying.
I don't have enough Karma points to post an attachment, but see the "answer" I posted below when i was trying to add an attachment for a snapshot of what the data looks like.