<?xml version="1.0" encoding="UTF-8"?>
<rss xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#" xmlns:taxo="http://purl.org/rss/1.0/modules/taxonomy/" version="2.0">
  <channel>
    <title>topic Re: How to extract fields from a multiline header followed by structured data columns? in Splunk Search</title>
    <link>https://community.splunk.com/t5/Splunk-Search/How-to-extract-fields-from-a-multiline-header-followed-by/m-p/210967#M61688</link>
    <description>&lt;P&gt;I don't have enough Karma points to post an attachment, but see the "answer" I posted below (made while I was trying to add an attachment) for a snapshot of what the data looks like.&lt;/P&gt;</description>
    <pubDate>Tue, 23 Feb 2016 19:04:03 GMT</pubDate>
    <dc:creator>HLVarian</dc:creator>
    <dc:date>2016-02-23T19:04:03Z</dc:date>
    <item>
      <title>How to extract fields from a multiline header followed by structured data columns?</title>
      <link>https://community.splunk.com/t5/Splunk-Search/How-to-extract-fields-from-a-multiline-header-followed-by/m-p/210964#M61685</link>
      <description>&lt;P&gt;I have a sourcetype that is in CSV format, and I'd like to extract fields from the multiline header that precedes the data in each incoming file. Each header line begins with &lt;CODE&gt;#&lt;/CODE&gt;, and the values within these lines are comma-separated. The header is followed by the actual data fields, which are semicolon-separated. I'm trying to figure out the best way to parse this data. The data looks like this:&lt;/P&gt;

&lt;PRE&gt;&lt;CODE&gt;#PlanID: '1.2.246.352.71.5.459020837699.2820.20131008220539', IrrSessionID: '1.2.246.352.82.6.5130518203565855886.177214397189262813860', FieldNum:1
#BeamSizeID:'4.0', Status (1, 0, 0), CumMU: 109720.889806
#Temp C/M(0.000000,0.000000), Pressure C/M(0.000000,0.000000)
#VALUES;;;
Field1;Field2;Field3;Field4;...;FieldN
Data1;Data2;Data3;Data4;...;DataN
&lt;/CODE&gt;&lt;/PRE&gt;

&lt;P&gt;I'd like to extract the following fields; here is what each one would be in this example:&lt;/P&gt;

&lt;PRE&gt;&lt;CODE&gt;PlanID = 1.2.246.352.71.5.459020837699.2820.20131008220539
IrrSessionID = 1.2.246.352.82.6.5130518203565855886.177214397189262813860
FieldNum = 1
BeamSizeID = 4.0
Status = (1, 0, 0)
CumMU = 109720.889806
Temp C/M = (0.000000,0.000000)
Pressure C/M = (0.000000,0.000000)
&lt;/CODE&gt;&lt;/PRE&gt;

&lt;P&gt;These should be followed by the data columns:&lt;/P&gt;

&lt;PRE&gt;&lt;CODE&gt;Field1, Field2, ..., FieldN
&lt;/CODE&gt;&lt;/PRE&gt;
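
&lt;P&gt;A minimal sketch of the target extraction, assuming the header always has the three-line layout shown above (this is not code from the thread): one regular expression spans the header lines, with one capture group per field. Python is used only so the pattern can be exercised outside Splunk, and the names &lt;CODE&gt;HEADER_RE&lt;/CODE&gt; and &lt;CODE&gt;parse_header&lt;/CODE&gt; are hypothetical.&lt;/P&gt;

&lt;PRE&gt;&lt;CODE&gt;import re

# Assumed layout: three '#' header lines exactly as in the sample above.
HEADER_RE = re.compile(
    r"#PlanID:\s*'([^']+)',\s*IrrSessionID:\s*'([^']+)',\s*FieldNum:\s*(\d+)\s*"
    r"#BeamSizeID:\s*'([^']+)',\s*Status\s*(\([^)]*\)),\s*CumMU:\s*([\d.]+)\s*"
    r"#Temp C/M\s*(\([^)]*\)),\s*Pressure C/M\s*(\([^)]*\))"
)

FIELD_NAMES = ["PlanID", "IrrSessionID", "FieldNum", "BeamSizeID",
               "Status", "CumMU", "Temp C/M", "Pressure C/M"]


def parse_header(text):
    """Return a dict of header fields, or None if the header does not match."""
    match = HEADER_RE.search(text)
    if match is None:
        return None
    return dict(zip(FIELD_NAMES, match.groups()))


SAMPLE = """#PlanID: '1.2.246.352.71.5.459020837699.2820.20131008220539', IrrSessionID: '1.2.246.352.82.6.5130518203565855886.177214397189262813860', FieldNum:1
#BeamSizeID:'4.0', Status (1, 0, 0), CumMU: 109720.889806
#Temp C/M(0.000000,0.000000), Pressure C/M(0.000000,0.000000)
#VALUES;;;"""

fields = parse_header(SAMPLE)
for name in FIELD_NAMES:
    print(name, "=", fields[name])
&lt;/CODE&gt;&lt;/PRE&gt;

&lt;P&gt;Running it prints each field name with its value, in the same form as the list above.&lt;/P&gt;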

&lt;P&gt;Previously, I had Splunk strip away all the lines beginning with &lt;CODE&gt;#&lt;/CODE&gt; using these &lt;CODE&gt;props.conf&lt;/CODE&gt; settings:&lt;/P&gt;

&lt;PRE&gt;&lt;CODE&gt;FIELD_DELIMITER = ;
PREAMBLE_REGEX = ^\#
&lt;/CODE&gt;&lt;/PRE&gt;

&lt;P&gt;This works well for all my other sourcetypes.&lt;/P&gt;
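
&lt;P&gt;To make the effect of the &lt;CODE&gt;PREAMBLE_REGEX&lt;/CODE&gt; and &lt;CODE&gt;FIELD_DELIMITER&lt;/CODE&gt; settings concrete, here is a rough Python emulation. It is an approximation, not Splunk's implementation: lines matching the preamble pattern are discarded, the first remaining line supplies the column names, and each later line is split on the semicolon. &lt;CODE&gt;read_structured&lt;/CODE&gt; is a hypothetical name. It also shows why the header values are lost with this configuration: they exist only in the discarded &lt;CODE&gt;#&lt;/CODE&gt; lines.&lt;/P&gt;

&lt;PRE&gt;&lt;CODE&gt;import csv
import io
import re

# Approximation only: drop lines matching the preamble pattern, then split on ';'.
PREAMBLE_RE = re.compile(r"^\#")


def read_structured(text, delimiter=";"):
    """Drop preamble lines; treat the first remaining line as the column header."""
    kept = [line for line in text.splitlines() if not PREAMBLE_RE.match(line)]
    reader = csv.DictReader(io.StringIO("\n".join(kept)), delimiter=delimiter)
    return list(reader)


# Trimmed copy of the sample data posted in this thread (two preamble lines, first three columns).
SAMPLE = """#PlanID: '1.2.246.352.71.5.546855675324.46054.20150723173433', IrrSessionID: '1.2.246.352.82.6.4916826480507715613.151884354515765536450', FieldNum:1
#VALUES;;;
LayerNr;Energy;DoseRate
1;180.139000;304359.803029
2;176.639000;261391.348040"""

rows = read_structured(SAMPLE)
for row in rows:
    print(row)  # only LayerNr, Energy and DoseRate remain; PlanID etc. are gone
&lt;/CODE&gt;&lt;/PRE&gt;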

&lt;P&gt;I have also used a Python script to append the necessary fields to each file, after which I can keep using the settings above. However, we do not want to run a separate process on our files before getting them into Splunk.&lt;/P&gt;
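
&lt;P&gt;A minimal sketch of such a pre-processing step, for illustration only: the original script was not posted, so the function name &lt;CODE&gt;append_header_columns&lt;/CODE&gt;, the column names &lt;CODE&gt;TempCM&lt;/CODE&gt; and &lt;CODE&gt;PressureCM&lt;/CODE&gt;, and the assumption that every file has the three-line header shown above are all hypothetical. It keeps the &lt;CODE&gt;#&lt;/CODE&gt; lines (so the existing preamble setting still discards them) and appends the eight header values as extra semicolon-separated columns on every data row.&lt;/P&gt;

&lt;PRE&gt;&lt;CODE&gt;import re

# Hypothetical helper: pulls the eight header values out of the '#' lines.
HEADER_RE = re.compile(
    r"PlanID:\s*'([^']+)'.*?IrrSessionID:\s*'([^']+)'.*?FieldNum:\s*(\d+)"
    r".*?BeamSizeID:\s*'([^']+)'.*?Status\s*(\([^)]*\)).*?CumMU:\s*([\d.]+)"
    r".*?Temp C/M\s*(\([^)]*\)).*?Pressure C/M\s*(\([^)]*\))",
    re.DOTALL,
)
HEADER_NAMES = ["PlanID", "IrrSessionID", "FieldNum", "BeamSizeID",
                "Status", "CumMU", "TempCM", "PressureCM"]


def append_header_columns(text):
    """Return the text with header values appended to the column row and every data row."""
    lines = text.splitlines()
    preamble = [line for line in lines if line.lstrip().startswith("#")]
    body = [line for line in lines if not line.lstrip().startswith("#")]
    match = HEADER_RE.search("\n".join(preamble))
    if match is None:
        raise ValueError("header lines not found")
    out = list(preamble)  # keep the '#' lines so the preamble setting still strips them
    out.append(body[0] + ";" + ";".join(HEADER_NAMES))  # column-name row
    for row in body[1:]:
        out.append(row + ";" + ";".join(match.groups()))
    return "\n".join(out)


SAMPLE = """#PlanID: '1.2.246.352.71.5.546855675324.46054.20150723173433', IrrSessionID: '1.2.246.352.82.6.4916826480507715613.151884354515765536450', FieldNum:1
#BeamSizeID:'4.0', Status (1, 0, 0), CumMU: 26940.399953
#Temp C/M(0.000000,0.000000), Pressure C/M(0.000000,0.000000)
#VALUES;;;
LayerNr;Energy;DoseRate
1;180.139000;304359.803029"""

print(append_header_columns(SAMPLE))
&lt;/CODE&gt;&lt;/PRE&gt;

&lt;P&gt;The trade-off is the one raised in the question: this runs as a separate process before ingestion, whereas a regex-based extraction keeps all of the work inside Splunk.&lt;/P&gt;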

&lt;P&gt;So, what is the best way to get these fields in? Is there a way to adapt my Python script and run it in Splunk on my incoming data? Or should I use some extensive regex in &lt;CODE&gt;props.conf&lt;/CODE&gt; and &lt;CODE&gt;transforms.conf&lt;/CODE&gt; to achieve this?&lt;/P&gt;

&lt;P&gt;Many thanks!&lt;/P&gt;</description>
      <pubDate>Mon, 22 Feb 2016 19:00:08 GMT</pubDate>
      <guid>https://community.splunk.com/t5/Splunk-Search/How-to-extract-fields-from-a-multiline-header-followed-by/m-p/210964#M61685</guid>
      <dc:creator>HLVarian</dc:creator>
      <dc:date>2016-02-22T19:00:08Z</dc:date>
    </item>
    <item>
      <title>Re: How to extract fields from a multiline header followed by structured data columns?</title>
      <link>https://community.splunk.com/t5/Splunk-Search/How-to-extract-fields-from-a-multiline-header-followed-by/m-p/210965#M61686</link>
      <description>&lt;P&gt;Hello! Let's get a sample of your data please.&lt;/P&gt;</description>
      <pubDate>Tue, 23 Feb 2016 14:18:02 GMT</pubDate>
      <guid>https://community.splunk.com/t5/Splunk-Search/How-to-extract-fields-from-a-multiline-header-followed-by/m-p/210965#M61686</guid>
      <dc:creator>stephanefotso</dc:creator>
      <dc:date>2016-02-23T14:18:02Z</dc:date>
    </item>
    <item>
      <title>Re: How to extract fields from a multiline header followed by structured data columns?</title>
      <link>https://community.splunk.com/t5/Splunk-Search/How-to-extract-fields-from-a-multiline-header-followed-by/m-p/210966#M61687</link>
      <description>&lt;P&gt;I don't have enough Karma points to provide attachments yet. However, here is a sample of the data. It arrives as a CSV file and looks exactly like this when opened in a text editor:&lt;/P&gt;

&lt;PRE&gt;&lt;CODE&gt;#PlanID: '1.2.246.352.71.5.546855675324.46054.20150723173433', IrrSessionID: '1.2.246.352.82.6.4916826480507715613.151884354515765536450', FieldNum:1
#BeamSizeID:'4.0', Status (1, 0, 0), CumMU: 26940.399953
#Temp C/M(0.000000,0.000000), Pressure C/M(0.000000,0.000000)
#VALUES;;;
LayerNr;Energy;DoseRate;Start-Irradiation;Cumm MU;X;Y;MU;DiagnosticDataValid;X_Measured;Y_Measured;X_MeasuredRaw;Y_MeasuredRaw;Mu_MeasuredC; Mu_MeasuredCt;Mu_MeasuredM;Mu_MeasuredMt;minusXBeamWidth;plusXBeamWidth;minusYBeamWidth;plusYBeamWidth;currentX;currentX_Measured;currentY;currentY_Measured;doseRateC;doseRateM;irradiationTime;numSamplesC;numSamplesM
1;180.139000;304359.803029;AUG 03 2015 16:36:43 GMT;46.599998;-65.599998;-49.200001;46.599998;1;88716.000000;132111.000000;0.000000;0.000000;46.398749;4643.000000;46.342000;4487.000000;5253;5445;4431;3869;-148038;-148074;-75555;-75703;304359.803029;308679.237710;0.008752;100525;100528
2;176.639000;261391.348040;AUG 03 2015 16:36:53 GMT;262.600000;57.400002;-65.599998;49.000000;1;166233.000000;121459.000000;0.000000;0.000000;49.026979;4906.000000;49.523042;4795.000000;5272;5479;3842;4047;128182;126383;-99672;-99767;264168.273430;269533.829245;0.010886;11063;11063
2;176.639000;261391.348040;AUG 03 2015 16:36:53 GMT;262.600000;-57.400002;-41.000000;46.400001;1;93767.000000;138361.000000;0.000000;0.000000;46.398749;4643.000000;46.620857;4514.000000;5409;5458;3862;4404;-128187;-61199;-62274;-63445;260916.063907;265030.589047;0.010261;202;202
2;176.639000;261391.348040;AUG 03 2015 16:36:53 GMT;262.600000;-65.599998;-57.400002;45.799999;1;88777.000000;126961.000000;0.000000;0.000000;45.799153;4583.000000;45.897893;4444.000000;5298;5481;4166;4455;-146450;-141625;-87204;-83594;261488.078635;264819.896647;0.010151;71;71
&lt;/CODE&gt;&lt;/PRE&gt;

&lt;P&gt;Hope this helps others out.&lt;/P&gt;</description>
      <pubDate>Tue, 23 Feb 2016 19:02:19 GMT</pubDate>
      <guid>https://community.splunk.com/t5/Splunk-Search/How-to-extract-fields-from-a-multiline-header-followed-by/m-p/210966#M61687</guid>
      <dc:creator>HLVarian</dc:creator>
      <dc:date>2016-02-23T19:02:19Z</dc:date>
    </item>
    <item>
      <title>Re: How to extract fields from a multiline header followed by structured data columns?</title>
      <link>https://community.splunk.com/t5/Splunk-Search/How-to-extract-fields-from-a-multiline-header-followed-by/m-p/210967#M61688</link>
      <description>&lt;P&gt;I don't have enough Karma points to post an attachment, but see the "answer" I posted below (made while I was trying to add an attachment) for a snapshot of what the data looks like.&lt;/P&gt;</description>
      <pubDate>Tue, 23 Feb 2016 19:04:03 GMT</pubDate>
      <guid>https://community.splunk.com/t5/Splunk-Search/How-to-extract-fields-from-a-multiline-header-followed-by/m-p/210967#M61688</guid>
      <dc:creator>HLVarian</dc:creator>
      <dc:date>2016-02-23T19:04:03Z</dc:date>
    </item>
    <item>
      <title>Re: How to extract fields from a multiline header followed by structured data columns?</title>
      <link>https://community.splunk.com/t5/Splunk-Search/How-to-extract-fields-from-a-multiline-header-followed-by/m-p/210968#M61689</link>
      <description>&lt;P&gt;I've had success with this regex in search; now I just need to figure out how to format it for &lt;CODE&gt;props.conf&lt;/CODE&gt; and &lt;CODE&gt;transforms.conf&lt;/CODE&gt;.&lt;/P&gt;

&lt;PRE&gt;&lt;CODE&gt; rex field=_raw "(?m)^\#PlanID: '(?&amp;lt;PlanID&amp;gt;[.?\d]+)\', IrrSessionID: '(?&amp;lt;IrrSessionID&amp;gt;[.?\d]+)\', FieldNum:(?&amp;lt;FieldNum&amp;gt;[.?\d].*)$
^\#BeamSizeID:'(?&amp;lt;BeamSizeID&amp;gt;[.?\d]+)\', Status (?&amp;lt;Status&amp;gt;\([^(]*\))\, CumMU: (?&amp;lt;CumMU&amp;gt;[.?\d]+.*)$
^\#Temp C\/M(?&amp;lt;TempCM&amp;gt;\([^(]*\))\, Pressure C\/M(?&amp;lt;PressureCM&amp;gt;\([^(]*\))$
^\#VALUES;;;$
^LayerNr;Energy;DoseRate;Start-Irradiation;Cumm MU;X;Y;MU;DiagnosticDataValid;X_Measured;Y_Measured;X_MeasuredRaw;Y_MeasuredRaw;Mu_MeasuredC; Mu_MeasuredCt;Mu_MeasuredM;Mu_MeasuredMt;minusXBeamWidth;plusXBeamWidth;minusYBeamWidth;plusYBeamWidth;currentX;currentX_Measured;currentY;currentY_Measured;doseRateC;doseRateM;irradiationTime;numSamplesC;numSamplesM$
^(?&amp;lt;LayerNr&amp;gt;[^\;*]*);(?&amp;lt;Energy&amp;gt;[^\;*]*);(?&amp;lt;DoseRate&amp;gt;[^\;*]*);(?&amp;lt;StartIrradiation&amp;gt;[^\;*]*);(?&amp;lt;Cumm_MU&amp;gt;[^\;*]*);(?&amp;lt;X&amp;gt;[^\;*]*);(?&amp;lt;Y&amp;gt;[^\;*]*);(?&amp;lt;MU&amp;gt;[^\;*]*);(?&amp;lt;DiagnosticDataValid&amp;gt;[^\;*]*);(?&amp;lt;X_Measured&amp;gt;[^\;*]*);(?&amp;lt;Y_Measured&amp;gt;[^\;*]*);(?&amp;lt;X_MeasuredRaw&amp;gt;[^\;*]*);(?&amp;lt;Y_MeasuredRaw&amp;gt;[^\;*]*);(?&amp;lt;Mu_MeasuredC&amp;gt;[^\;*]*);(?&amp;lt;Mu_MeasuredCt&amp;gt;[^\;*]*);(?&amp;lt;Mu_MeasuredM&amp;gt;[^\;*]*);(?&amp;lt;Mu_MeasuredMt&amp;gt;[^\;*]*);(?&amp;lt;minusXBeamWidth&amp;gt;[^\;*]*);(?&amp;lt;plusXBeamWidth&amp;gt;[^\;*]*);(?&amp;lt;minusYBeamWidth&amp;gt;[^\;*]*);(?&amp;lt;plusYBeamWidth&amp;gt;[^\;*]*);(?&amp;lt;currentX&amp;gt;[^\;*]*);(?&amp;lt;currentX_Measured&amp;gt;[^\;*]*);(?&amp;lt;currentY&amp;gt;[^\;*]*);(?&amp;lt;currentY_Measured&amp;gt;[^\;*]*);(?&amp;lt;doseRateC&amp;gt;[^\;*]*);(?&amp;lt;doseRateM&amp;gt;[^\;*]*);(?&amp;lt;irradiationTime&amp;gt;[^\;*]*);(?&amp;lt;numSamplesC&amp;gt;[^\;*]*);(?&amp;lt;numSamplesM&amp;gt;[^;].*)$"  
| table _time, LayerNr, Energy, DoseRate, StartIrradiation, Cumm_MU, X, Y, MU, DiagnosticDataValid, X_Measured, Y_Measured, X_MeasuredRaw, Y_MeasuredRaw, Mu_MeasuredC, Mu_MeasuredCt, Mu_MeasuredM, Mu_MeasuredMt, minusXBeamWidth, plusXBeamWidth, minusYBeamWidth, plusYBeamWidth, currentX, currentX_Measured, currentY, currentY_Measured, doseRateC, doseRateM, irradiationTime, numSamplesC, numSamplesM, PlanID, IrrSessionID, FieldNum, BeamSizeID, Status, CumMU, TempCM, PressureCM
&lt;/CODE&gt;&lt;/PRE&gt;</description>
      <pubDate>Tue, 23 Feb 2016 22:30:18 GMT</pubDate>
      <guid>https://community.splunk.com/t5/Splunk-Search/How-to-extract-fields-from-a-multiline-header-followed-by/m-p/210968#M61689</guid>
      <dc:creator>HLVarian</dc:creator>
      <dc:date>2016-02-23T22:30:18Z</dc:date>
    </item>
    <item>
      <title>Re: How to extract fields from a multiline header followed by structured data columns?</title>
      <link>https://community.splunk.com/t5/Splunk-Search/How-to-extract-fields-from-a-multiline-header-followed-by/m-p/210969#M61690</link>
      <description>&lt;P&gt;Correction: I HAD success with this search, but it now fails.&lt;/P&gt;</description>
      <pubDate>Wed, 24 Feb 2016 18:03:45 GMT</pubDate>
      <guid>https://community.splunk.com/t5/Splunk-Search/How-to-extract-fields-from-a-multiline-header-followed-by/m-p/210969#M61690</guid>
      <dc:creator>HLVarian</dc:creator>
      <dc:date>2016-02-24T18:03:45Z</dc:date>
    </item>
    <item>
      <title>Re: How to extract fields from a multiline header followed by structured data columns?</title>
      <link>https://community.splunk.com/t5/Splunk-Search/How-to-extract-fields-from-a-multiline-header-followed-by/m-p/210970#M61691</link>
      <description>&lt;P&gt;It looks like the key to this one was to create two eventtypes: one based on the header and the other on the data. I ran a search on &lt;CODE&gt;eventtype=IrrHeader&lt;/CODE&gt; using a regex, then appended a search on &lt;CODE&gt;eventtype=IrrData&lt;/CODE&gt;. From there I created my table.&lt;/P&gt;

&lt;P&gt;This was the final search:&lt;/P&gt;

&lt;PRE&gt;&lt;CODE&gt;eventtype=IrrHeader 
| rex field=_raw "(?ms)^[^:\n]*:\s+'(?P&amp;lt;PlanID&amp;gt;[^']+)[^:\n]*:\s+'(?P&amp;lt;IrrSessionID&amp;gt;[^']+)[^:\n]*:(?P&amp;lt;FieldNum&amp;gt;\d+)\s+#\w+:'(?P&amp;lt;BeamSizeID&amp;gt;\d+\.\d+)',\s+\w+\s+(?P&amp;lt;Status&amp;gt;\(\d+,\s+\d+,\s+\d+\))[^:\n]*:\s+(?P&amp;lt;CumMU&amp;gt;\d+\.\d+)\s+#\w+\s+\w+\/\w+(?P&amp;lt;TempCM&amp;gt;\(\d+\.\d+,\d+\.\d+\)),\s+\w+\s+\w+\/\w+(?P&amp;lt;PressureCM&amp;gt;\(\d+\.\d+,\d+\.\d+\))" 
| append [search sourcetype=Irr2File eventtype=IrrData 
| rex field=_raw "^(?&amp;lt;LayerNr&amp;gt;[^\;*]*);(?&amp;lt;Energy&amp;gt;[^\;*]*);(?&amp;lt;DoseRate&amp;gt;[^\;*]*);(?&amp;lt;StartIrradiation&amp;gt;[^\;*]*);(?&amp;lt;Cumm_MU&amp;gt;[^\;*]*);(?&amp;lt;X&amp;gt;[^\;*]*);(?&amp;lt;Y&amp;gt;[^\;*]*);(?&amp;lt;MU&amp;gt;[^\;*]*);(?&amp;lt;DiagnosticDataValid&amp;gt;[^\;*]*);(?&amp;lt;X_Measured&amp;gt;[^\;*]*);(?&amp;lt;Y_Measured&amp;gt;[^\;*]*);(?&amp;lt;X_MeasuredRaw&amp;gt;[^\;*]*);(?&amp;lt;Y_MeasuredRaw&amp;gt;[^\;*]*);(?&amp;lt;Mu_MeasuredC&amp;gt;[^\;*]*);(?&amp;lt;Mu_MeasuredCt&amp;gt;[^\;*]*);(?&amp;lt;Mu_MeasuredM&amp;gt;[^\;*]*);(?&amp;lt;Mu_MeasuredMt&amp;gt;[^\;*]*);(?&amp;lt;minusXBeamWidth&amp;gt;[^\;*]*);(?&amp;lt;plusXBeamWidth&amp;gt;[^\;*]*);(?&amp;lt;minusYBeamWidth&amp;gt;[^\;*]*);(?&amp;lt;plusYBeamWidth&amp;gt;[^\;*]*);(?&amp;lt;currentX&amp;gt;[^\;*]*);(?&amp;lt;currentX_Measured&amp;gt;[^\;*]*);(?&amp;lt;currentY&amp;gt;[^\;*]*);(?&amp;lt;currentY_Measured&amp;gt;[^\;*]*);(?&amp;lt;doseRateC&amp;gt;[^\;*]*);(?&amp;lt;doseRateM&amp;gt;[^\;*]*);(?&amp;lt;irradiationTime&amp;gt;[^\;*]*);(?&amp;lt;numSamplesC&amp;gt;[^\;*]*);(?&amp;lt;numSamplesM&amp;gt;[^;].*)$"]
| table _time, LayerNr, Energy, DoseRate, StartIrradiation, Cumm_MU, X, Y, MU, DiagnosticDataValid, X_Measured, Y_Measured, X_MeasuredRaw, Y_MeasuredRaw, Mu_MeasuredC, Mu_MeasuredCt, Mu_MeasuredM, Mu_MeasuredMt, minusXBeamWidth, plusXBeamWidth, minusYBeamWidth, plusYBeamWidth, currentX, currentX_Measured, currentY, currentY_Measured, doseRateC, doseRateM, irradiationTime, numSamplesC, numSamplesM, PlanID, IrrSessionID, FieldNum, BeamSizeID, Status, CumMU, TempCM, PressureCM
&lt;/CODE&gt;&lt;/PRE&gt;</description>
      <pubDate>Fri, 26 Feb 2016 00:59:57 GMT</pubDate>
      <guid>https://community.splunk.com/t5/Splunk-Search/How-to-extract-fields-from-a-multiline-header-followed-by/m-p/210970#M61691</guid>
      <dc:creator>HLVarian</dc:creator>
      <dc:date>2016-02-26T00:59:57Z</dc:date>
    </item>
    <item>
      <title>Re: How to extract fields from a multiline header followed by structured data columns?</title>
      <link>https://community.splunk.com/t5/Splunk-Search/How-to-extract-fields-from-a-multiline-header-followed-by/m-p/210971#M61692</link>
      <description>&lt;P&gt;I FINALLY figured it out. Thanks for trying.&lt;/P&gt;</description>
      <pubDate>Fri, 26 Feb 2016 01:12:27 GMT</pubDate>
      <guid>https://community.splunk.com/t5/Splunk-Search/How-to-extract-fields-from-a-multiline-header-followed-by/m-p/210971#M61692</guid>
      <dc:creator>HLVarian</dc:creator>
      <dc:date>2016-02-26T01:12:27Z</dc:date>
    </item>
  </channel>
</rss>

