Hi All,
Just curious about the best method to index a CSV file with multiple sets of data inside?
The basic format of the whole file is
I,DataSet1_FieldName1,DataSet1_FieldName2,DataSet1_FieldName3
D,this,54,fred
D,this,87,barry
I,DataSet2_FieldName1,DataSet2_FieldName2,DataSet2_FieldName3
D,784,moreInfo,thatData
D,5443,moreInfo2,thisData
D,524,moreInfo2,theOtherData
I,DataSet3_FieldName1,DataSet3_FieldName2
D,Wow,SoMuchData
D,Really,MoreData
and on and on it goes with about 5 sets of data.
As you can see the first field of each row determines if the row is a header/index row or a data row
I = Index/Header row
D = Data row
We have successfully been able to retrieve only ONE set of data by using the following
inputs.conf
[monitor:///tmp/csvProvider]<br />
disabled = false<br />
followTail = 0<br />
host = csvProvider<br />
sourcetype = public_data
props.conf
[public_data]<br />
KV_MODE = none<br />
SHOULD_LINEMERGE = false<br />
TRANSFORMS-filterprices = setnull,getprices<br />
REPORT-extracts = csv_extract<br />
transforms.conf
[setnull]<br />
REGEX = .<br />
DEST_KEY = queue<br />
FORMAT = nullQueue<br />
<br />
[getprices]<br />
REGEX = ^D,DISPATCH,PRICE,(.*)<br />
DEST_KEY = queue<br />
FORMAT = indexQueue<br />
<br />
[csv_extract]<br />
DELIMS = ","<br />
FIELDS = "I","DISPATCH","PRICE","THREE","SETTLEMENTDATE","RUNNO","REGIONID","DISPATCHINTERVAL","INTERVENTION","RRP"<br />
Does anyone know the best method to get all sets of data out?
Secondly the files we retrieve are actually .zip files. Currently we are extracting them before handing them off to splunk. Is there a way to get splunk to extract and process these files in the same manner as above? We previously tried to get splunk to process the zip files but it didn't seem to handle them very well at all.
Thanks
I cracked it... not ideal but it works.
I am sure there is probably a better way to "repeat" that regular expression pattern but it works!!!!!
As you can see the first entry of FORMAT for each REPORT sets the "type" of data to allow you to search for a specific set.
INTERCONNECTOR_SOLN::1
allows for
sourcetype=predispatchis INTERCONNECTOR_SOLN=1
Now while I would much prefer to have different sourcetypes for each of these this is the next best solution.
props.conf
[predispatchis]
KV_MODE = none
SHOULD_LINEMERGE = false
TIME_PREFIX=\d{4}\/\d{2}\/\d{2} \d{2}:\d{2}:\d{2}
TRANSFORMS-a-filterprices = setnull,PREDISPATCH-getInterconnector,PREDISPATCH-getRegionSolution,PREDISPATCH-getRegionPrices
REPORT-extract_regionSolution = PREDISPATCH-regionSolutionCsvExtract
REPORT-extract_interconnect = PREDISPATCH-interconnectorSolnCsvExtract
REPORT-extract_regionPrices = PREDISPATCH-regionPricesCsvExtract
transforms.conf
[PREDISPATCH-getInterconnector]
REGEX = ^D,PREDISPATCH,INTERCONNECTOR_SOLN,(.*)
DEST_KEY = queue
FORMAT = indexQueue
[PREDISPATCH-getRegionSolution]
REGEX = ^D,PREDISPATCH,REGION_SOLUTION,(.*)
DEST_KEY = queue
FORMAT = indexQueue
[PREDISPATCH-getRegionPrices]
REGEX = ^D,PREDISPATCH,REGION_PRICES,(.*)
DEST_KEY = queue
FORMAT = indexQueue
[PREDISPATCH-interconnectorSolnCsvExtract]
REGEX = ^D,PREDISPATCH,INTERCONNECTOR_SOLN,([^,]*),([^,]*),([^,]*),([^,]*),([^,]*),([^,]*),([^,]*),([^,]*),([^,]*),([^,]*),([^,]*),([^,]*),([^,]*),([^,]*),([^,]*),([^,]*),([^,]*),([^,]*),([^,]*),([^,]*),([^,]*),([^,]*),([^,]*),([^,]*)
FORMAT = INTERCONNECTOR_SOLN::1 PREDISPATCHSEQNO::$2 RUNNO::$3 INTERCONNECTORID::$4 PERIODID::$5 INTERVENTION::$6 METEREDMWFLOW::$7 MWFLOW::$8 MWLOSSES::$9 MARGINALVALUE::$10 VIOLATIONDEGREE::$11 LASTCHANGED::$12 DATETIME::$13 EXPORTLIMIT::$14 IMPORTLIMIT::$15 MARGINALLOSS::$16 EXPORTGENCONID::$17 IMPORTGENCONID::$18 FCASEXPORTLIMIT::$19 FCASIMPORTLIMIT::$20 LOCAL_PRICE_ADJUSTMENT_EXPORT::$21 LOCALLY_CONSTRAINED_EXPORT::$22 LOCAL_PRICE_ADJUSTMENT_IMPORT::$23 LOCALLY_CONSTRAINED_IMPORT::$24
[PREDISPATCH-regionSolutionCsvExtract]
REGEX = ^D,PREDISPATCH,REGION_SOLUTION,([^,]*),([^,]*),([^,]*),([^,]*),([^,]*),([^,]*),([^,]*),([^,]*),([^,]*),([^,]*),([^,]*),([^,]*),([^,]*),([^,]*),([^,]*),([^,]*),([^,]*),([^,]*),([^,]*),([^,]*),([^,]*),([^,]*),([^,]*),([^,]*),([^,]*),([^,]*),([^,]*),([^,]*),([^,]*),([^,]*),([^,]*),([^,]*),([^,]*),([^,]*),([^,]*),([^,]*),([^,]*),([^,]*),([^,]*),([^,]*),([^,]*),([^,]*),([^,]*),([^,]*),([^,]*),([^,]*),([^,]*),([^,]*),([^,]*),([^,]*),([^,]*),([^,]*),([^,]*),([^,]*),([^,]*),([^,]*),([^,]*),([^,]*),([^,]*),([^,]*),([^,]*),([^,]*),([^,]*),([^,]*),([^,]*),([^,]*),([^,]*),([^,]*),([^,]*),([^,]*),([^,]*),([^,]*),([^,]*),([^,]*),([^,]*),([^,]*),([^,]*),([^,]*),([^,]*),([^,]*),([^,]*),([^,]*),([^,]*),([^,]*),([^,]*),([^,]*),([^,]*),([^,]*),([^,]*),([^,]*),([^,]*),([^,]*),([^,]*),([^,]*),([^,]*),([^,]*),([^,]*),([^,]*),([^,]*),([^,]*),([^,]*),([^,]*),([^,]*),([^,]*),([^,]*),([^,]*)
FORMAT = REGION_SOLUTION::1 PREDISPATCHSEQNO::$2 RUNNO::$3 REGIONID::$4 PERIODID::$5 INTERVENTION::$6 TOTALDEMAND::$7 AVAILABLEGENERATION::$8 AVAILABLELOAD::$9 DEMANDFORECAST::$10 DISPATCHABLEGENERATION::$11 DISPATCHABLELOAD::$12 NETINTERCHANGE::$13 EXCESSGENERATION::$14 LOWER5MINDISPATCH::$15 LOWER5MINIMPORT::$16 LOWER5MINLOCALDISPATCH::$17 LOWER5MINLOCALPRICE::$18 LOWER5MINLOCALREQ::$19 LOWER5MINPRICE::$20 LOWER5MINREQ::$21 LOWER5MINSUPPLYPRICE::$22 LOWER60SECDISPATCH::$23 LOWER60SECIMPORT::$24 LOWER60SECLOCALDISPATCH::$25 LOWER60SECLOCALPRICE::$26 LOWER60SECLOCALREQ::$27 LOWER60SECPRICE::$28 LOWER60SECREQ::$29 LOWER60SECSUPPLYPRICE::$30 LOWER6SECDISPATCH::$31 LOWER6SECIMPORT::$32 LOWER6SECLOCALDISPATCH::$33 LOWER6SECLOCALPRICE::$34 LOWER6SECLOCALREQ::$35 LOWER6SECPRICE::$36 LOWER6SECREQ::$37 LOWER6SECSUPPLYPRICE::$38 RAISE5MINDISPATCH::$39 RAISE5MINIMPORT::$40 RAISE5MINLOCALDISPATCH::$41 RAISE5MINLOCALPRICE::$42 RAISE5MINLOCALREQ::$43 RAISE5MINPRICE::$44 RAISE5MINREQ::$45 RAISE5MINSUPPLYPRICE::$46 RAISE60SECDISPATCH::$47 RAISE60SECIMPORT::$48 RAISE60SECLOCALDISPATCH::$49 RAISE60SECLOCALPRICE::$50 RAISE60SECLOCALREQ::$51 RAISE60SECPRICE::$52 RAISE60SECREQ::$53 RAISE60SECSUPPLYPRICE::$54 RAISE6SECDISPATCH::$55 RAISE6SECIMPORT::$56 RAISE6SECLOCALDISPATCH::$57 RAISE6SECLOCALPRICE::$58 RAISE6SECLOCALREQ::$59 RAISE6SECPRICE::$60 RAISE6SECREQ::$61 RAISE6SECSUPPLYPRICE::$62 LASTCHANGED::$63 DATETIME::$64 INITIALSUPPLY::$65 CLEAREDSUPPLY::$66 LOWERREGIMPORT::$67 LOWERREGLOCALDISPATCH::$68 LOWERREGLOCALREQ::$69 LOWERREGREQ::$70 RAISEREGIMPORT::$71 RAISEREGLOCALDISPATCH::$72 RAISEREGLOCALREQ::$73 RAISEREGREQ::$74 RAISE5MINLOCALVIOLATION::$75 RAISEREGLOCALVIOLATION::$76 RAISE60SECLOCALVIOLATION::$77 RAISE6SECLOCALVIOLATION::$78 LOWER5MINLOCALVIOLATION::$79 LOWERREGLOCALVIOLATION::$80 LOWER60SECLOCALVIOLATION::$81 LOWER6SECLOCALVIOLATION::$82 RAISE5MINVIOLATION::$83 RAISEREGVIOLATION::$84 RAISE60SECVIOLATION::$85 RAISE6SECVIOLATION::$86 LOWER5MINVIOLATION::$87 LOWERREGVIOLATION::$88 LOWER60SECVIOLATION::$89 LOWER6SECVIOLATION::$90 RAISE6SECACTUALAVAILABILITY::$91 RAISE60SECACTUALAVAILABILITY::$92 RAISE5MINACTUALAVAILABILITY::$93 RAISEREGACTUALAVAILABILITY::$94 LOWER6SECACTUALAVAILABILITY::$95 LOWER60SECACTUALAVAILABILITY::$96 LOWER5MINACTUALAVAILABILITY::$97 LOWERREGACTUALAVAILABILITY::$98 DECAVAILABILITY::$99 LORSURPLUS::$100 LRCSURPLUS::$101 TOTALINTERMITTENTGENERATION::$102 DEMAND_AND_NONSCHEDGEN::$103 UIGF::$104 SEMISCHEDULE_CLEAREDMW::$105 SEMISCHEDULE_COMPLIANCEMW::$106
[PREDISPATCH-regionPricesCsvExtract]
REGEX = ^D,PREDISPATCH,REGION_PRICES,([^,]*),([^,]*),([^,]*),([^,]*),([^,]*),([^,]*),([^,]*),([^,]*),([^,]*),([^,]*),([^,]*),([^,]*),([^,]*),([^,]*),([^,]*),([^,]*),([^,]*),([^,]*),([^,]*),([^,]*),([^,]*),([^,]*),([^,]*),([^,]*),([^,]*),([^,]*),([^,]*),([^,]*),([^,]*),([^,]*),([^,]*),([^,]*),([^,]*),([^,]*)
FORMAT = REGION_PRICES::1 PREDISPATCHSEQNO::$2 RUNNO::$3 REGIONID::$4 PERIODID::$5 INTERVENTION::$6 RRP::$7 EEP::$8 RRP1::$9 EEP1::$10 RRP2::$11 EEP2::$12 RRP3::$13 EEP3::$14 RRP4::$15 EEP4::$16 RRP5::$17 EEP5::$18 RRP6::$19 EEP6::$20 RRP7::$21 EEP7::$22 RRP8::$23 EEP8::$24 LASTCHANGED::$25 DATETIME::$26 RAISE6SECRRP::$27 RAISE60SECRRP::$28 RAISE5MINRRP::$29 RAISEREGRRP::$30 LOWER6SECRRP::$31 LOWER60SECRRP::$32 LOWER5MINRRP::$33 LOWERREGRRP::$34
I cracked it... not ideal but it works.
I am sure there is probably a better way to "repeat" that regular expression pattern but it works!!!!!
As you can see the first entry of FORMAT for each REPORT sets the "type" of data to allow you to search for a specific set.
INTERCONNECTOR_SOLN::1
allows for
sourcetype=predispatchis INTERCONNECTOR_SOLN=1
Now while I would much prefer to have different sourcetypes for each of these this is the next best solution.
props.conf
[predispatchis]
KV_MODE = none
SHOULD_LINEMERGE = false
TIME_PREFIX=\d{4}\/\d{2}\/\d{2} \d{2}:\d{2}:\d{2}
TRANSFORMS-a-filterprices = setnull,PREDISPATCH-getInterconnector,PREDISPATCH-getRegionSolution,PREDISPATCH-getRegionPrices
REPORT-extract_regionSolution = PREDISPATCH-regionSolutionCsvExtract
REPORT-extract_interconnect = PREDISPATCH-interconnectorSolnCsvExtract
REPORT-extract_regionPrices = PREDISPATCH-regionPricesCsvExtract
transforms.conf
[PREDISPATCH-getInterconnector]
REGEX = ^D,PREDISPATCH,INTERCONNECTOR_SOLN,(.*)
DEST_KEY = queue
FORMAT = indexQueue
[PREDISPATCH-getRegionSolution]
REGEX = ^D,PREDISPATCH,REGION_SOLUTION,(.*)
DEST_KEY = queue
FORMAT = indexQueue
[PREDISPATCH-getRegionPrices]
REGEX = ^D,PREDISPATCH,REGION_PRICES,(.*)
DEST_KEY = queue
FORMAT = indexQueue
[PREDISPATCH-interconnectorSolnCsvExtract]
REGEX = ^D,PREDISPATCH,INTERCONNECTOR_SOLN,([^,]*),([^,]*),([^,]*),([^,]*),([^,]*),([^,]*),([^,]*),([^,]*),([^,]*),([^,]*),([^,]*),([^,]*),([^,]*),([^,]*),([^,]*),([^,]*),([^,]*),([^,]*),([^,]*),([^,]*),([^,]*),([^,]*),([^,]*),([^,]*)
FORMAT = INTERCONNECTOR_SOLN::1 PREDISPATCHSEQNO::$2 RUNNO::$3 INTERCONNECTORID::$4 PERIODID::$5 INTERVENTION::$6 METEREDMWFLOW::$7 MWFLOW::$8 MWLOSSES::$9 MARGINALVALUE::$10 VIOLATIONDEGREE::$11 LASTCHANGED::$12 DATETIME::$13 EXPORTLIMIT::$14 IMPORTLIMIT::$15 MARGINALLOSS::$16 EXPORTGENCONID::$17 IMPORTGENCONID::$18 FCASEXPORTLIMIT::$19 FCASIMPORTLIMIT::$20 LOCAL_PRICE_ADJUSTMENT_EXPORT::$21 LOCALLY_CONSTRAINED_EXPORT::$22 LOCAL_PRICE_ADJUSTMENT_IMPORT::$23 LOCALLY_CONSTRAINED_IMPORT::$24
[PREDISPATCH-regionSolutionCsvExtract]
REGEX = ^D,PREDISPATCH,REGION_SOLUTION,([^,]*),([^,]*),([^,]*),([^,]*),([^,]*),([^,]*),([^,]*),([^,]*),([^,]*),([^,]*),([^,]*),([^,]*),([^,]*),([^,]*),([^,]*),([^,]*),([^,]*),([^,]*),([^,]*),([^,]*),([^,]*),([^,]*),([^,]*),([^,]*),([^,]*),([^,]*),([^,]*),([^,]*),([^,]*),([^,]*),([^,]*),([^,]*),([^,]*),([^,]*),([^,]*),([^,]*),([^,]*),([^,]*),([^,]*),([^,]*),([^,]*),([^,]*),([^,]*),([^,]*),([^,]*),([^,]*),([^,]*),([^,]*),([^,]*),([^,]*),([^,]*),([^,]*),([^,]*),([^,]*),([^,]*),([^,]*),([^,]*),([^,]*),([^,]*),([^,]*),([^,]*),([^,]*),([^,]*),([^,]*),([^,]*),([^,]*),([^,]*),([^,]*),([^,]*),([^,]*),([^,]*),([^,]*),([^,]*),([^,]*),([^,]*),([^,]*),([^,]*),([^,]*),([^,]*),([^,]*),([^,]*),([^,]*),([^,]*),([^,]*),([^,]*),([^,]*),([^,]*),([^,]*),([^,]*),([^,]*),([^,]*),([^,]*),([^,]*),([^,]*),([^,]*),([^,]*),([^,]*),([^,]*),([^,]*),([^,]*),([^,]*),([^,]*),([^,]*),([^,]*),([^,]*),([^,]*)
FORMAT = REGION_SOLUTION::1 PREDISPATCHSEQNO::$2 RUNNO::$3 REGIONID::$4 PERIODID::$5 INTERVENTION::$6 TOTALDEMAND::$7 AVAILABLEGENERATION::$8 AVAILABLELOAD::$9 DEMANDFORECAST::$10 DISPATCHABLEGENERATION::$11 DISPATCHABLELOAD::$12 NETINTERCHANGE::$13 EXCESSGENERATION::$14 LOWER5MINDISPATCH::$15 LOWER5MINIMPORT::$16 LOWER5MINLOCALDISPATCH::$17 LOWER5MINLOCALPRICE::$18 LOWER5MINLOCALREQ::$19 LOWER5MINPRICE::$20 LOWER5MINREQ::$21 LOWER5MINSUPPLYPRICE::$22 LOWER60SECDISPATCH::$23 LOWER60SECIMPORT::$24 LOWER60SECLOCALDISPATCH::$25 LOWER60SECLOCALPRICE::$26 LOWER60SECLOCALREQ::$27 LOWER60SECPRICE::$28 LOWER60SECREQ::$29 LOWER60SECSUPPLYPRICE::$30 LOWER6SECDISPATCH::$31 LOWER6SECIMPORT::$32 LOWER6SECLOCALDISPATCH::$33 LOWER6SECLOCALPRICE::$34 LOWER6SECLOCALREQ::$35 LOWER6SECPRICE::$36 LOWER6SECREQ::$37 LOWER6SECSUPPLYPRICE::$38 RAISE5MINDISPATCH::$39 RAISE5MINIMPORT::$40 RAISE5MINLOCALDISPATCH::$41 RAISE5MINLOCALPRICE::$42 RAISE5MINLOCALREQ::$43 RAISE5MINPRICE::$44 RAISE5MINREQ::$45 RAISE5MINSUPPLYPRICE::$46 RAISE60SECDISPATCH::$47 RAISE60SECIMPORT::$48 RAISE60SECLOCALDISPATCH::$49 RAISE60SECLOCALPRICE::$50 RAISE60SECLOCALREQ::$51 RAISE60SECPRICE::$52 RAISE60SECREQ::$53 RAISE60SECSUPPLYPRICE::$54 RAISE6SECDISPATCH::$55 RAISE6SECIMPORT::$56 RAISE6SECLOCALDISPATCH::$57 RAISE6SECLOCALPRICE::$58 RAISE6SECLOCALREQ::$59 RAISE6SECPRICE::$60 RAISE6SECREQ::$61 RAISE6SECSUPPLYPRICE::$62 LASTCHANGED::$63 DATETIME::$64 INITIALSUPPLY::$65 CLEAREDSUPPLY::$66 LOWERREGIMPORT::$67 LOWERREGLOCALDISPATCH::$68 LOWERREGLOCALREQ::$69 LOWERREGREQ::$70 RAISEREGIMPORT::$71 RAISEREGLOCALDISPATCH::$72 RAISEREGLOCALREQ::$73 RAISEREGREQ::$74 RAISE5MINLOCALVIOLATION::$75 RAISEREGLOCALVIOLATION::$76 RAISE60SECLOCALVIOLATION::$77 RAISE6SECLOCALVIOLATION::$78 LOWER5MINLOCALVIOLATION::$79 LOWERREGLOCALVIOLATION::$80 LOWER60SECLOCALVIOLATION::$81 LOWER6SECLOCALVIOLATION::$82 RAISE5MINVIOLATION::$83 RAISEREGVIOLATION::$84 RAISE60SECVIOLATION::$85 RAISE6SECVIOLATION::$86 LOWER5MINVIOLATION::$87 LOWERREGVIOLATION::$88 LOWER60SECVIOLATION::$89 LOWER6SECVIOLATION::$90 RAISE6SECACTUALAVAILABILITY::$91 RAISE60SECACTUALAVAILABILITY::$92 RAISE5MINACTUALAVAILABILITY::$93 RAISEREGACTUALAVAILABILITY::$94 LOWER6SECACTUALAVAILABILITY::$95 LOWER60SECACTUALAVAILABILITY::$96 LOWER5MINACTUALAVAILABILITY::$97 LOWERREGACTUALAVAILABILITY::$98 DECAVAILABILITY::$99 LORSURPLUS::$100 LRCSURPLUS::$101 TOTALINTERMITTENTGENERATION::$102 DEMAND_AND_NONSCHEDGEN::$103 UIGF::$104 SEMISCHEDULE_CLEAREDMW::$105 SEMISCHEDULE_COMPLIANCEMW::$106
[PREDISPATCH-regionPricesCsvExtract]
REGEX = ^D,PREDISPATCH,REGION_PRICES,([^,]*),([^,]*),([^,]*),([^,]*),([^,]*),([^,]*),([^,]*),([^,]*),([^,]*),([^,]*),([^,]*),([^,]*),([^,]*),([^,]*),([^,]*),([^,]*),([^,]*),([^,]*),([^,]*),([^,]*),([^,]*),([^,]*),([^,]*),([^,]*),([^,]*),([^,]*),([^,]*),([^,]*),([^,]*),([^,]*),([^,]*),([^,]*),([^,]*),([^,]*)
FORMAT = REGION_PRICES::1 PREDISPATCHSEQNO::$2 RUNNO::$3 REGIONID::$4 PERIODID::$5 INTERVENTION::$6 RRP::$7 EEP::$8 RRP1::$9 EEP1::$10 RRP2::$11 EEP2::$12 RRP3::$13 EEP3::$14 RRP4::$15 EEP4::$16 RRP5::$17 EEP5::$18 RRP6::$19 EEP6::$20 RRP7::$21 EEP7::$22 RRP8::$23 EEP8::$24 LASTCHANGED::$25 DATETIME::$26 RAISE6SECRRP::$27 RAISE60SECRRP::$28 RAISE5MINRRP::$29 RAISEREGRRP::$30 LOWER6SECRRP::$31 LOWER60SECRRP::$32 LOWER5MINRRP::$33 LOWERREGRRP::$34
I've even tried this to just get them into the same sourcetype but with different extractions. In the hope that I can isolate them with a search on a field unique to each different data set.
This works and only pulls in the rows I am interested in however the fields dont extract. 😞
So close.....
inputs.conf
[monitor:///opt/web-data/PreDispatchIS_Reports]
disabled = false
followTail = 0
host = aemo-web
sourcetype = predispatchis
whitelist = PUBLIC_PREDISPATCHIS_[0-9_]*.CSV
crcSalt = <SOURCE>
props.conf
[predispatchis]
KV_MODE = none
SHOULD_LINEMERGE = false
TIME_PREFIX=\d{4}\/\d{2}\/\d{2} \d{2}:\d{2}:\d{2}
TRANSFORMS-a-filterprices = setnull,getInterconnector,getRegionSolution
#TRANSFORMS-z-sourcetype-rewrite=setInterconnectorSource,setRegionSolutionSource
REPORT-extract_region = predispatchisCsvExtract
REPORT-extract_interconnect = interconnectorSolnCsvExtract
transforms.conf
[setnull]
REGEX = .
DEST_KEY = queue
FORMAT = nullQueue
[getInterconnector]
REGEX = ^D,PREDISPATCH,INTERCONNECTOR_SOLN,(.*)
DEST_KEY = queue
FORMAT = indexQueue
[getRegionSolution]
REGEX = ^D,PREDISPATCH,REGION_SOLUTION,(.*)
DEST_KEY = queue
FORMAT = indexQueue
[interconnectorSolnCsvExtract]
REGEX = ^D,PREDISPATCH,INTERCONNECTOR_SOLN,([^,]*),([^,]*),([^,]*),([^,]*),([^,]*),([^,]*),([^,]*)
FORMAT = PREDISPATCHSEQNO::$2 RUNNO::$3 INTERCONNECTORID::$4 PERIODID::$5 INTERVENTION::$6 METEREDMWFLOW::$7
[predispatchisCsvExtract]
REGEX = ^D,PREDISPATCH,REGION_SOLUTION,([^,]*),([^,]*),([^,]*),([^,]*),([^,]*),([^,]*),([^,]*)
FORMAT = PREDISPATCHSEQNO::$2 RUNNO::$3 REGIONID::$4 PERIODID::$5 INTERVENTION::$6 TOTALDEMAND::$7
OK I have encountered this issue again and this time I need to resolve it.
The data is publicly available so I suppose there is no need to keep it private here is an example contents of the CSV file.
or grab a full file to unzip here http://www.nemweb.com.au/REPORTS/CURRENT/PreDispatchIS_Reports/
The first sourcetype I want to call "predispatchis" and matches these formats
D,PREDISPATCH,REGION_SOLUTION,........
I want _time to be the second date in each row
The second sourcetype I want to call "interconnector_soln" and matches these formats
D,PREDISPATCH,INTERCONNECTOR_SOLN,
I want _time to be the second date in each row
So my thinking was have the inputs.conf pulling into a "temp" sourcetype then reassign the sourcetype based on which row matches from the transforms. Then this new sourcetype will (hopefully) find the timestamp and finally a few search time field extractions to get out the fields based on the known format.
Sadly it aint working at the moment and nothing is coming in.
Any assistance would be greatly appreciated.
inputs.conf
[monitor:///opt/web-data/PreDispatchIS_Reports]
disabled = false
followTail = 0
host = aemo-web
sourcetype = predispatchis_init_sourcetype
index=test
whitelist = PUBLIC_PREDISPATCHIS_[0-9_]*.CSV
crcSalt = <SOURCE>
props.conf
[predispatchis_init_sourcetype]
KV_MODE = none
SHOULD_LINEMERGE = false
TRANSFORMS-filterprices = setnull,getInterconnector,getRegionSolution
[predispatchis]
KV_MODE = none
SHOULD_LINEMERGE = false
TIME_PREFIX = ^D,PREDISPATCH,REGION_SOLUTION,[^,]*,[^,]*,[^,]*,[^,]*,[^,]*,[^,]*,[^,]*,[^,]*,[^,]*,[^,]*,[^,]*,[^,]*,[^,]*,[^,]*,[^,]*,[^,]*,[^,]*,[^,]*,[^,]*,[^,]*,[^,]*,[^,]*,[^,]*,[^,]*,[^,]*,[^,]*,[^,]*,[^,]*,[^,]*,[^,]*,[^,]*,[^,]*,[^,]*,[^,]*,[^,]*,[^,]*,[^,]*,[^,]*,[^,]*,[^,]*,[^,]*,[^,]*,[^,]*,[^,]*,[^,]*,[^,]*,[^,]*,[^,]*,[^,]*,[^,]*,[^,]*,[^,]*,[^,]*,[^,]*,[^,]*,[^,]*,[^,]*,[^,]*,[^,]*,[^,]*,[^,]*,[^,]*,[^,]*,
REPORT-extracts = predispatchisCsvExtract
[interconnector_soln]
KV_MODE = none
SHOULD_LINEMERGE = false
TIME_PREFIX = ^D,PREDISPATCH,INTERCONNECTOR_SOLN,[^,]*,[^,]*,[^,]*,[^,]*,[^,]*,[^,]*,[^,]*,[^,]*,[^,]*,[^,]*,[^,]*,[^,]*
REPORT-extracts = interconnectorSolnCsvExtract
transforms.conf
[getInterconnector]
REGEX = ^D,PREDISPATCH,INTERCONNECTOR_SOLN,(.*)
DEST_KEY = MetaData:Sourcetype
FORMAT = sourcetype::interconnector_soln
[getRegionSolution]
REGEX = ^D,PREDISPATCH,REGION_SOLUTION,(.*)
DEST_KEY = MetaData:Sourcetype
FORMAT = sourcetype::predispatchis
[interconnectorSolnCsvExtract]
DELIMS = ","
FIELDS = "I","PREDISPATCH,INTERCONNECTOR_SOLN","3","PREDISPATCHSEQNO","RUNNO","INTERCONNECTORID","PERIODID","INTERVENTION","METEREDMWFLOW","MWFLOW","MWLOSSES","MARGINALVALUE","VIOLATIONDEGREE","LASTCHANGED","DATETIME","EXPORTLIMIT","IMPORTLIMIT","MARGINALLOSS","EXPORTGENCONID","IMPORTGENCONID","FCASEXPORTLIMIT","FCASIMPORTLIMIT","LOCAL_PRICE_ADJUSTMENT_EXPORT","LOCALLY_CONSTRAINED_EXPORT","LOCAL_PRICE_ADJUSTMENT_IMPORT","LOCALLY_CONSTRAINED_IMPORT"
[predispatchisCsvExtract]
DELIMS = ","
FIELDS = "I","PREDISPATCH","REGION_SOLUTION","4","PREDISPATCHSEQNO","RUNNO","REGIONID","PERIODID","INTERVENTION","TOTALDEMAND","AVAILABLEGENERATION","AVAILABLELOAD","DEMANDFORECAST","DISPATCHABLEGENERATION","DISPATCHABLELOAD","NETINTERCHANGE","EXCESSGENERATION","LOWER5MINDISPATCH","LOWER5MINIMPORT","LOWER5MINLOCALDISPATCH","LOWER5MINLOCALPRICE","LOWER5MINLOCALREQ","LOWER5MINPRICE","LOWER5MINREQ","LOWER5MINSUPPLYPRICE","LOWER60SECDISPATCH","LOWER60SECIMPORT","LOWER60SECLOCALDISPATCH","LOWER60SECLOCALPRICE","LOWER60SECLOCALREQ","LOWER60SECPRICE","LOWER60SECREQ","LOWER60SECSUPPLYPRICE","LOWER6SECDISPATCH","LOWER6SECIMPORT","LOWER6SECLOCALDISPATCH","LOWER6SECLOCALPRICE","LOWER6SECLOCALREQ","LOWER6SECPRICE","LOWER6SECREQ","LOWER6SECSUPPLYPRICE","RAISE5MINDISPATCH","RAISE5MINIMPORT","RAISE5MINLOCALDISPATCH","RAISE5MINLOCALPRICE","RAISE5MINLOCALREQ","RAISE5MINPRICE","RAISE5MINREQ","RAISE5MINSUPPLYPRICE","RAISE60SECDISPATCH","RAISE60SECIMPORT","RAISE60SECLOCALDISPATCH","RAISE60SECLOCALPRICE","RAISE60SECLOCALREQ","RAISE60SECPRICE","RAISE60SECREQ","RAISE60SECSUPPLYPRICE","RAISE6SECDISPATCH","RAISE6SECIMPORT","RAISE6SECLOCALDISPATCH","RAISE6SECLOCALPRICE","RAISE6SECLOCALREQ","RAISE6SECPRICE","RAISE6SECREQ","RAISE6SECSUPPLYPRICE","LASTCHANGED","DATETIME","INITIALSUPPLY","CLEAREDSUPPLY","LOWERREGIMPORT","LOWERREGLOCALDISPATCH","LOWERREGLOCALREQ","LOWERREGREQ","RAISEREGIMPORT","RAISEREGLOCALDISPATCH","RAISEREGLOCALREQ","RAISEREGREQ","RAISE5MINLOCALVIOLATION","RAISEREGLOCALVIOLATION","RAISE60SECLOCALVIOLATION","RAISE6SECLOCALVIOLATION","LOWER5MINLOCALVIOLATION","LOWERREGLOCALVIOLATION","LOWER60SECLOCALVIOLATION","LOWER6SECLOCALVIOLATION","RAISE5MINVIOLATION","RAISEREGVIOLATION","RAISE60SECVIOLATION","RAISE6SECVIOLATION","LOWER5MINVIOLATION","LOWERREGVIOLATION","LOWER60SECVIOLATION","LOWER6SECVIOLATION","RAISE6SECACTUALAVAILABILITY","RAISE60SECACTUALAVAILABILITY","RAISE5MINACTUALAVAILABILITY","RAISEREGACTUALAVAILABILITY","LOWER6SECACTUALAVAILABILITY","LOWER60SECACTUALAVAILABILITY","LOWER5MINACTUALAVAILABILITY","LOWERREGACTUALAVAILABILITY","DECAVAILABILITY","LORSURPLUS","LRCSURPLUS","TOTALINTERMITTENTGENERATION","DEMAND_AND_NONSCHEDGEN","UIGF","SEMISCHEDULE_CLEAREDMW","SEMISCHEDULE_COMPLIANCEMW"
Splunk should have no problem indexing all the data in the file; however the problem you will face is with sourcetyping and field extraction. There is nothing unique about each event line in this file other than it's corresponding index row. You won't be able to use column counts because clearly you can have multiple data sets with the same number of columns.
I would recommend pre-processing this file to split up each of the data sets so that they can be individually sourcetyped and field extractions can then happen on a per sourcetype basis. This can be done with a little bith of python which unzip's your data sample, chop's it into multiple sets, and writes unique files to Splunk's spool directory.
Hi hazekamp,
This post is quite old but i have exactly the same requirement, would you how now a better solution, or even some python portion code to share ? Thanks anyway 🙂
I am sitting with exactly the same problem.
For now one solution would be to split the data sets manually but that is a pain. I am not sure if you could do multiple extracts based on dataset headings, if so how would we be able to tell the extract where the dataset starts and stops so that you don't get data from other data sets?
any luck so far on your side?
Unfortunately none so far. We will be revisiting it again soon however. I will post up here if we find a solution.