Getting Data In

Indexing a CSV data file with more than one set of data

Hi All,

Just curious about the best method to index a CSV file with multiple sets of data inside?

The basic format of the whole file is

I,DataSet1_FieldName1,DataSet1_FieldName2,DataSet1_FieldName3
D,this,54,fred
D,this,87,barry
I,DataSet2_FieldName1,DataSet2_FieldName2,DataSet2_FieldName3
D,784,moreInfo,thatData
D,5443,moreInfo2,thisData
D,524,moreInfo2,theOtherData
I,DataSet3_FieldName1,DataSet3_FieldName2
D,Wow,SoMuchData
D,Really,MoreData

and on and on it goes with about 5 sets of data.

As you can see the first field of each row determines if the row is a header/index row or a data row

I = Index/Header row
D = Data row

We have successfully been able to retrieve only ONE set of data by using the following


inputs.conf


[monitor:///tmp/csvProvider]<br />
disabled = false<br />
followTail = 0<br />
host = csvProvider<br />
sourcetype = public_data



props.conf


[public_data]<br />
KV_MODE = none<br />
SHOULD_LINEMERGE = false<br />
TRANSFORMS-filterprices = setnull,getprices<br />
REPORT-extracts = csv_extract<br />



transforms.conf


[setnull]<br />
REGEX = .<br />
DEST_KEY = queue<br />
FORMAT = nullQueue<br />
<br />
[getprices]<br />
REGEX = ^D,DISPATCH,PRICE,(.*)<br />
DEST_KEY = queue<br />
FORMAT = indexQueue<br />
<br />
[csv_extract]<br />
DELIMS = ","<br />
FIELDS = "I","DISPATCH","PRICE","THREE","SETTLEMENTDATE","RUNNO","REGIONID","DISPATCHINTERVAL","INTERVENTION","RRP"<br />


Does anyone know the best method to get all sets of data out?

Secondly the files we retrieve are actually .zip files. Currently we are extracting them before handing them off to splunk. Is there a way to get splunk to extract and process these files in the same manner as above? We previously tried to get splunk to process the zip files but it didn't seem to handle them very well at all.

Thanks

Tags (2)
0 Karma
1 Solution

I cracked it... not ideal but it works.

I am sure there is probably a better way to "repeat" that regular expression pattern but it works!!!!!

As you can see the first entry of FORMAT for each REPORT sets the "type" of data to allow you to search for a specific set.

INTERCONNECTOR_SOLN::1

allows for

sourcetype=predispatchis INTERCONNECTOR_SOLN=1

Now while I would much prefer to have different sourcetypes for each of these this is the next best solution.

props.conf

[predispatchis]
KV_MODE = none
SHOULD_LINEMERGE = false
TIME_PREFIX=\d{4}\/\d{2}\/\d{2} \d{2}:\d{2}:\d{2}
TRANSFORMS-a-filterprices = setnull,PREDISPATCH-getInterconnector,PREDISPATCH-getRegionSolution,PREDISPATCH-getRegionPrices
REPORT-extract_regionSolution = PREDISPATCH-regionSolutionCsvExtract
REPORT-extract_interconnect = PREDISPATCH-interconnectorSolnCsvExtract
REPORT-extract_regionPrices = PREDISPATCH-regionPricesCsvExtract

transforms.conf

[PREDISPATCH-getInterconnector]
REGEX = ^D,PREDISPATCH,INTERCONNECTOR_SOLN,(.*)
DEST_KEY = queue
FORMAT = indexQueue

[PREDISPATCH-getRegionSolution]
REGEX = ^D,PREDISPATCH,REGION_SOLUTION,(.*)
DEST_KEY = queue
FORMAT = indexQueue

[PREDISPATCH-getRegionPrices]
REGEX = ^D,PREDISPATCH,REGION_PRICES,(.*)
DEST_KEY = queue
FORMAT = indexQueue

[PREDISPATCH-interconnectorSolnCsvExtract]
REGEX = ^D,PREDISPATCH,INTERCONNECTOR_SOLN,([^,]*),([^,]*),([^,]*),([^,]*),([^,]*),([^,]*),([^,]*),([^,]*),([^,]*),([^,]*),([^,]*),([^,]*),([^,]*),([^,]*),([^,]*),([^,]*),([^,]*),([^,]*),([^,]*),([^,]*),([^,]*),([^,]*),([^,]*),([^,]*)
FORMAT = INTERCONNECTOR_SOLN::1 PREDISPATCHSEQNO::$2 RUNNO::$3 INTERCONNECTORID::$4 PERIODID::$5 INTERVENTION::$6 METEREDMWFLOW::$7 MWFLOW::$8 MWLOSSES::$9 MARGINALVALUE::$10 VIOLATIONDEGREE::$11 LASTCHANGED::$12 DATETIME::$13 EXPORTLIMIT::$14 IMPORTLIMIT::$15 MARGINALLOSS::$16 EXPORTGENCONID::$17 IMPORTGENCONID::$18 FCASEXPORTLIMIT::$19 FCASIMPORTLIMIT::$20 LOCAL_PRICE_ADJUSTMENT_EXPORT::$21 LOCALLY_CONSTRAINED_EXPORT::$22 LOCAL_PRICE_ADJUSTMENT_IMPORT::$23 LOCALLY_CONSTRAINED_IMPORT::$24


[PREDISPATCH-regionSolutionCsvExtract]
REGEX = ^D,PREDISPATCH,REGION_SOLUTION,([^,]*),([^,]*),([^,]*),([^,]*),([^,]*),([^,]*),([^,]*),([^,]*),([^,]*),([^,]*),([^,]*),([^,]*),([^,]*),([^,]*),([^,]*),([^,]*),([^,]*),([^,]*),([^,]*),([^,]*),([^,]*),([^,]*),([^,]*),([^,]*),([^,]*),([^,]*),([^,]*),([^,]*),([^,]*),([^,]*),([^,]*),([^,]*),([^,]*),([^,]*),([^,]*),([^,]*),([^,]*),([^,]*),([^,]*),([^,]*),([^,]*),([^,]*),([^,]*),([^,]*),([^,]*),([^,]*),([^,]*),([^,]*),([^,]*),([^,]*),([^,]*),([^,]*),([^,]*),([^,]*),([^,]*),([^,]*),([^,]*),([^,]*),([^,]*),([^,]*),([^,]*),([^,]*),([^,]*),([^,]*),([^,]*),([^,]*),([^,]*),([^,]*),([^,]*),([^,]*),([^,]*),([^,]*),([^,]*),([^,]*),([^,]*),([^,]*),([^,]*),([^,]*),([^,]*),([^,]*),([^,]*),([^,]*),([^,]*),([^,]*),([^,]*),([^,]*),([^,]*),([^,]*),([^,]*),([^,]*),([^,]*),([^,]*),([^,]*),([^,]*),([^,]*),([^,]*),([^,]*),([^,]*),([^,]*),([^,]*),([^,]*),([^,]*),([^,]*),([^,]*),([^,]*),([^,]*)
FORMAT = REGION_SOLUTION::1 PREDISPATCHSEQNO::$2 RUNNO::$3 REGIONID::$4 PERIODID::$5 INTERVENTION::$6 TOTALDEMAND::$7 AVAILABLEGENERATION::$8 AVAILABLELOAD::$9 DEMANDFORECAST::$10 DISPATCHABLEGENERATION::$11 DISPATCHABLELOAD::$12 NETINTERCHANGE::$13 EXCESSGENERATION::$14 LOWER5MINDISPATCH::$15 LOWER5MINIMPORT::$16 LOWER5MINLOCALDISPATCH::$17 LOWER5MINLOCALPRICE::$18 LOWER5MINLOCALREQ::$19 LOWER5MINPRICE::$20 LOWER5MINREQ::$21 LOWER5MINSUPPLYPRICE::$22 LOWER60SECDISPATCH::$23 LOWER60SECIMPORT::$24 LOWER60SECLOCALDISPATCH::$25 LOWER60SECLOCALPRICE::$26 LOWER60SECLOCALREQ::$27 LOWER60SECPRICE::$28 LOWER60SECREQ::$29 LOWER60SECSUPPLYPRICE::$30 LOWER6SECDISPATCH::$31 LOWER6SECIMPORT::$32 LOWER6SECLOCALDISPATCH::$33 LOWER6SECLOCALPRICE::$34 LOWER6SECLOCALREQ::$35 LOWER6SECPRICE::$36 LOWER6SECREQ::$37 LOWER6SECSUPPLYPRICE::$38 RAISE5MINDISPATCH::$39 RAISE5MINIMPORT::$40 RAISE5MINLOCALDISPATCH::$41 RAISE5MINLOCALPRICE::$42 RAISE5MINLOCALREQ::$43 RAISE5MINPRICE::$44 RAISE5MINREQ::$45 RAISE5MINSUPPLYPRICE::$46 RAISE60SECDISPATCH::$47 RAISE60SECIMPORT::$48 RAISE60SECLOCALDISPATCH::$49 RAISE60SECLOCALPRICE::$50 RAISE60SECLOCALREQ::$51 RAISE60SECPRICE::$52 RAISE60SECREQ::$53 RAISE60SECSUPPLYPRICE::$54 RAISE6SECDISPATCH::$55 RAISE6SECIMPORT::$56 RAISE6SECLOCALDISPATCH::$57 RAISE6SECLOCALPRICE::$58 RAISE6SECLOCALREQ::$59 RAISE6SECPRICE::$60 RAISE6SECREQ::$61 RAISE6SECSUPPLYPRICE::$62 LASTCHANGED::$63 DATETIME::$64 INITIALSUPPLY::$65 CLEAREDSUPPLY::$66 LOWERREGIMPORT::$67 LOWERREGLOCALDISPATCH::$68 LOWERREGLOCALREQ::$69 LOWERREGREQ::$70 RAISEREGIMPORT::$71 RAISEREGLOCALDISPATCH::$72 RAISEREGLOCALREQ::$73 RAISEREGREQ::$74 RAISE5MINLOCALVIOLATION::$75 RAISEREGLOCALVIOLATION::$76 RAISE60SECLOCALVIOLATION::$77 RAISE6SECLOCALVIOLATION::$78 LOWER5MINLOCALVIOLATION::$79 LOWERREGLOCALVIOLATION::$80 LOWER60SECLOCALVIOLATION::$81 LOWER6SECLOCALVIOLATION::$82 RAISE5MINVIOLATION::$83 RAISEREGVIOLATION::$84 RAISE60SECVIOLATION::$85 RAISE6SECVIOLATION::$86 LOWER5MINVIOLATION::$87 LOWERREGVIOLATION::$88 LOWER60SECVIOLATION::$89 LOWER6SECVIOLATION::$90 RAISE6SECACTUALAVAILABILITY::$91 RAISE60SECACTUALAVAILABILITY::$92 RAISE5MINACTUALAVAILABILITY::$93 RAISEREGACTUALAVAILABILITY::$94 LOWER6SECACTUALAVAILABILITY::$95 LOWER60SECACTUALAVAILABILITY::$96 LOWER5MINACTUALAVAILABILITY::$97 LOWERREGACTUALAVAILABILITY::$98 DECAVAILABILITY::$99 LORSURPLUS::$100 LRCSURPLUS::$101 TOTALINTERMITTENTGENERATION::$102 DEMAND_AND_NONSCHEDGEN::$103 UIGF::$104 SEMISCHEDULE_CLEAREDMW::$105 SEMISCHEDULE_COMPLIANCEMW::$106

[PREDISPATCH-regionPricesCsvExtract]
REGEX = ^D,PREDISPATCH,REGION_PRICES,([^,]*),([^,]*),([^,]*),([^,]*),([^,]*),([^,]*),([^,]*),([^,]*),([^,]*),([^,]*),([^,]*),([^,]*),([^,]*),([^,]*),([^,]*),([^,]*),([^,]*),([^,]*),([^,]*),([^,]*),([^,]*),([^,]*),([^,]*),([^,]*),([^,]*),([^,]*),([^,]*),([^,]*),([^,]*),([^,]*),([^,]*),([^,]*),([^,]*),([^,]*)
FORMAT = REGION_PRICES::1 PREDISPATCHSEQNO::$2 RUNNO::$3 REGIONID::$4 PERIODID::$5 INTERVENTION::$6 RRP::$7 EEP::$8 RRP1::$9 EEP1::$10 RRP2::$11 EEP2::$12 RRP3::$13 EEP3::$14 RRP4::$15 EEP4::$16 RRP5::$17 EEP5::$18 RRP6::$19 EEP6::$20 RRP7::$21 EEP7::$22 RRP8::$23 EEP8::$24 LASTCHANGED::$25 DATETIME::$26 RAISE6SECRRP::$27 RAISE60SECRRP::$28 RAISE5MINRRP::$29 RAISEREGRRP::$30 LOWER6SECRRP::$31 LOWER60SECRRP::$32 LOWER5MINRRP::$33 LOWERREGRRP::$34

View solution in original post

0 Karma

I cracked it... not ideal but it works.

I am sure there is probably a better way to "repeat" that regular expression pattern but it works!!!!!

As you can see the first entry of FORMAT for each REPORT sets the "type" of data to allow you to search for a specific set.

INTERCONNECTOR_SOLN::1

allows for

sourcetype=predispatchis INTERCONNECTOR_SOLN=1

Now while I would much prefer to have different sourcetypes for each of these this is the next best solution.

props.conf

[predispatchis]
KV_MODE = none
SHOULD_LINEMERGE = false
TIME_PREFIX=\d{4}\/\d{2}\/\d{2} \d{2}:\d{2}:\d{2}
TRANSFORMS-a-filterprices = setnull,PREDISPATCH-getInterconnector,PREDISPATCH-getRegionSolution,PREDISPATCH-getRegionPrices
REPORT-extract_regionSolution = PREDISPATCH-regionSolutionCsvExtract
REPORT-extract_interconnect = PREDISPATCH-interconnectorSolnCsvExtract
REPORT-extract_regionPrices = PREDISPATCH-regionPricesCsvExtract

transforms.conf

[PREDISPATCH-getInterconnector]
REGEX = ^D,PREDISPATCH,INTERCONNECTOR_SOLN,(.*)
DEST_KEY = queue
FORMAT = indexQueue

[PREDISPATCH-getRegionSolution]
REGEX = ^D,PREDISPATCH,REGION_SOLUTION,(.*)
DEST_KEY = queue
FORMAT = indexQueue

[PREDISPATCH-getRegionPrices]
REGEX = ^D,PREDISPATCH,REGION_PRICES,(.*)
DEST_KEY = queue
FORMAT = indexQueue

[PREDISPATCH-interconnectorSolnCsvExtract]
REGEX = ^D,PREDISPATCH,INTERCONNECTOR_SOLN,([^,]*),([^,]*),([^,]*),([^,]*),([^,]*),([^,]*),([^,]*),([^,]*),([^,]*),([^,]*),([^,]*),([^,]*),([^,]*),([^,]*),([^,]*),([^,]*),([^,]*),([^,]*),([^,]*),([^,]*),([^,]*),([^,]*),([^,]*),([^,]*)
FORMAT = INTERCONNECTOR_SOLN::1 PREDISPATCHSEQNO::$2 RUNNO::$3 INTERCONNECTORID::$4 PERIODID::$5 INTERVENTION::$6 METEREDMWFLOW::$7 MWFLOW::$8 MWLOSSES::$9 MARGINALVALUE::$10 VIOLATIONDEGREE::$11 LASTCHANGED::$12 DATETIME::$13 EXPORTLIMIT::$14 IMPORTLIMIT::$15 MARGINALLOSS::$16 EXPORTGENCONID::$17 IMPORTGENCONID::$18 FCASEXPORTLIMIT::$19 FCASIMPORTLIMIT::$20 LOCAL_PRICE_ADJUSTMENT_EXPORT::$21 LOCALLY_CONSTRAINED_EXPORT::$22 LOCAL_PRICE_ADJUSTMENT_IMPORT::$23 LOCALLY_CONSTRAINED_IMPORT::$24


[PREDISPATCH-regionSolutionCsvExtract]
REGEX = ^D,PREDISPATCH,REGION_SOLUTION,([^,]*),([^,]*),([^,]*),([^,]*),([^,]*),([^,]*),([^,]*),([^,]*),([^,]*),([^,]*),([^,]*),([^,]*),([^,]*),([^,]*),([^,]*),([^,]*),([^,]*),([^,]*),([^,]*),([^,]*),([^,]*),([^,]*),([^,]*),([^,]*),([^,]*),([^,]*),([^,]*),([^,]*),([^,]*),([^,]*),([^,]*),([^,]*),([^,]*),([^,]*),([^,]*),([^,]*),([^,]*),([^,]*),([^,]*),([^,]*),([^,]*),([^,]*),([^,]*),([^,]*),([^,]*),([^,]*),([^,]*),([^,]*),([^,]*),([^,]*),([^,]*),([^,]*),([^,]*),([^,]*),([^,]*),([^,]*),([^,]*),([^,]*),([^,]*),([^,]*),([^,]*),([^,]*),([^,]*),([^,]*),([^,]*),([^,]*),([^,]*),([^,]*),([^,]*),([^,]*),([^,]*),([^,]*),([^,]*),([^,]*),([^,]*),([^,]*),([^,]*),([^,]*),([^,]*),([^,]*),([^,]*),([^,]*),([^,]*),([^,]*),([^,]*),([^,]*),([^,]*),([^,]*),([^,]*),([^,]*),([^,]*),([^,]*),([^,]*),([^,]*),([^,]*),([^,]*),([^,]*),([^,]*),([^,]*),([^,]*),([^,]*),([^,]*),([^,]*),([^,]*),([^,]*),([^,]*)
FORMAT = REGION_SOLUTION::1 PREDISPATCHSEQNO::$2 RUNNO::$3 REGIONID::$4 PERIODID::$5 INTERVENTION::$6 TOTALDEMAND::$7 AVAILABLEGENERATION::$8 AVAILABLELOAD::$9 DEMANDFORECAST::$10 DISPATCHABLEGENERATION::$11 DISPATCHABLELOAD::$12 NETINTERCHANGE::$13 EXCESSGENERATION::$14 LOWER5MINDISPATCH::$15 LOWER5MINIMPORT::$16 LOWER5MINLOCALDISPATCH::$17 LOWER5MINLOCALPRICE::$18 LOWER5MINLOCALREQ::$19 LOWER5MINPRICE::$20 LOWER5MINREQ::$21 LOWER5MINSUPPLYPRICE::$22 LOWER60SECDISPATCH::$23 LOWER60SECIMPORT::$24 LOWER60SECLOCALDISPATCH::$25 LOWER60SECLOCALPRICE::$26 LOWER60SECLOCALREQ::$27 LOWER60SECPRICE::$28 LOWER60SECREQ::$29 LOWER60SECSUPPLYPRICE::$30 LOWER6SECDISPATCH::$31 LOWER6SECIMPORT::$32 LOWER6SECLOCALDISPATCH::$33 LOWER6SECLOCALPRICE::$34 LOWER6SECLOCALREQ::$35 LOWER6SECPRICE::$36 LOWER6SECREQ::$37 LOWER6SECSUPPLYPRICE::$38 RAISE5MINDISPATCH::$39 RAISE5MINIMPORT::$40 RAISE5MINLOCALDISPATCH::$41 RAISE5MINLOCALPRICE::$42 RAISE5MINLOCALREQ::$43 RAISE5MINPRICE::$44 RAISE5MINREQ::$45 RAISE5MINSUPPLYPRICE::$46 RAISE60SECDISPATCH::$47 RAISE60SECIMPORT::$48 RAISE60SECLOCALDISPATCH::$49 RAISE60SECLOCALPRICE::$50 RAISE60SECLOCALREQ::$51 RAISE60SECPRICE::$52 RAISE60SECREQ::$53 RAISE60SECSUPPLYPRICE::$54 RAISE6SECDISPATCH::$55 RAISE6SECIMPORT::$56 RAISE6SECLOCALDISPATCH::$57 RAISE6SECLOCALPRICE::$58 RAISE6SECLOCALREQ::$59 RAISE6SECPRICE::$60 RAISE6SECREQ::$61 RAISE6SECSUPPLYPRICE::$62 LASTCHANGED::$63 DATETIME::$64 INITIALSUPPLY::$65 CLEAREDSUPPLY::$66 LOWERREGIMPORT::$67 LOWERREGLOCALDISPATCH::$68 LOWERREGLOCALREQ::$69 LOWERREGREQ::$70 RAISEREGIMPORT::$71 RAISEREGLOCALDISPATCH::$72 RAISEREGLOCALREQ::$73 RAISEREGREQ::$74 RAISE5MINLOCALVIOLATION::$75 RAISEREGLOCALVIOLATION::$76 RAISE60SECLOCALVIOLATION::$77 RAISE6SECLOCALVIOLATION::$78 LOWER5MINLOCALVIOLATION::$79 LOWERREGLOCALVIOLATION::$80 LOWER60SECLOCALVIOLATION::$81 LOWER6SECLOCALVIOLATION::$82 RAISE5MINVIOLATION::$83 RAISEREGVIOLATION::$84 RAISE60SECVIOLATION::$85 RAISE6SECVIOLATION::$86 LOWER5MINVIOLATION::$87 LOWERREGVIOLATION::$88 LOWER60SECVIOLATION::$89 LOWER6SECVIOLATION::$90 RAISE6SECACTUALAVAILABILITY::$91 RAISE60SECACTUALAVAILABILITY::$92 RAISE5MINACTUALAVAILABILITY::$93 RAISEREGACTUALAVAILABILITY::$94 LOWER6SECACTUALAVAILABILITY::$95 LOWER60SECACTUALAVAILABILITY::$96 LOWER5MINACTUALAVAILABILITY::$97 LOWERREGACTUALAVAILABILITY::$98 DECAVAILABILITY::$99 LORSURPLUS::$100 LRCSURPLUS::$101 TOTALINTERMITTENTGENERATION::$102 DEMAND_AND_NONSCHEDGEN::$103 UIGF::$104 SEMISCHEDULE_CLEAREDMW::$105 SEMISCHEDULE_COMPLIANCEMW::$106

[PREDISPATCH-regionPricesCsvExtract]
REGEX = ^D,PREDISPATCH,REGION_PRICES,([^,]*),([^,]*),([^,]*),([^,]*),([^,]*),([^,]*),([^,]*),([^,]*),([^,]*),([^,]*),([^,]*),([^,]*),([^,]*),([^,]*),([^,]*),([^,]*),([^,]*),([^,]*),([^,]*),([^,]*),([^,]*),([^,]*),([^,]*),([^,]*),([^,]*),([^,]*),([^,]*),([^,]*),([^,]*),([^,]*),([^,]*),([^,]*),([^,]*),([^,]*)
FORMAT = REGION_PRICES::1 PREDISPATCHSEQNO::$2 RUNNO::$3 REGIONID::$4 PERIODID::$5 INTERVENTION::$6 RRP::$7 EEP::$8 RRP1::$9 EEP1::$10 RRP2::$11 EEP2::$12 RRP3::$13 EEP3::$14 RRP4::$15 EEP4::$16 RRP5::$17 EEP5::$18 RRP6::$19 EEP6::$20 RRP7::$21 EEP7::$22 RRP8::$23 EEP8::$24 LASTCHANGED::$25 DATETIME::$26 RAISE6SECRRP::$27 RAISE60SECRRP::$28 RAISE5MINRRP::$29 RAISEREGRRP::$30 LOWER6SECRRP::$31 LOWER60SECRRP::$32 LOWER5MINRRP::$33 LOWERREGRRP::$34

View solution in original post

0 Karma

I've even tried this to just get them into the same sourcetype but with different extractions. In the hope that I can isolate them with a search on a field unique to each different data set.

This works and only pulls in the rows I am interested in however the fields dont extract. 😞

So close.....

inputs.conf

[monitor:///opt/web-data/PreDispatchIS_Reports]
disabled = false
followTail = 0
host = aemo-web
sourcetype = predispatchis
whitelist = PUBLIC_PREDISPATCHIS_[0-9_]*.CSV
crcSalt = <SOURCE>

props.conf

[predispatchis]
KV_MODE = none
SHOULD_LINEMERGE = false
TIME_PREFIX=\d{4}\/\d{2}\/\d{2} \d{2}:\d{2}:\d{2}
TRANSFORMS-a-filterprices = setnull,getInterconnector,getRegionSolution
#TRANSFORMS-z-sourcetype-rewrite=setInterconnectorSource,setRegionSolutionSource
REPORT-extract_region = predispatchisCsvExtract
REPORT-extract_interconnect = interconnectorSolnCsvExtract

transforms.conf

[setnull]
REGEX = .
DEST_KEY = queue
FORMAT = nullQueue

[getInterconnector]
REGEX = ^D,PREDISPATCH,INTERCONNECTOR_SOLN,(.*)
DEST_KEY = queue
FORMAT = indexQueue

[getRegionSolution]
REGEX = ^D,PREDISPATCH,REGION_SOLUTION,(.*)
DEST_KEY = queue
FORMAT = indexQueue

[interconnectorSolnCsvExtract]
REGEX = ^D,PREDISPATCH,INTERCONNECTOR_SOLN,([^,]*),([^,]*),([^,]*),([^,]*),([^,]*),([^,]*),([^,]*)
FORMAT = PREDISPATCHSEQNO::$2 RUNNO::$3 INTERCONNECTORID::$4 PERIODID::$5 INTERVENTION::$6 METEREDMWFLOW::$7

[predispatchisCsvExtract]
REGEX = ^D,PREDISPATCH,REGION_SOLUTION,([^,]*),([^,]*),([^,]*),([^,]*),([^,]*),([^,]*),([^,]*)
FORMAT = PREDISPATCHSEQNO::$2 RUNNO::$3 REGIONID::$4 PERIODID::$5 INTERVENTION::$6 TOTALDEMAND::$7
0 Karma

OK I have encountered this issue again and this time I need to resolve it.

The data is publicly available so I suppose there is no need to keep it private here is an example contents of the CSV file.

http://pastebin.com/pFSQNdru

or grab a full file to unzip here http://www.nemweb.com.au/REPORTS/CURRENT/PreDispatchIS_Reports/

The first sourcetype I want to call "predispatchis" and matches these formats

D,PREDISPATCH,REGION_SOLUTION,........

I want _time to be the second date in each row

The second sourcetype I want to call "interconnector_soln" and matches these formats

D,PREDISPATCH,INTERCONNECTOR_SOLN,

I want _time to be the second date in each row

So my thinking was have the inputs.conf pulling into a "temp" sourcetype then reassign the sourcetype based on which row matches from the transforms. Then this new sourcetype will (hopefully) find the timestamp and finally a few search time field extractions to get out the fields based on the known format.

Sadly it aint working at the moment and nothing is coming in.

Any assistance would be greatly appreciated.

inputs.conf

[monitor:///opt/web-data/PreDispatchIS_Reports]
disabled = false
followTail = 0
host = aemo-web
sourcetype = predispatchis_init_sourcetype
index=test
whitelist = PUBLIC_PREDISPATCHIS_[0-9_]*.CSV
crcSalt = <SOURCE>

props.conf

[predispatchis_init_sourcetype]
KV_MODE = none
SHOULD_LINEMERGE = false
TRANSFORMS-filterprices = setnull,getInterconnector,getRegionSolution

[predispatchis]
KV_MODE = none
SHOULD_LINEMERGE = false
TIME_PREFIX = ^D,PREDISPATCH,REGION_SOLUTION,[^,]*,[^,]*,[^,]*,[^,]*,[^,]*,[^,]*,[^,]*,[^,]*,[^,]*,[^,]*,[^,]*,[^,]*,[^,]*,[^,]*,[^,]*,[^,]*,[^,]*,[^,]*,[^,]*,[^,]*,[^,]*,[^,]*,[^,]*,[^,]*,[^,]*,[^,]*,[^,]*,[^,]*,[^,]*,[^,]*,[^,]*,[^,]*,[^,]*,[^,]*,[^,]*,[^,]*,[^,]*,[^,]*,[^,]*,[^,]*,[^,]*,[^,]*,[^,]*,[^,]*,[^,]*,[^,]*,[^,]*,[^,]*,[^,]*,[^,]*,[^,]*,[^,]*,[^,]*,[^,]*,[^,]*,[^,]*,[^,]*,[^,]*,[^,]*,[^,]*,[^,]*,[^,]*,[^,]*,
REPORT-extracts = predispatchisCsvExtract

[interconnector_soln]
KV_MODE = none
SHOULD_LINEMERGE = false
TIME_PREFIX = ^D,PREDISPATCH,INTERCONNECTOR_SOLN,[^,]*,[^,]*,[^,]*,[^,]*,[^,]*,[^,]*,[^,]*,[^,]*,[^,]*,[^,]*,[^,]*,[^,]*
REPORT-extracts = interconnectorSolnCsvExtract

transforms.conf

[getInterconnector]
REGEX = ^D,PREDISPATCH,INTERCONNECTOR_SOLN,(.*)
DEST_KEY = MetaData:Sourcetype
FORMAT = sourcetype::interconnector_soln

[getRegionSolution]
REGEX = ^D,PREDISPATCH,REGION_SOLUTION,(.*)
DEST_KEY = MetaData:Sourcetype
FORMAT = sourcetype::predispatchis

[interconnectorSolnCsvExtract]
DELIMS = ","
FIELDS =       "I","PREDISPATCH,INTERCONNECTOR_SOLN","3","PREDISPATCHSEQNO","RUNNO","INTERCONNECTORID","PERIODID","INTERVENTION","METEREDMWFLOW","MWFLOW","MWLOSSES","MARGINALVALUE","VIOLATIONDEGREE","LASTCHANGED","DATETIME","EXPORTLIMIT","IMPORTLIMIT","MARGINALLOSS","EXPORTGENCONID","IMPORTGENCONID","FCASEXPORTLIMIT","FCASIMPORTLIMIT","LOCAL_PRICE_ADJUSTMENT_EXPORT","LOCALLY_CONSTRAINED_EXPORT","LOCAL_PRICE_ADJUSTMENT_IMPORT","LOCALLY_CONSTRAINED_IMPORT"

[predispatchisCsvExtract]
DELIMS = ","
FIELDS = "I","PREDISPATCH","REGION_SOLUTION","4","PREDISPATCHSEQNO","RUNNO","REGIONID","PERIODID","INTERVENTION","TOTALDEMAND","AVAILABLEGENERATION","AVAILABLELOAD","DEMANDFORECAST","DISPATCHABLEGENERATION","DISPATCHABLELOAD","NETINTERCHANGE","EXCESSGENERATION","LOWER5MINDISPATCH","LOWER5MINIMPORT","LOWER5MINLOCALDISPATCH","LOWER5MINLOCALPRICE","LOWER5MINLOCALREQ","LOWER5MINPRICE","LOWER5MINREQ","LOWER5MINSUPPLYPRICE","LOWER60SECDISPATCH","LOWER60SECIMPORT","LOWER60SECLOCALDISPATCH","LOWER60SECLOCALPRICE","LOWER60SECLOCALREQ","LOWER60SECPRICE","LOWER60SECREQ","LOWER60SECSUPPLYPRICE","LOWER6SECDISPATCH","LOWER6SECIMPORT","LOWER6SECLOCALDISPATCH","LOWER6SECLOCALPRICE","LOWER6SECLOCALREQ","LOWER6SECPRICE","LOWER6SECREQ","LOWER6SECSUPPLYPRICE","RAISE5MINDISPATCH","RAISE5MINIMPORT","RAISE5MINLOCALDISPATCH","RAISE5MINLOCALPRICE","RAISE5MINLOCALREQ","RAISE5MINPRICE","RAISE5MINREQ","RAISE5MINSUPPLYPRICE","RAISE60SECDISPATCH","RAISE60SECIMPORT","RAISE60SECLOCALDISPATCH","RAISE60SECLOCALPRICE","RAISE60SECLOCALREQ","RAISE60SECPRICE","RAISE60SECREQ","RAISE60SECSUPPLYPRICE","RAISE6SECDISPATCH","RAISE6SECIMPORT","RAISE6SECLOCALDISPATCH","RAISE6SECLOCALPRICE","RAISE6SECLOCALREQ","RAISE6SECPRICE","RAISE6SECREQ","RAISE6SECSUPPLYPRICE","LASTCHANGED","DATETIME","INITIALSUPPLY","CLEAREDSUPPLY","LOWERREGIMPORT","LOWERREGLOCALDISPATCH","LOWERREGLOCALREQ","LOWERREGREQ","RAISEREGIMPORT","RAISEREGLOCALDISPATCH","RAISEREGLOCALREQ","RAISEREGREQ","RAISE5MINLOCALVIOLATION","RAISEREGLOCALVIOLATION","RAISE60SECLOCALVIOLATION","RAISE6SECLOCALVIOLATION","LOWER5MINLOCALVIOLATION","LOWERREGLOCALVIOLATION","LOWER60SECLOCALVIOLATION","LOWER6SECLOCALVIOLATION","RAISE5MINVIOLATION","RAISEREGVIOLATION","RAISE60SECVIOLATION","RAISE6SECVIOLATION","LOWER5MINVIOLATION","LOWERREGVIOLATION","LOWER60SECVIOLATION","LOWER6SECVIOLATION","RAISE6SECACTUALAVAILABILITY","RAISE60SECACTUALAVAILABILITY","RAISE5MINACTUALAVAILABILITY","RAISEREGACTUALAVAILABILITY","LOWER6SECACTUALAVAILABILITY","LOWER60SECACTUALAVAILABILITY","LOWER5MINACTUALAVAILABILITY","LOWERREGACTUALAVAILABILITY","DECAVAILABILITY","LORSURPLUS","LRCSURPLUS","TOTALINTERMITTENTGENERATION","DEMAND_AND_NONSCHEDGEN","UIGF","SEMISCHEDULE_CLEAREDMW","SEMISCHEDULE_COMPLIANCEMW"
0 Karma

Builder

Splunk should have no problem indexing all the data in the file; however the problem you will face is with sourcetyping and field extraction. There is nothing unique about each event line in this file other than it's corresponding index row. You won't be able to use column counts because clearly you can have multiple data sets with the same number of columns.

I would recommend pre-processing this file to split up each of the data sets so that they can be individually sourcetyped and field extractions can then happen on a per sourcetype basis. This can be done with a little bith of python which unzip's your data sample, chop's it into multiple sets, and writes unique files to Splunk's spool directory.

0 Karma

SplunkTrust
SplunkTrust

Hi hazekamp,

This post is quite old but i have exactly the same requirement, would you how now a better solution, or even some python portion code to share ? Thanks anyway 🙂

0 Karma

Explorer

I am sitting with exactly the same problem.

For now one solution would be to split the data sets manually but that is a pain. I am not sure if you could do multiple extracts based on dataset headings, if so how would we be able to tell the extract where the dataset starts and stops so that you don't get data from other data sets?

any luck so far on your side?

0 Karma

Unfortunately none so far. We will be revisiting it again soon however. I will post up here if we find a solution.

0 Karma