I'm trying to create a props.conf for a .csv file, but I am unsuccessful and believe it's because of the field extraction. The format below follows the same pattern as the headers listed:
Here is my props:
[contract_sunrise]
SHOULD_LINEMERGE = false
TIME_PREFIX = ^
TIME_FORMAT = %Y-%m-%d_%H:%M:%S
MAX_TIMESTAMP_LOOKAHEAD = 40
EXTRACT-contract_sunrise = ^.+\s+(?<TPCode>[^\s]+)\s+(?<"date">[^\s]+)\s+(?<"time">[^\s]+)\s+(?<PurchaseOrderNumber>[^\s]+)\s+(?<"OrderNumber">[^\s]+)\s+(?<CompanyNumber>[^\s]+)\s+(?<Division>[^\s]+)\s+(?<"CustomerNumber">[^\s]+)\s+)\s+(?<BillToSeq>[^\s]+)\s+(?<ShipToID>[^\s]+)
The various "," were automatically added as separators, but if I could forward the data with just the headers and their corresponding data, that would be best. Any suggestions on the EXTRACT portion would be greatly appreciated. Thanks!
I would first of all recommend a REPORT-based extraction with DELIMS and FIELDS for this CSV.
Assuming the sample line reflects the actual events, the double (or triple) commas indicate fields you did not intend to extract. With DELIMS you specify the delimiter between fields, and with FIELDS you list the field names in the order they appear (all of them, including the positions you don't care about).
In props.conf:

[contract_sunrise]
REPORT-extract_sunrise = sunrise_fields

And in transforms.conf:

[sunrise_fields]
DELIMS = ","
FIELDS = TPCode, field2, date, field4, time, field6, PurchaseOrderNumber, field8, OrderNumber, field10, field11, etc.
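To illustrate why every position needs a name in FIELDS: a delimiter-based split yields an empty string for each empty position between consecutive commas. Here's a quick Python sketch of that behavior, using a made-up sample line and hypothetical field names (they are not taken from your actual data):

```python
# Hypothetical sample line: consecutive commas mark empty positions.
sample = "ABC123,,2018-06-01,,14:23:45,,PO-991"

# Field names in order, with placeholders for the positions we don't need,
# mirroring how FIELDS must name every position in the delimited event.
fields = ["TPCode", "field2", "date", "field4",
          "time", "field6", "PurchaseOrderNumber"]

# Splitting on the delimiter keeps the empty positions as empty strings,
# so the names and values stay aligned.
extracted = dict(zip(fields, sample.split(",")))
print(extracted["TPCode"])  # ABC123
print(extracted["date"])    # 2018-06-01
```

The key point is that the empty positions are not skipped, which is why the placeholder names (field2, field4, ...) are needed to keep the real fields aligned.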
Thank you for the help. The weird thing about this CSV is that there are commas in the date/time sections, so the data is indexing but it's all over the place. I think I'll do some event line breaking and try to re-format the time, and I'll see how it goes. Thanks again!
Your timestamp specification in props.conf is:
TIME_FORMAT = %Y-%m-%d_%H:%M:%S
However, if your log looks the way you describe (i.e. with the date and time as two different fields), that spec is wrong.
Perhaps something more like:
TIME_FORMAT = %Y-%m-%d,,%H:%M:%S
But that all depends on how the events actually look. You should probably update your question with a few sample lines of log...
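Since Splunk's TIME_FORMAT uses strptime-style conversion specifiers, one quick way to sanity-check a candidate format string is to try it against a sample of the event prefix in Python. The timestamp below is invented for illustration; substitute the start of one of your real events:

```python
from datetime import datetime

# Hypothetical event prefix: date, an empty CSV position, then time.
sample_prefix = "2018-06-01,,14:23:45"

# Literal characters in the format string (here the two commas) must
# match the input exactly, just as in Splunk's TIME_FORMAT.
parsed = datetime.strptime(sample_prefix, "%Y-%m-%d,,%H:%M:%S")
print(parsed)  # 2018-06-01 14:23:45
```

If strptime raises a ValueError on a real event prefix, the TIME_FORMAT almost certainly won't work in props.conf either.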
If you sometimes have data in the (what looks to be) empty field between date and time, you might want to create a custom datetime.xml file. Or remove TIME_FORMAT and see if Splunk can work it out anyway.
I never knew what CSV stood for 🙂
Good thinking - I would go with the first two. Timestamp parsing could most likely be the issue. How could I specify that? It seems like a monitor stanza relative to the one in my question would be best for timestamp parsing... I'm sure there is a way to modify what you've given above?
Well, 'CSV' means 'comma-separated values', so naturally the commas would be separators. However, two commas in a row indicate an empty position. It's hard to tell for sure without seeing your actual events; you might have a screwed-up file format. You can get rid of the header row with the techniques used for nullQueueing, or perhaps through a SEDCMD.
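As a sketch of the SEDCMD route (the stanza name matches your question, but the header pattern is an assumption based on the field names you listed; adjust it to your actual header row), something along these lines in props.conf would blank out the header line at index time:

```ini
# props.conf -- hypothetical sketch; adjust the pattern to your real header.
[contract_sunrise]
# Replace any event that starts with the literal header row with nothing.
SEDCMD-strip_header = s/^TPCode,.*$//
```

The nullQueue approach (a TRANSFORMS rule routing header events to the nullQueue) drops the event entirely rather than emptying it, which is generally the cleaner option if you want no trace of the header in the index.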
However, if you are 'not seeing any data coming in', it could indicate other problems. Faulty timestamp parsing, a flawed inputs.conf, or index permissions issues spring to mind as possible culprits.