Props.Conf Field Extraction for a .CSV

_gkollias · ‎03-11-2014

I'm trying to create a props.conf for a .CSV, but I am unsuccessful and believe it's because of the field extraction. The data follows the same pattern as the header line below:

TPCode,,"date",,"time",,PurchaseOrderNumber,,"OrderNumber",,,CompanyNumber,,Division,,"CustomerNumber",,BillToSeq,,ShipToID

Here is my props:

[contract_sunrise]
SHOULD_LINEMERGE = false
TIME_PREFIX = ^
TIME_FORMAT = %Y-%m-%d_%H:%M:%S
MAX_TIMESTAMP_LOOKAHEAD = 40
EXTRACT-contract_sunrise = ^.+\s+(?<TPCode>[^\s]+)\s+(?<"date">[^\s]+)\s+(?<"time">[^\s]+)\s+(?<PurchaseOrderNumber>[^\s]+)\s+(?<"OrderNumber">[^\s]+)\s+(?<CompanyNumber>[^\s]+)\s+(?<Division>[^\s]+)\s+(?<"CustomerNumber">[^\s]+)\s+)\s+(?<BillToSeq>[^\s]+)\s+(?<ShipToID>[^\s]+)

The various "," were automatically added as separators, but if I could forward the data with just the headers and their corresponding data, that would be best. Any suggestions on the EXTRACT portion would be greatly appreciated. Thanks!

lukejadamec · ‎03-12-2014

Any errors in the splunkd log?

kristian_kolb · ‎03-11-2014

I would first of all recommend a REPORT-based extraction with DELIMS and FIELDS for this csv.
I'm assuming that the sample line reflects the actual events, and that the double (or triple) commas mark empty fields you did not intend to extract. With DELIMS you specify the delimiter between fields, and with FIELDS you specify the field names in the order they appear (all of them, including the ones you don't care about).

props.conf

[contract_sunrise]
REPORT-extract_sunrise = sunrise_fields

transforms.conf

[sunrise_fields]
DELIMS = ","
FIELDS = TPCode, field2, date, field4, time, field6, PurchaseOrderNumber, field8, OrderNumber, field10, field11 etc etc.
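
Spelled out against the header line in the question, the full list might look like the sketch below. It assumes every comma, including each one in a run of two or three, marks a field boundary, so the header yields 20 positions. The fieldN names are only placeholders for the empty positions.

```ini
# transforms.conf (sketch; positions counted from the header line in the question)
[sunrise_fields]
DELIMS = ","
# Placeholder names (field2, field4, ...) stand in for the empty positions
FIELDS = TPCode, field2, date, field4, time, field6, PurchaseOrderNumber, field8, OrderNumber, field10, field11, CompanyNumber, field13, Division, field15, CustomerNumber, field17, BillToSeq, field19, ShipToID
```

If the real events have a different number of commas than the header, the names will shift, so check a few rows before relying on this.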

/K

_gkollias · ‎03-13-2014

Thank you for the help. The weird thing about this CSV is that there are commas in the date and time sections, so the data is indexing but it's all over the place. I think I'll do some event line breaking and try to re-format the time, and I'll see how it goes. Thanks again!

kristian_kolb · ‎03-12-2014

Your timestamp specification in props.conf is:

TIME_FORMAT = %Y-%m-%d_%H:%M:%S

However, if your log looks like you say (i.e. with date and time as two different fields), that spec is wrong.

Perhaps something more like:

TIME_FORMAT = %Y-%m-%d,,%H:%M:%S

But that all depends on how the events actually look. You should probably update your question with a few sample lines of log...

If you sometimes have data in the (seemingly) empty field between date and time, you might want to create a custom datetime.xml file. Or remove TIME_FORMAT and see if Splunk can work out the timestamp on its own.
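
As a rough sketch of how the pieces could fit together, suppose (and this is purely an assumption, since no sample events have been posted) that a data row starts like ABC,,"2014-03-11",,"13:45:00",, with the date and time quoted the same way as in the header. TIME_PREFIX would then need to skip the first field, the two commas and the opening quote, and TIME_FORMAT would need to carry the quotes and commas that sit between date and time:

```ini
# props.conf (sketch; assumes rows like: ABC,,"2014-03-11",,"13:45:00",,...)
[contract_sunrise]
SHOULD_LINEMERGE = false
# Skip the first field, the double comma and the opening quote
TIME_PREFIX = ^[^,]*,,"
# Literal characters between the date and the time: ",,"
TIME_FORMAT = %Y-%m-%d",,"%H:%M:%S
# The timestamp itself is 22 characters after the prefix
MAX_TIMESTAMP_LOOKAHEAD = 25
```

If the real rows are unquoted or use another date layout, the prefix and format have to change accordingly.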

_gkollias · ‎03-12-2014

I never knew what CSV stood for 🙂

Good thinking - I would go with the first 2. Timestamp parsing is most likely the issue. How could I specify that? It seems like a monitor stanza relative to the one in my question would be best for timestamp parsing. I'm sure there is a way to modify what you've given above?

kristian_kolb · ‎03-12-2014

Well, 'csv' means 'comma separated values', so naturally they would be separators. However, two commas in a row would indicate an empty position. It's hard to tell for sure without seeing your actual events. You might have a screwed up file format. You can get rid of the header row with the techniques used for nullQueueing or perhaps through a SEDCMD.
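
For the nullQueue route, a minimal sketch could look like this. The stanza name sunrise_drop_header is made up for illustration, and the regex assumes the header row always begins with TPCode,, exactly as in the question:

```ini
# props.conf
[contract_sunrise]
TRANSFORMS-drop_header = sunrise_drop_header

# transforms.conf
[sunrise_drop_header]
# Match only the header row, which is assumed to start with: TPCode,,
REGEX = ^TPCode,,
DEST_KEY = queue
FORMAT = nullQueue
```

This is applied at parse time, so it has to live on the indexer or heavy forwarder, not on a universal forwarder, and it only affects data indexed after the change.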

However, if you are 'not seeing any data coming in' it could indicate other problems. Faulty timestamp parsing, flawed inputs.conf or index permissions issues spring to mind as possible culprits.
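
To rule out the input side, the monitor stanza should at least point at the right file and carry the same sourcetype name as the props.conf stanza. A bare-bones sketch, where the path and the index name are placeholders and not taken from your setup:

```ini
# inputs.conf (sketch; path and index are hypothetical placeholders)
[monitor:///path/to/sunrise/*.csv]
# Must match the stanza name used in props.conf
sourcetype = contract_sunrise
index = main
disabled = false
```

If the index named here does not exist, or your role is not allowed to search it, you will not see the events even when the rest is right.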

/K

_gkollias · ‎03-12-2014

I'm not seeing any data coming in. One thing I forgot to mention is that the format I listed is also the header row. The data follows the same pattern as the headers listed above.

_gkollias · ‎03-12-2014

Hi Kristian, the ",," is actually in the CSV file - I assumed they acted as separators. So field2, field4, field6 - they represent the various commas? I will try this out now - Thanks!
