Failure to parse more than 2 lines in CSV file

jcbrendsel
Path Finder

I am trying to parse several custom CSV files.

The files are processed on a universal forwarder, which then forwards the data to a central indexer.

Here are the first three lines of the CSV file. The first line is a note/comment, the second line holds the header fields, and the data starts on the third line.

Don't see your tags in the report? New tags are excluded by default - go to https://portal.aws.amazon.com/gp/aws/developer/account?action=cost-allocation-report to update your cost allocation keys.
InvoiceID,PayerAccountId,LinkedAccountId,RecordType,RecordID,BillingPeriodStartDate,BillingPeriodEndDate,InvoiceDate,PayerPONumber,ProductCode,ProductName,SellerOfRecord,UsageType,Operation,AvailabilityZone,RateId,ItemDescription,UsageStartDate,UsageEndDate,UsageQuantity,BlendedRate,CurrencyCode,CostBeforeTax,Credits,TaxAmount,TaxType,TotalCost,user:server_name,user:deployment,user:instance_size,user:Name
"Estimated","462819316490","258616683277","LinkedLineItem","600000000095199124-0","2012/12/01 00:00:00","2012/12/31 23:59:59","2012/12/17 15:57:09","","AmazonEC2","Amazon Elastic Compute Cloud","Amazon Web Services, Inc.","BoxUsage:c1.medium","RunInstances","us-east-1b","307215","$0.06 per Linux/UNIX High-CPU Medium Instance (c1.medium) instance-hour","2012/12/10 16:31:29","2012/12/31 23:59:59","548.16826504","0.141158167","USD","77.378427","0.000000","0.000000","None","77.378427",,,,

Here is the stanza in my props.conf file on the universal forwarder:

[source::/var/log/billing/462819316490-aws-cost-allocation-*]
sourcetype = aws-billing-cost-allocation
CHECK_METHOD=modtime
SHOULD_LINEMERGE = false
TIME_FORMAT=%Y/%m/%d %H:%M:%S

But only the first two lines are being indexed and returned.

Any ideas why?

Jon

gkanapathy
Splunk Employee

Are you sure? If your timestamp extraction is working correctly, then your first two lines will probably have a timestamp from the current time, or the mod time of the file, while the data line will have the timestamp found in the line itself. So when you search, if the time range you specify doesn't include that timestamp, the data line won't be returned.

Also please note that with CHECK_METHOD=modtime, the entire contents of the file are going to get reindexed every time it is modified. Is that what you want?
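
To illustrate the time-range point, here is a small Python sketch (it is not Splunk code). It assumes that the two non-data lines were stamped with the time of indexing and that the data line took the first timestamp in its text. The `Event` type, the `search` helper, and the indexing time of 2012/12/18 09:00 are all made up for illustration.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta


@dataclass
class Event:
    time: datetime  # timestamp assigned to the event at index time
    raw: str        # abbreviated event text


def search(events, earliest, latest):
    """Return the events whose timestamp falls in [earliest, latest)."""
    return [e for e in events if earliest <= e.time < latest]


# Hypothetical moment at which the file was indexed.
index_time = datetime(2012, 12, 18, 9, 0, 0)

events = [
    # No timestamp in the text, so these fall back to the index (or mod) time.
    Event(index_time, "Don't see your tags in the report? ..."),
    Event(index_time, "InvoiceID,PayerAccountId,LinkedAccountId,..."),
    # Assumed to take the first timestamp that appears in the line.
    Event(datetime(2012, 12, 1, 0, 0, 0), '"Estimated","462819316490",...'),
]

# A "last 24 hours" search run one hour after indexing.
now = index_time + timedelta(hours=1)
recent = search(events, now - timedelta(hours=24), now)
print(len(recent))       # 2: only the comment and header lines

# The same search over all time.
everything = search(events, datetime.min, datetime.max)
print(len(everything))   # 3: the data line shows up as well
```

If something like this is what is happening, running the search over "All time" should make the data line appear.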

gkanapathy
Splunk Employee

Okay. To use the third timestamp, you're going to have to use TIME_PREFIX with something like:

TIME_PREFIX = ^(?:[^\,]*,){16}

i.e., skip the first 16 field/comma sets.
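
As a rough check of where that prefix ends, the pattern can be tried against the sample data line from the question. The sketch below uses Python's `re` module as a stand-in for Splunk's regex engine, which is an assumption, and it parses the timestamp with `%Y/%m/%d %H:%M:%S` (lowercase `%m` and `%d` for month and day). Note that the comma inside "Amazon Web Services, Inc." counts as one of the 16, so the prefix ends just before the RateId value, and the first timestamp after it is UsageStartDate. Whether Splunk actually reaches that timestamp also depends on MAX_TIMESTAMP_LOOKAHEAD, which this sketch does not model.

```python
import csv
import re
from datetime import datetime

# The sample data line from the question, split only for readability.
line = (
    '"Estimated","462819316490","258616683277","LinkedLineItem",'
    '"600000000095199124-0","2012/12/01 00:00:00","2012/12/31 23:59:59",'
    '"2012/12/17 15:57:09","","AmazonEC2","Amazon Elastic Compute Cloud",'
    '"Amazon Web Services, Inc.","BoxUsage:c1.medium","RunInstances",'
    '"us-east-1b","307215",'
    '"$0.06 per Linux/UNIX High-CPU Medium Instance (c1.medium) instance-hour",'
    '"2012/12/10 16:31:29","2012/12/31 23:59:59","548.16826504","0.141158167",'
    '"USD","77.378427","0.000000","0.000000","None","77.378427",,,,'
)

# The suggested TIME_PREFIX pattern: skip 16 "anything up to a comma" groups.
prefix = re.compile(r'^(?:[^\,]*,){16}')

# Everything after the prefix is where timestamp extraction would start.
rest = line[prefix.match(line).end():]
print(rest[:8])   # "307215" (the RateId value, quotes included)

# First substring after the prefix that looks like YYYY/MM/DD HH:MM:SS.
stamp = re.search(r'\d{4}/\d{2}/\d{2} \d{2}:\d{2}:\d{2}', rest).group()
parsed = datetime.strptime(stamp, '%Y/%m/%d %H:%M:%S')
print(parsed)     # 2012-12-10 16:31:29

# Cross-check with a real CSV parser, which respects the quoted comma:
# index 17 is the UsageStartDate column in the header row.
fields = next(csv.reader([line]))
print(fields[17] == stamp)   # True
```

If a different timestamp column is wanted, the repeat count would need adjusting, keeping in mind that the pattern counts raw commas rather than CSV fields.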

jcbrendsel
Path Finder

Ah, you are correct. I overlooked the obvious.

In answer to your question, I have CHECK_METHOD=modtime set for debugging purposes so that I can delete the events from the index, touch the files, and have them reindexed.

But I do have a related follow-up question:

There are three timestamp fields in that csv file. I need to use the third one. Do you know how to force that?

jon

