All Apps and Add-ons

Splunk Add-on for Amazon Web Services: Exporting an S3 bucket and specifying the sourcetype as CSV, why is the header not parsed correctly?

menahem
Explorer

Hello all,

I am exporting an S3 bucket that contains only CSV files. When I specify the sourcetype as csv, I get all my events, but no fields are extracted for them (the header is indexed as if it were just another event). However, when I import the same CSV through Splunk Web, the header is parsed correctly.

Any idea how to fix that?

Thanks!

peter_holmes_an
Path Finder

I'm seeing the same issue. I've done a bunch of testing on it, and it seems to be a problem with the AWS app.

To test, I added the AWS app to my dev box (running Splunk as both indexer and search head), set up a test S3 bucket with a single CSV file in it, used the app to connect to the bucket with a sourcetype of csv, and indexed the data. This produces the following:
Event 1: Header 1,Header 2,Header 3
- Timestamp: the time of indexing
- No non-default fields
Event 2: Row1Value1,Row1Value2,Row1Value3
- Timestamp: the timestamp in the row (the first field of the CSV is a timestamp)
- No non-default fields
Event 3: Row2Value1,Row2Value2,Row2Value3
- Timestamp: the timestamp in the row
- No non-default fields
...

If I index the same data directly using the Add Data interface, again with the csv sourcetype, I get the following:
Event 1: Row1Value1,Row1Value2,Row1Value3
- Timestamp: the timestamp in the row
- Fields extracted as per the header row of the file
Event 2: Row2Value1,Row2Value2,Row2Value3
- Timestamp: the timestamp in the row
- Fields extracted as per the header row of the file
...

In my testing I've tried a bunch of props.conf settings, but for this last test my props.conf is as follows (which is just the default settings for csv):

[csv]
SHOULD_LINEMERGE=false
NO_BINARY_CHECK=true
CHARSET=UTF-8
INDEXED_EXTRACTIONS=csv
KV_MODE=none
category=Structured
description=Comma-separated value format. Set header and other settings in "Delimited Settings"
disabled=false
pulldown_type=true

Out of curiosity I also tried a setup similar to https://answers.splunk.com/answers/334056, using props.conf and transforms.conf to remove the header line of the CSV so I could use search-time field extractions to get the fields, but this is also ignored (I didn't test that too thoroughly, though, so I may have mistyped something).
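
For reference, the kind of configuration that answer describes is roughly the following. This is only a sketch: the stanza names and header values are illustrative, not my exact files.

props.conf:
[my_s3_csv]
# Index time: send the header line to the null queue
TRANSFORMS-drop_header = drop_csv_header
# Search time: extract fields by delimiter instead of INDEXED_EXTRACTIONS
REPORT-csv_fields = extract_csv_fields

transforms.conf:
[drop_csv_header]
# Discard any event that is exactly the known header line
REGEX = ^Header1,Header2,Header3$
DEST_KEY = queue
FORMAT = nullQueue

[extract_csv_fields]
# Delimiter-based search-time extraction; field names are listed manually
DELIMS = ","
FIELDS = Header1, Header2, Header3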

phadnett_splunk
Splunk Employee

Hello Menahem, I am trying to understand your expectation better. Could you explain further what you mean by "no fields about that event (aka the header is like an event)"?

It is my understanding that the header of the file is used for field extractions. It would then be processed as a separate event, and the fields extracted from the header would appear to the left under "Interesting Fields."

Thanks

menahem
Explorer

Hello Phadnett,
The problem is very simple: I don't see the fields that are in the header of the file (not even under "Interesting Fields").

marquiselee
Path Finder

Hi, were you ever able to resolve this issue? I'm having the same problem.

peter_holmes_an
Path Finder

Hey, we managed to work around the issue, and we got a response from Splunk support explaining what is occurring.

The issue is that modular inputs are not honouring the csv sourcetype that is set. It's a bug, but I'm not sure of the status of the fix.

The workaround is to pull the file locally and index it like any other local CSV.
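
In practice it looked something like this (a sketch; the local path and sync mechanism are illustrative). A scheduled job copies the bucket down, e.g. with "aws s3 sync", and a normal file monitor input picks the files up:

inputs.conf:
[monitor:///opt/s3_local/devops]
# Files read via monitor go through the normal structured-data parsing,
# so the csv sourcetype's INDEXED_EXTRACTIONS is honoured
sourcetype = csv
index = data
disabled = 0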

AnotherSplunker
Observer

Are there any updates on this issue? I recently set up an environment to test ingesting files from an S3 bucket using the SQS-based S3 input, and noticed that the header of the CSV file was ingested as an event and that no field extraction was applied to the remaining, actual events.

AnotherSplunker
Observer

FYI, release 6.1.0 of the AWS add-on, released on 11 July 2022, resolves this issue:

Release notes for the Splunk Add-on for AWS - Splunk Documentation
New features
Version 6.1.0 of the Splunk Add-on for AWS contains the following new and changed features:

  • Support for the parsing of CSV files from AWS S3 (Generic S3 and SQS-based S3 ingestion methods)

https://docs.splunk.com/Documentation/AddOns/released/AWS/Releasenotes#New_features

peter_holmes_an
Path Finder

I've added some information as a reply to the main question, as I'm seeing the same issue. To clarify, the problem is that when reading the file from S3 the header is not used for field extraction and is instead treated as if it were a separate event. So for a file like:

Header1,Header2,Header3
Row1Value1,Row1Value2,Row1Value3
Row2Value1,Row2Value2,Row2Value3

Using the AWS S3 input you get events like this:
Event 1: Header1,Header2,Header3
Event 2: Row1Value1,Row1Value2,Row1Value3
Event 3: Row2Value1,Row2Value2,Row2Value3
and no search-time field extraction occurs,

but if you add the file locally (just using the Add Data dialogue) you get:
Event 1: Row1Value1,Row1Value2,Row1Value3
Event 2: Row2Value1,Row2Value2,Row2Value3
and the header row is used for field extraction at search time.

muebel
SplunkTrust

Hi Menahem, I believe this issue will be resolved by creating a props.conf with a [csv] stanza (this input's sourcetype; it can be anything you want) and then setting "INDEXED_EXTRACTIONS = CSV" in that stanza. More info can be found in the Structured Data section here: http://docs.splunk.com/Documentation/Splunk/latest/Admin/propsconf
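
A minimal sketch of what I mean (the stanza name just has to match the sourcetype set on your input):

props.conf:
[csv]
# Parse the file as structured data, using the header row for field names
INDEXED_EXTRACTIONS = CSV

Note that this has to be in place on the instance that first reads the file (i.e. the forwarder or instance running the input), since structured-data parsing happens there.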

Please let me know if this answers your question!

menahem
Explorer

It's not working; it seems to be a bug with the add-on.

menahem
Explorer

Does anyone have the same problem?

terrencebenade
Explorer

Did you make any progress? I am observing the same behaviour.

dolivasoh
Contributor

Can you post your props config for this input?

menahem
Explorer

My inputs.conf is:

[aws_s3://zx]
aws_account = yx
bucket_name = xy
character_set = auto
ct_blacklist = ^(?:Describe|List|Get)
host_name = s3.amazonaws.com
initial_scan_datetime = 2016-04-10T16:58:20+0200
key_name = devops/
max_items = 100000
max_retries = 3
polling_interval = 60
recursion_depth = -1
sourcetype = csv
ct_excluded_events_index =
index = data
disabled = 0