
Splunk Add-on for Amazon Web Services: Exporting an S3 bucket and specifying the sourcetype as CSV, why is the header not parsed correctly?

menahem
Explorer

Hello all,

I am ingesting an S3 bucket that contains only CSV files. When I specify the sourcetype as csv, I get all my events, but no fields are extracted for them (the header row is just indexed as another event). However, when I import the same CSV through Splunk Web, the header is parsed correctly.

Any idea how to fix that?

Thanks!

peter_holmes_an
Path Finder

I'm seeing the same issue, and I've done a bunch of testing on it; it seems to be a problem with the AWS app.

To test, I added the AWS app to my dev box (running Splunk as both indexer and search head), set up a test S3 bucket containing a single CSV file, connected to that bucket with the S3 input using a sourcetype of csv, and indexed the data. This produces the following:
Event 1: Header1,Header2,Header3
  - Timestamp: the time of indexing
  - No non-default fields
Event 2: Row1Value1,Row1Value2,Row1Value3
  - Timestamp: the timestamp in the row (the first field of the CSV is a timestamp)
  - No non-default fields
Event 3: Row2Value1,Row2Value2,Row2Value3
  - Timestamp: the timestamp in the row
  - No non-default fields
...

If I index the same data directly using the Add Data interface, again with the csv sourcetype, I get the following:
Event 1: Row1Value1,Row1Value2,Row1Value3
  - Timestamp: the timestamp in the row
  - Fields extracted as per the header row of the file
Event 2: Row2Value1,Row2Value2,Row2Value3
  - Timestamp: the timestamp in the row
  - Fields extracted as per the header row of the file
...

In my testing I've tried a number of props.conf settings, but for this last test my props.conf is as follows (which is just the default configuration for csv).

[csv]
SHOULD_LINEMERGE=false
NO_BINARY_CHECK=true
CHARSET=UTF-8
INDEXED_EXTRACTIONS=csv
KV_MODE=none
category=Structured
description=Comma-separated value format. Set header and other settings in "Delimited Settings"
disabled=false
pulldown_type=true

Out of curiosity I also tried a setup similar to https://answers.splunk.com/answers/334056, using props.conf and transforms.conf to remove the header line of the CSV so I could use search-time field extractions to get the fields, but this is also ignored (I didn't test that too thoroughly, though, so I may have mistyped something).
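
Roughly, what I tried looked like the sketch below. Treat it as untested; the csv_from_s3 sourcetype name and the Header1/Header2/Header3 column names are just placeholders based on my example file:

# props.conf (sketch only)
[csv_from_s3]
SHOULD_LINEMERGE = false
# drop the header row at index time
TRANSFORMS-drop_csv_header = drop_csv_header
# extract fields at search time using the known column names
REPORT-csv_fields = csv_field_extraction

# transforms.conf
[drop_csv_header]
# route any line that looks like the header to the nullQueue
REGEX = ^Header1,Header2,Header3$
DEST_KEY = queue
FORMAT = nullQueue

[csv_field_extraction]
DELIMS = ","
FIELDS = "Header1","Header2","Header3"

The TRANSFORMS part runs at index time and the REPORT extraction at search time; as mentioned, though, this didn't seem to take effect for the S3 input either.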

0 Karma

phadnett_splunk
Splunk Employee

Hello Menahem, I am trying to understand your expectation better. Could you explain further what you mean by "no fields about that event (aka the header is like an event)"?

It is my understanding that the header of the file is used for field extractions. It would then be processed as a separate event, and the fields that were extracted from the header will appear on the left under "Interesting Fields."

Thanks

0 Karma

menahem
Explorer

Hello Phadnett,
The problem is very simple: I don't have the fields (even under "Interesting Fields") that are in the header of the file.

0 Karma

marquiselee
Path Finder

Hi, were you ever able to resolve this issue? I'm having the same problem.

0 Karma

peter_holmes_an
Path Finder

Hey, we managed to work around the issue and got a response from Splunk support as to what is occurring.

So the issue is that modular inputs are not honouring the csv sourcetype setting. It's a bug, but I'm not sure of the status of the fix.

The workaround is to pull the file down locally and index it like any other local CSV.
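
In our case that meant syncing the bucket to a local directory on a schedule (for example with the AWS CLI's aws s3 sync) and pointing a normal file monitor at it. A rough sketch, with placeholder path and index:

# inputs.conf on the instance that has the synced files
# (the path and index are placeholders; the sync itself happens
#  outside Splunk, e.g. a scheduled "aws s3 sync" job)
[monitor:///opt/s3_csv_staging]
sourcetype = csv
index = data
disabled = 0

Because this goes through the normal file monitor, the default [csv] props (with INDEXED_EXTRACTIONS) apply and the header is handled correctly.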

0 Karma

AnotherSplunker
Observer

Are there any updates on this issue? I recently set up an environment to test ingesting files from an S3 bucket using the SQS-based S3 input, and noticed that the header of the CSV file was ingested as an event and that no field extraction was applied to the remaining events.

0 Karma

AnotherSplunker
Observer

FYI, release 6.1.0 of the AWS Add-on, released on 11 July 2022, resolves this issue:

Release notes for the Splunk Add-on for AWS - Splunk Documentation
New features
Version 6.1.0 of the Splunk Add-on for AWS contains the following new and changed features:

  • Support for the parsing of CSV files from AWS S3 (Generic S3 and SQS-based S3 ingestion methods)

https://docs.splunk.com/Documentation/AddOns/released/AWS/Releasenotes#New_features

0 Karma

peter_holmes_an
Path Finder

I've added some information as a reply to the main question, as I'm seeing the same issue. To clarify, the problem is that when reading the file from S3, the header is not used for field extraction and is instead treated as if it were a separate event. So for a file like:

Header1,Header2,Header3
Row1Value1,Row1Value2,Row1Value3
Row2Value1,Row2Value2,Row2Value3

Using the AWS S3 input you get events like this:
Event 1: Header1,Header2,Header3
Event 2: Row1Value1,Row1Value2,Row1Value3
Event 3: Row2Value1,Row2Value2,Row2Value3
and no search-time field extraction occurs.

but if you add the file locally (just using the Add Data dialogue), you get:
Event 1: Row1Value1,Row1Value2,Row1Value3
Event 2: Row2Value1,Row2Value2,Row2Value3
and the header row is used for field extraction at search time.
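
If you need the fields before a proper fix, one option (which I haven't verified against the S3 input) is to force a search-time extraction on the search head with an EXTRACT regex keyed to your column names. A sketch using the example headers above:

# props.conf on the search head - sketch only; the named groups
# would need to match your actual CSV columns
[csv]
EXTRACT-s3_csv_fields = ^(?<Header1>[^,]+),(?<Header2>[^,]+),(?<Header3>[^,]+)$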

0 Karma

muebel
SplunkTrust

Hi Menahem, I believe this issue will be resolved by creating a props.conf with a [csv] stanza (this input's sourcetype; it can be anything you want) and setting INDEXED_EXTRACTIONS = CSV in that stanza. More info can be found in the Structured Data section here: http://docs.splunk.com/Documentation/Splunk/latest/Admin/propsconf
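
A minimal sketch of what I mean (assuming the sourcetype on your aws_s3 input is csv, and noting that this props.conf needs to be on the instance that actually runs the add-on's input):

# props.conf - minimal sketch; the stanza name must match the
# sourcetype configured on the aws_s3 input
[csv]
INDEXED_EXTRACTIONS = csv
SHOULD_LINEMERGE = false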

Please let me know if this answers your question!

0 Karma

menahem
Explorer

It's not working; it seems to be a bug with the add-on.

0 Karma

menahem
Explorer

Does anyone have the same problem?

terrencebenade
Explorer

Did you make any progress? I am observing the same behaviour.

0 Karma

dolivasoh
Contributor

Can you post your props config for this input?

0 Karma

menahem
Explorer

My inputs.conf is:

[aws_s3://zx]
aws_account = yx
bucket_name = xy
character_set = auto
ct_blacklist = ^(?:Describe|List|Get)
host_name = s3.amazonaws.com
initial_scan_datetime = 2016-04-10T16:58:20+0200
key_name = devops/
max_items = 100000
max_retries = 3
polling_interval = 60
recursion_depth = -1
sourcetype = csv
ct_excluded_events_index =
index = data
disabled = 0
0 Karma