Hello all,
I am ingesting an S3 bucket that contains only CSV files. When I specify the sourcetype as csv, I get all my events, but no fields are extracted for them (the header row is indexed as if it were just another event). However, when I import the same CSV through Splunk Web, the header is picked up correctly.
Any idea how to fix that?
Thanks !
I'm seeing the same issue. I've done a bunch of testing on it, and it seems to be a problem with the AWS app.
To test, I added the AWS app to my dev box (running Splunk as both indexer and search head), set up a test S3 bucket with a single CSV file in it, used the S3 app to connect to this bucket with a sourcetype of csv, and indexed the data. This produces the following:
Event 1: Header 1,Header 2,Header 3
Timestamp of the time of index
No non-default fields
Event 2: Row1Value1,Row1Value2,Row1Value3
Timestamp as the timestamp in the row (the first field of the CSV is a timestamp)
No non-default fields
Event 3: Row2Value1,Row2Value2,Row2Value3
Timestamp as the timestamp in the row
No non-default fields
...
If I index the same data directly using the Add Data interface with the csv sourcetype, I get the following:
Event 1: Row1Value1,Row1Value2,Row1Value3
Timestamp as the timestamp in the row
Fields extracted as per the Header row of the file
Event 2: Row2Value1,Row2Value2,Row2Value3
Timestamp as the timestamp in the row
Fields extracted as per the Header row of the file
...
In my testing I've tried a bunch of props.conf settings, but for this last test my props.conf is as follows (which is just the default configuration for csv):
[csv]
SHOULD_LINEMERGE=false
NO_BINARY_CHECK=true
CHARSET=UTF-8
INDEXED_EXTRACTIONS=csv
KV_MODE=none
category=Structured
description=Comma-separated value format. Set header and other settings in "Delimited Settings"
disabled=false
pulldown_type=true
Out of curiosity I also tried a setup similar to https://answers.splunk.com/answers/334056, using props.conf and transforms.conf to drop the header line of the CSV so I could use search-time field extractions to get the fields, but this is also ignored (I didn't test that too thoroughly, though, so I may have mistyped something). A rough sketch of what I tried is below.
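For reference, this is roughly the shape of that setup (a sketch only; the sourcetype name my_s3_csv and the header names Header1-3 are placeholders, not what the linked answer used verbatim):
props.conf:
[my_s3_csv]
# index-time: route the header row to the null queue (must live on the parsing tier)
TRANSFORMS-dropheader = drop_csv_header
# search-time: extract fields by delimiter (must live on the search head)
REPORT-csvfields = csv_field_extraction
transforms.conf:
[drop_csv_header]
REGEX = ^Header1,Header2,Header3$
DEST_KEY = queue
FORMAT = nullQueue
[csv_field_extraction]
DELIMS = ","
FIELDS = "Header1","Header2","Header3"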
Hello Menahem, I am trying to understand your expectation better. Could you explain further what you mean by "no fields about that event (aka the header is like an event)"?
It is my understanding that the header of the file is used for field extractions; each row is then processed as a separate event, and the fields extracted from the header appear on the left under "Interesting Fields."
Thanks
Hello Phadnett,
The problem is very simple: I don't see the fields that are in the header of the file (not even under "Interesting Fields").
Hi, were you ever able to resolve this issue? I'm having the same problem.
Hey, we managed to work around the issue and got a response from Splunk support as to what is occurring.
So the issue is that modular inputs are not honouring the csv sourcetype settings. It's a bug, but I'm not sure of the status of the fix.
The workaround is to pull the file down locally and index it like any other local CSV; something like the sketch below.
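For example (a rough sketch; the bucket name, key prefix and local path are made up), sync the bucket to disk with the AWS CLI on a schedule:
aws s3 sync s3://my-bucket/devops/ /opt/splunk_data/s3_csv/
and then monitor that directory in inputs.conf on the same box:
[monitor:///opt/splunk_data/s3_csv]
sourcetype = csv
index = data
disabled = 0
Because the file is now read as a local file, the csv sourcetype's INDEXED_EXTRACTIONS applies normally.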
Are there any updates on the issue? I recently set up an environment to test ingesting files from an S3 bucket using the SQS-based S3 input and noticed that the header of the CSV file was ingested as an event and the remaining events didn't have any field extraction applied.
FYI, release 6.1.0 of the Splunk Add-on for AWS, released on 11 July 2022, resolves this issue. See the "New features" section of the release notes:
https://docs.splunk.com/Documentation/AddOns/released/AWS/Releasenotes#New_features
I've added some information as a reply to the main question, as I'm seeing the same issue. To clarify, the problem is that when the file is read from S3 the header is not used for field extraction and is instead treated as if it were a separate event. So for a file like:
Header1,Header2,Header3
Row1Value1,Row1Value2,Row1Value3
Row2Value1,Row2Value2,Row2Value3
Using the AWS S3 input you get events like this:
Event 1: Header1,Header2,Header3
Event 2: Row1Value1,Row1Value2,Row1Value3
Event 3: Row2Value1,Row2Value2,Row2Value3
and no search-time field extraction occurs,
but if you add the file locally (just using the Add Data dialogue):
Event 1: Row1Value1,Row1Value2,Row1Value3
Event 2: Row2Value1,Row2Value2,Row2Value3
and the header row is used for field extraction at search time.
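(A quick way to check whether extractions are actually being applied — a sketch; the index name just matches the inputs.conf posted elsewhere in this thread:
index=data sourcetype=csv | head 5 | fieldsummary | table field count
If only default fields like host, source and sourcetype come back, no extractions happened.)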
Hi Menahem, I believe this issue will be resolved by creating a props.conf with a [csv] stanza (this is the input's sourcetype; it can be anything you want) and then setting INDEXED_EXTRACTIONS = CSV in that stanza. More info can be found in the Structured Data section here: http://docs.splunk.com/Documentation/Splunk/latest/Admin/propsconf — a minimal sketch is below.
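(A sketch, assuming a made-up sourcetype name my_s3_csv; INDEXED_EXTRACTIONS needs to live where the data is first read, so put this on the forwarder/indexer running the input:)
props.conf:
[my_s3_csv]
INDEXED_EXTRACTIONS = CSV
SHOULD_LINEMERGE = false
and in inputs.conf, set sourcetype = my_s3_csv on the aws_s3 stanza.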
Please let me know if this answers your question!
It's not working; it seems to be a bug with the add-on.
Does anyone have the same problem?
Did you make any progress? I am observing the same behaviour.
Can you post your props config for this input?
My inputs.conf is:
[aws_s3://zx]
aws_account = yx
bucket_name = xy
character_set = auto
ct_blacklist = ^(?:Describe|List|Get)
host_name = s3.amazonaws.com
initial_scan_datetime = 2016-04-10T16:58:20+0200
key_name = devops/
max_items = 100000
max_retries = 3
polling_interval = 60
recursion_depth = -1
sourcetype = csv
ct_excluded_events_index =
index = data
disabled = 0