Splunk Add-on for Amazon Web Services: How to get a CSV file stored in Amazon S3 to properly split at index-time?

jpvlsmv
Path Finder

I'm having trouble getting a CSV file that I've stored in Amazon S3 to properly split at index-time.

I'm using the Splunk Add-on for AWS, which allows me to define an S3 bucket to monitor. It pulls the data down just fine when a new CSV is uploaded:

[aws_s3://s3_autoruns]
disabled = false
aws_account = Splunk Reader
bucket_name = mybucket
index = jm
initial_scan_datetime = default
interval = 30
max_items = 100000
max_retries = 10
recursion_depth = 3
sourcetype = s3_autoruns
whitelist = .*/autoruns.txt$
blacklist = .*
character_set = UTF-16LE

My props.conf already has a working transform (it sets the host field to part of the S3 URL), so I know this stanza matches this data.

[source::.../autoruns.txt]
TRANSFORMS-s3host = transform-s3-integhost
DATETIME_CONFIG=CURRENT

With this, I get an event per line of the file.

I think I should be able to add to my props.conf:

INDEXED_EXTRACTIONS=CSV
FIELD_NAMES=Time,EntryLocation,Entry,Enabled,Category,Description,Publisher,ImagePath,LaunchString,MD5,SHA-1,SHA-256
FIELD_DELIMITER=,

But when I do that, it does not change anything. I still get one event per line, and no EntryLocation field to search on.

Any thoughts?

Thanks,
--Joe

dmaislin_splunk
Splunk Employee

I have run into a similar issue when streaming data into Splunk via a scripted input. In the interim, please use the DELIMS option in transforms.conf for search-time field extractions:

http://docs.splunk.com/Documentation/Splunk/6.2.1/Admin/transformsconf
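
A minimal sketch of that search-time approach, assuming each event is a single comma-separated line with sourcetype s3_autoruns. The transform name autoruns_csv_fields and the REPORT class name autoruns are hypothetical; the field list is copied from the FIELD_NAMES setting in the question. It has not been tested against values that contain embedded commas, and Splunk may normalize names such as SHA-1 to SHA_1 at search time.

```ini
# props.conf on the search head.
# The class name "autoruns" is an illustrative assumption.
[s3_autoruns]
REPORT-autoruns = autoruns_csv_fields

# transforms.conf on the search head.
# The stanza name "autoruns_csv_fields" is an illustrative assumption.
[autoruns_csv_fields]
# Split each event on commas and assign the pieces to FIELDS in order.
DELIMS = ","
FIELDS = "Time","EntryLocation","Entry","Enabled","Category","Description","Publisher","ImagePath","LaunchString","MD5","SHA-1","SHA-256"
```

With this in place, a search such as index=jm sourcetype=s3_autoruns EntryLocation=* would be one way to check that the field is being extracted.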

jpvlsmv
Path Finder

If I mirror the S3 bucket to a local directory and monitor it, it splits nicely:

[monitor:///data]
disabled = 0
crcSalt = <SOURCE>
index = jm
sourcetype = s3_autoruns
whitelist = .*/autoruns.txt$

--Joe
