Getting Data In

No fields or timestamps extracted when indexing TSV from S3 bucket

gauldridge
Path Finder

I have a standalone Splunk Enterprise (not Splunk Cloud) set up to work with some log data that is stored in an AWS S3 bucket. The log data is in TSV format, each file has a header row at the top with the field names, and each file is gzipped. I have the AWS TA installed (https://splunkbase.splunk.com/app/1876).

Having followed the instructions in the documentation (Introduction to the Splunk Add-on for Amazon Web Services - Splunk Documentation) for setting up a Generic S3 input, no fields are being extracted and the time stamps are not being recognized. The data does ingest but it is all just raw rows from the TSVs. The header row is being indexed as an event as well. The timestamps in Splunk are just _indextime even though there is a column called "timestamp" in the data.

Does anyone have any suggestions on how I can get this to recognize the timestamps and actually show the field names that appear in the header row?

Labels (4)
0 Karma

richgalloway
SplunkTrust
SplunkTrust

Perhaps the generic S3 input is *too* generic.  Can you share the props.conf stanza for the appropriate sourcetype?

---
If this reply helps you, Karma would be appreciated.
0 Karma

gauldridge
Path Finder

I should also mention that changing the sourcetype to anything other than aws:s3 or aws:s3:csv results in no data being indexed at all.

0 Karma

gauldridge
Path Finder

Here is the props.conf stanza from the TA's default directory that applies to the source type that is specified in the documentation:

###########################
### CSV ###
###########################

[aws:s3:csv]
DATETIME_CONFIG = CURRENT
TIME_FORMAT = %Y-%m-%dT%H:%M:%S%Z
SHOULD_LINEMERGE = false
LINE_BREAKER = [\r\n]+
TRUNCATE = 8388608
EVENT_BREAKER_ENABLE = true
EVENT_BREAKER = [\r\n]+
KV_MODE = json

I tried adding a props.conf into the local directory for the TA but it seems to be ignored because the data ends up indexed exactly the same after adding the new file and then restarting Splunk. This is the contents of the local props.conf that I tried:

[aws:s3:csv]
TIME_FORMAT = %s
HEADER_FIELD_LINE_NUMBER = 1
INDEXED_EXTRACTIONS = TSV
TIMESTAMP_FIELDS = timestamp

0 Karma

richgalloway
SplunkTrust
SplunkTrust

The first set of props will not ingest a CSV properly.  The second should work much better.

In which Splunk instance did you make the change?  It should be done on the indexers and heavy forwarders (if you have them).

Use btool on an indexer to make sure the settings are as expected.

splunk btool --debug props list aws:s3:csv

The change will apply to new data only.

---
If this reply helps you, Karma would be appreciated.
0 Karma

gauldridge
Path Finder

I am running a single instance (i.e. everything on one box).

I have updated the local props.conf again and seen no change in the indexed data.  Here is the current output from btool:

C:\Program Files\Splunk\bin>splunk btool --debug props list aws:s3:csv
C:\Program Files\Splunk\etc\apps\new_app_for_s3_data\local\props.conf [aws:s3:csv]
C:\Program Files\Splunk\etc\system\default\props.conf ADD_EXTRA_TIME_FIELDS = True
C:\Program Files\Splunk\etc\system\default\props.conf ANNOTATE_PUNCT = True
C:\Program Files\Splunk\etc\system\default\props.conf AUTO_KV_JSON = true
C:\Program Files\Splunk\etc\system\default\props.conf BREAK_ONLY_BEFORE =
C:\Program Files\Splunk\etc\system\default\props.conf BREAK_ONLY_BEFORE_DATE = True
C:\Program Files\Splunk\etc\system\default\props.conf CHARSET = AUTO
C:\Program Files\Splunk\etc\apps\Splunk_TA_aws\default\props.conf DATETIME_CONFIG = CURRENT
C:\Program Files\Splunk\etc\system\default\props.conf DEPTH_LIMIT = 1000
C:\Program Files\Splunk\etc\system\default\props.conf DETERMINE_TIMESTAMP_DATE_WITH_SYSTEM_TIME = false
C:\Program Files\Splunk\etc\apps\new_app_for_s3_data\local\props.conf EVENT_BREAKER = [\r\n]+
C:\Program Files\Splunk\etc\apps\new_app_for_s3_data\local\props.conf EVENT_BREAKER_ENABLE = true
C:\Program Files\Splunk\etc\apps\new_app_for_s3_data\local\props.conf FIELD_DELIMITER = \t
C:\Program Files\Splunk\etc\apps\new_app_for_s3_data\local\props.conf HEADER_FIELD_DELIMITER = \t
C:\Program Files\Splunk\etc\apps\new_app_for_s3_data\local\props.conf HEADER_FIELD_LINE_NUMBER = 1
C:\Program Files\Splunk\etc\system\default\props.conf HEADER_MODE =
C:\Program Files\Splunk\etc\apps\new_app_for_s3_data\local\props.conf INDEXED_EXTRACTIONS = TSV
C:\Program Files\Splunk\etc\apps\new_app_for_s3_data\local\props.conf KV_MODE = multi
C:\Program Files\Splunk\etc\system\default\props.conf LB_CHUNK_BREAKER_TRUNCATE = 2000000
C:\Program Files\Splunk\etc\system\default\props.conf LEARN_MODEL = true
C:\Program Files\Splunk\etc\system\default\props.conf LEARN_SOURCETYPE = true
C:\Program Files\Splunk\etc\apps\Splunk_TA_aws\default\props.conf LINE_BREAKER = [\r\n]+
C:\Program Files\Splunk\etc\system\default\props.conf LINE_BREAKER_LOOKBEHIND = 100
C:\Program Files\Splunk\etc\system\default\props.conf MATCH_LIMIT = 100000
C:\Program Files\Splunk\etc\system\default\props.conf MAX_DAYS_AGO = 2000
C:\Program Files\Splunk\etc\system\default\props.conf MAX_DAYS_HENCE = 2
C:\Program Files\Splunk\etc\system\default\props.conf MAX_DIFF_SECS_AGO = 3600
C:\Program Files\Splunk\etc\system\default\props.conf MAX_DIFF_SECS_HENCE = 604800
C:\Program Files\Splunk\etc\system\default\props.conf MAX_EVENTS = 256
C:\Program Files\Splunk\etc\system\default\props.conf MAX_TIMESTAMP_LOOKAHEAD = 128
C:\Program Files\Splunk\etc\system\default\props.conf MUST_BREAK_AFTER =
C:\Program Files\Splunk\etc\system\default\props.conf MUST_NOT_BREAK_AFTER =
C:\Program Files\Splunk\etc\system\default\props.conf MUST_NOT_BREAK_BEFORE =
C:\Program Files\Splunk\etc\system\default\props.conf SEGMENTATION = indexing
C:\Program Files\Splunk\etc\system\default\props.conf SEGMENTATION-all = full
C:\Program Files\Splunk\etc\system\default\props.conf SEGMENTATION-inner = inner
C:\Program Files\Splunk\etc\system\default\props.conf SEGMENTATION-outer = outer
C:\Program Files\Splunk\etc\system\default\props.conf SEGMENTATION-raw = none
C:\Program Files\Splunk\etc\system\default\props.conf SEGMENTATION-standard = standard
C:\Program Files\Splunk\etc\apps\Splunk_TA_aws\default\props.conf SHOULD_LINEMERGE = false
C:\Program Files\Splunk\etc\apps\new_app_for_s3_data\local\props.conf TIMESTAMP_FIELDS = timestamp
C:\Program Files\Splunk\etc\apps\new_app_for_s3_data\local\props.conf TIME_FORMAT = %s
C:\Program Files\Splunk\etc\system\default\props.conf TRANSFORMS =
C:\Program Files\Splunk\etc\apps\Splunk_TA_aws\default\props.conf TRUNCATE = 8388608
C:\Program Files\Splunk\etc\system\default\props.conf detect_trailing_nulls = auto
C:\Program Files\Splunk\etc\system\default\props.conf maxDist = 100
C:\Program Files\Splunk\etc\system\default\props.conf priority =
C:\Program Files\Splunk\etc\system\default\props.conf sourcetype =
C:\Program Files\Splunk\etc\system\default\props.conf termFrequencyWeightedDist = false
C:\Program Files\Splunk\etc\system\default\props.conf unarchive_cmd_start_mode = shell

 

0 Karma
Get Updates on the Splunk Community!

Now Available: Cisco Talos Threat Intelligence Integrations for Splunk Security Cloud ...

At .conf24, we shared that we were in the process of integrating Cisco Talos threat intelligence into Splunk ...

Preparing your Splunk Environment for OpenSSL3

The Splunk platform will transition to OpenSSL version 3 in a future release. Actions are required to prepare ...

Easily Improve Agent Saturation with the Splunk Add-on for OpenTelemetry Collector

Agent Saturation What and Whys In application performance monitoring, saturation is defined as the total load ...