Splunk App for AWS: How to extract fields from monitored TSV files?

Thomas_Aneiro
Explorer

I am trying to use the Splunk App for AWS to monitor Cisco web log files, but I am having a very hard time extracting the fields. The files are tab-separated values (TSV) and a new one is created approximately every half hour, so in theory I should be able to use the header line for the field names. That only works when I upload a file manually in a test environment, not when the files are monitored. The problem with manual extraction is that not every field is populated in every event. I tried the following settings in props.conf, but they only pull in the header line.
Sample logs here: http://pastebin.com/5t7xjt41

[aws_s3]
FIELD_DELIMITER = tab
HEADER_FIELD_DELIMITER = tab
FIELD_NAMES = "datatime","c-ip","cs(X-Forwarded-For)","cs-username","cs-method","cs-uri-scheme","cs-host","cs-uri-port","cs-uri-path","cs-uri-query","cs(User-Agent)","cs(Content-Type)","cs-bytes","sc-bytes","sc-status","sc(Content-Type)","s-ip","x-ss-category","x-ss-last-rule-name","x-ss-last-rule-action","x-ss-block-type","x-ss-block-value","x-ss-external-ip","x-ss-referer-host"
INDEXED_EXTRACTIONS = tsv
KV_MODE = none
sourcetype = cws:proxy
TZ = EST
NO_BINARY_CHECK = true
disabled = false
BREAK_ONLY_BEFORE_DATE = true
NO_BINARY_CHECK = true
SHOULD_LINEMERGE = false
category = Structured
description = Tab-separated value format. Set header and other settings in "Delimited Settings"
disabled = false
pulldown_type = true

jeremyarcher
Path Finder

Thomas, is the problem that the log data isn't arriving, or that the data (proxy W3C logs) is there but the field extraction is failing? Since you mention a Cisco proxy, I'm assuming you're using Cisco CWS / Scansafe?

Almost all of the log data I get (using sourcetype = aws:s3) from Scansafe via AWS is:

#Fields: datatime   c-ip    cs(X-Forwarded-For) cs-username cs-method   cs-uri-scheme   cs-host cs-uri-port cs-uri-path cs-uri-query    cs(User-Agent)  cs(Content-Type)    cs-bytes    sc-bytes    sc-status   sc(Content-Type)    s-ip    x-ss-category   x-ss-last-rule-name x-ss-last-rule-action   x-ss-block-type x-ss-block-value    x-ss-external-ip    x-ss-referer-host
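
To sanity-check what a header-driven extraction should produce, a header like the one above can be parsed outside Splunk. This is only a minimal Python sketch, not part of the app: it assumes the `#Fields:` line is tab-separated in the raw file (the tabs show up as spaces in this post), and the `parse_cws` helper and the three-column sample are made up for illustration.

```python
import csv
import io

def parse_cws(text):
    """Yield one dict per data row, keyed by the names on the '#Fields:' line."""
    names = None
    reader = csv.reader(io.StringIO(text), delimiter="\t", quoting=csv.QUOTE_NONE)
    for row in reader:
        if not row:
            continue  # skip blank lines
        if row[0].startswith("#"):
            # Comment/header line: take field names from '#Fields: a<TAB>b...'
            if row[0].startswith("#Fields:"):
                first = row[0][len("#Fields:"):].strip()
                names = [first] + row[1:]
            continue
        if names is None:
            raise ValueError("data row seen before a #Fields: header")
        # Empty columns stay as empty strings, so sparse rows still line up.
        yield dict(zip(names, row))

# Hypothetical three-column sample; the real files carry 24 columns.
sample = "#Fields: datatime\tc-ip\tcs-username\n2015-06-01 12:00:00\t10.0.0.1\t\n"
rows = list(parse_cws(sample))
print(rows)
```

Because the row is split on every tab, a missing value simply becomes an empty string instead of shifting the later columns.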

woodcock
Esteemed Legend

I don't see much wrong beyond some extra "useless" settings, but try this:

[aws_s3]
FIELD_DELIMITER = \t
HEADER_FIELD_DELIMITER = \t
HEADER_FIELD_LINE_NUMBER = 1
INDEXED_EXTRACTIONS = tsv
KV_MODE = none
sourcetype = cws:proxy
TZ = EST
NO_BINARY_CHECK = true
BREAK_ONLY_BEFORE_DATE = true
SHOULD_LINEMERGE = false
category = Structured
description = Tab-separated value format. Set header and other settings in "Delimited Settings"
pulldown_type = true

http://docs.splunk.com/Documentation/Splunk/6.2.3/Data/Extractfieldsfromfileheadersatindextime


Thomas_Aneiro
Explorer

Thanks, but I have already tried this and am not sure why it isn't working. I ended up just using a regex extraction in props.conf.

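EXTRACT-datatime,c_ip,cs_X_Forwarded_For,cs_username,cs_method,cs_uri_scheme,cs_host,cs_uri_port,cs_uri_path,cs_uri_query,cs_user_agent,cs_content_type,cs_bytes,sc_bytes,sc_status,sc_content_type,s_ip,x_ss_category,x_ss_last_rule_name,x_ss_last_rule_action,x_ss_block_type,x_ss_block_value,x_ss_external_ip,x_ss_referer_host = ^(?P<datatime>[^\t]+)\t(?P<c_ip>[^\t]+)\t(?P<cs_X_Forwarded_For>[^\t]+)\t(?P<cs_username>[^\t]+)\t(?P<cs_method>[^\t]+)\t(?P<cs_uri_scheme>[^\t]+)\t(?P<cs_host>[^\t]+)\t(?P<cs_uri_port>[^\t]+)\t(?P<cs_uri_path>[^\t]+)\t(?P<cs_uri_query>[^\t]+)\t(?P<cs_user_agent>[^\t]+)\t(?P<cs_content_type>[^\t]+)\t(?P<cs_bytes>[^\t]+)\t(?P<sc_bytes>[^\t]+)\t(?P<sc_status>[^\t]+)\t(?P<sc_content_type>[^\t]+)\t(?P<s_ip>[^\t]+)\t(?P<x_ss_category>[^\t]+)\t(?P<x_ss_last_rule_name>[^\t]+)\t(?P<x_ss_last_rule_action>[^\t]+)\t(?P<x_ss_block_type>[^\t]+)\t(?P<x_ss_block_value>[^\t]+)\t(?P<x_ss_external_ip>[^\t]+)\t(?P<x_ss_referer_host>[^\t]+)$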
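
A pattern of this shape can also be exercised outside Splunk. The following is a hedged Python sketch rather than Splunk behaviour: it rebuilds the regex from the field names listed in the EXTRACT stanza above, one named group per column with a negated class such as `[^\t]+`, and runs it on made-up placeholder lines. Python's `re` accepts the same `(?P<name>...)` group syntax, but Splunk uses PCRE, so treat this as a rough check only. The `build_pattern` helper and the sample values are hypothetical.

```python
import re

FIELDS = (
    "datatime,c_ip,cs_X_Forwarded_For,cs_username,cs_method,cs_uri_scheme,"
    "cs_host,cs_uri_port,cs_uri_path,cs_uri_query,cs_user_agent,"
    "cs_content_type,cs_bytes,sc_bytes,sc_status,sc_content_type,s_ip,"
    "x_ss_category,x_ss_last_rule_name,x_ss_last_rule_action,"
    "x_ss_block_type,x_ss_block_value,x_ss_external_ip,x_ss_referer_host"
).split(",")

def build_pattern(quantifier):
    """One named group per column, columns separated by a literal tab."""
    groups = [f"(?P<{name}>[^\\t]{quantifier})" for name in FIELDS]
    return re.compile("^" + "\\t".join(groups) + "$")

strict = build_pattern("+")   # every column must be non-empty
lenient = build_pattern("*")  # columns may be empty

# Made-up event with every column filled by a placeholder value.
values = [f"v{i}" for i in range(len(FIELDS))]
full = "\t".join(values)

# Same event with the fourth column (cs_username) left empty.
values[3] = ""
sparse = "\t".join(values)

print(strict.match(full) is not None)
print(strict.match(sparse) is not None)
print(lenient.match(sparse).group("cs_username") == "")
```

With `+`, an event that has an empty column does not match at all, so none of its fields are extracted. If the real logs can contain empty columns, `*` may be the safer quantifier, but that is an assumption to verify against the actual data.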


Ledion_Bitincka
Splunk Employee

Are you using Splunk or Hunk? If you're using Splunk, the configs in your original question need to be placed on an indexer. What you've posted in the comment above is a search-time field extraction; it lives on the search head and should work in both Splunk and Hunk.
