All Apps and Add-ons

Splunk App for AWS: How to extract fields from monitored TSV files?

Thomas_Aneiro
Explorer

I am trying to use the Splunk App for AWS to monitor Cisco web log files, but I am having a very hard time extracting the fields. The files are tab-separated values and are created approximately every half hour, so in theory I should be able to use the header line to supply the field names, but this only works when I manually upload a file in a test environment, not while monitoring. The problem I run into with manual extraction is that not every field is populated in every event. I tried the following settings in props.conf, but they only pull in the header line.
Sample logs here: http://pastebin.com/5t7xjt41

[aws_s3]
FIELD_DELIMITER = tab
HEADER_FIELD_DELIMITER = tab
FIELD_NAMES = "datatime","c-ip","cs(X-Forwarded-For)","cs-username","cs-method","cs-uri-scheme","cs-host","cs-uri-port","cs-uri-path","cs-uri-query","cs(User-Agent)","cs(Content-Type)","cs-bytes","sc-bytes","sc-status","sc(Content-Type)","s-ip","x-ss-category","x-ss-last-rule-name","x-ss-last-rule-action","x-ss-block-type","x-ss-block-value","x-ss-external-ip","x-ss-referer-host"
INDEXED_EXTRACTIONS = tsv
KV_MODE = none
sourcetype = cws:proxy
TZ = EST
NO_BINARY_CHECK = true
disabled = false
BREAK_ONLY_BEFORE_DATE = true
SHOULD_LINEMERGE = false
category = Structured
description = Tab-separated value format. Set header and other settings in "Delimited Settings"
pulldown_type = true
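As a sanity check outside Splunk, a minimal Python sketch (using only a few of the field names from the FIELD_NAMES list above, and a made-up sample line) shows that a tab-delimited event with empty fields still splits into the right number of columns, so mapping the header onto each event positionally should be viable:

```python
import csv
import io

# Header plus one sample event; the cs-username field is intentionally empty.
# Field names abbreviated from the FIELD_NAMES list above; values are invented.
sample = (
    "datatime\tc-ip\tcs-username\tcs-method\n"
    "2015-06-01 12:00:00\t10.0.0.1\t\tGET\n"
)

rows = list(csv.reader(io.StringIO(sample), delimiter="\t"))
header, event = rows[0], rows[1]

# Empty fields still occupy a column, so header and event line up one-to-one.
fields = dict(zip(header, event))
print(fields["cs-method"])    # GET
print(repr(fields["cs-username"]))  # '' - empty, but the column is present
```

This is only a sketch of the TSV mechanics, not of what the app does internally; it just confirms the file format itself supports header-based extraction even with unpopulated fields.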

jeremyarcher
Path Finder

Thomas, did this come up because of the log data you're getting, or because the data (W3C proxy logs) are there but the field extraction is failing? Since you mentioned a Cisco proxy, I'm assuming you're using Cisco CWS / Scansafe?

Almost all of the log data I get (using sourcetype = aws:s3) from Scansafe via AWS is:

#Fields: datatime   c-ip    cs(X-Forwarded-For) cs-username cs-method   cs-uri-scheme   cs-host cs-uri-port cs-uri-path cs-uri-query    cs(User-Agent)  cs(Content-Type)    cs-bytes    sc-bytes    sc-status   sc(Content-Type)    s-ip    x-ss-category   x-ss-last-rule-name x-ss-last-rule-action   x-ss-block-type x-ss-block-value    x-ss-external-ip    x-ss-referer-host

woodcock
Esteemed Legend

I don't see much beyond some extra "useless" settings, but try this:

[aws_s3]
FIELD_DELIMITER = \t
HEADER_FIELD_DELIMITER = \t
HEADER_FIELD_LINE_NUMBER = 1
INDEXED_EXTRACTIONS = tsv
KV_MODE = none
sourcetype = cws:proxy
TZ = EST
NO_BINARY_CHECK = true
BREAK_ONLY_BEFORE_DATE = true
SHOULD_LINEMERGE = false
category = Structured
description = Tab-separated value format. Set header and other settings in "Delimited Settings"
pulldown_type = true

http://docs.splunk.com/Documentation/Splunk/6.2.3/Data/Extractfieldsfromfileheadersatindextime


Thomas_Aneiro
Explorer

Thanks, but I have tried that and am not sure why it isn't working. I ended up just using a regex extraction in props.conf instead.

EXTRACT-datatime,c_ip,cs_X_Forwarded_For,cs_username,cs_method,cs_uri_scheme,cs_host,cs_uri_port,cs_uri_path,cs_uri_query,cs_user_agent,cs_content_type,cs_bytes,sc_bytes,sc_status,sc_content_type,s_ip,x_ss_category,x_ss_last_rule_name,x_ss_last_rule_action,x_ss_block_type,x_ss_block_value,x_ss_external_ip,x_ss_referer_host = ^(?P<datatime>[^\t]+)\t(?P<c_ip>[^\t]+)\t(?P<cs_X_Forwarded_For>[^\t]+)\t(?P<cs_username>[^\t]+)\t(?P<cs_method>[^\t]+)\t(?P<cs_uri_scheme>[^\t]+)\t(?P<cs_host>[^\t]+)\t(?P<cs_uri_port>[^\t]+)\t(?P<cs_uri_path>[^\t]+)\t(?P<cs_uri_query>[^\t]+)\t(?P<cs_user_agent>[^\t]+)\t(?P<cs_content_type>[^\t]+)\t(?P<cs_bytes>[^\t]+)\t(?P<sc_bytes>[^\t]+)\t(?P<sc_status>[^\t]+)\t(?P<sc_content_type>[^\t]+)\t(?P<s_ip>[^\t]+)\t(?P<x_ss_category>[^\t]+)\t(?P<x_ss_last_rule_name>[^\t]+)\t(?P<x_ss_last_rule_action>[^\t]+)\t(?P<x_ss_block_type>[^\t]+)\t(?P<x_ss_block_value>[^\t]+)\t(?P<x_ss_external_ip>[^\t]+)\t(?P<x_ss_referer_host>[^\t]+)$
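One caveat with a per-field regex like this: if a field can be empty (as the original question notes), groups written as `[^\t]+` require at least one character and the whole match fails on that event, while `[^\t]*` tolerates empty values. A quick Python sketch with a shortened, hypothetical four-field line illustrates the difference:

```python
import re

# Third field (think cs-username) is empty in this invented sample line.
line = "2015-06-01\t10.0.0.1\t\tGET"

# "+" requires at least one non-tab character per field.
strict = re.compile(r"^([^\t]+)\t([^\t]+)\t([^\t]+)\t([^\t]+)$")
# "*" allows a field to be empty.
lenient = re.compile(r"^([^\t]*)\t([^\t]*)\t([^\t]*)\t([^\t]*)$")

print(strict.match(line))             # None: "+" cannot match the empty field
print(lenient.match(line).groups())   # ('2015-06-01', '10.0.0.1', '', 'GET')
```

So if events with unpopulated fields come back with no extractions at all, switching the groups from `+` to `*` is worth trying.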


Ledion_Bitincka
Splunk Employee

Are you using Splunk or Hunk? If you're using Splunk, you need to place the configs from your original question on an indexer. The EXTRACT you've posted above performs search-time field extraction, lives on the search head, and should work in both Splunk and Hunk.
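To make the split concrete, a rough sketch of where each piece would live in a distributed deployment (file locations assumed; adjust the app context to your environment):

On the instance that first parses the data (indexer or heavy forwarder), e.g. $SPLUNK_HOME/etc/system/local/props.conf:

[cws:proxy]
INDEXED_EXTRACTIONS = tsv
FIELD_DELIMITER = \t
HEADER_FIELD_LINE_NUMBER = 1

On the search head, e.g. $SPLUNK_HOME/etc/system/local/props.conf:

[cws:proxy]
EXTRACT-cws_fields = <the search-time regex from the post above>

INDEXED_EXTRACTIONS is applied at parse/index time and has no effect if the stanza only exists on the search head, which would explain index-time settings appearing to "do nothing" while a search-time EXTRACT works.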
