All Apps and Add-ons

Splunk App for AWS: How to extract fields from monitored TSV files?

Thomas_Aneiro
Explorer

I am trying to use the Splunk App for AWS to monitor Cisco web log files, but I am having a very hard time extracting the fields. The files are tab-separated values and are created approximately every half hour, so in theory I should be able to use the header line to supply the field names, but this only works when I manually upload a file in a test environment, not while monitoring. The problem I run into with manual extraction is that not every field is populated in every event. I tried the following settings in props.conf, but they only pull in the header line.
Sample logs here: http://pastebin.com/5t7xjt41

[aws_s3]
FIELD_DELIMITER = tab
HEADER_FIELD_DELIMITER = tab
FIELD_NAMES = "datatime","c-ip","cs(X-Forwarded-For)","cs-username","cs-method","cs-uri-scheme","cs-host","cs-uri-port","cs-uri-path","cs-uri-query","cs(User-Agent)","cs(Content-Type)","cs-bytes","sc-bytes","sc-status","sc(Content-Type)","s-ip","x-ss-category","x-ss-last-rule-name","x-ss-last-rule-action","x-ss-block-type","x-ss-block-value","x-ss-external-ip","x-ss-referer-host"
INDEXED_EXTRACTIONS = tsv
KV_MODE = none
sourcetype = cws:proxy
TZ = EST
NO_BINARY_CHECK = true
disabled = false
BREAK_ONLY_BEFORE_DATE = true
SHOULD_LINEMERGE = false
category = Structured
description = Tab-separated value format. Set header and other settings in "Delimited Settings"
pulldown_type = true
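As a sanity check outside Splunk, a minimal Python sketch (using only a few of the field names from the FIELD_NAMES list above, and a made-up sample line) shows that a tab-delimited event with empty fields still splits into the right number of columns, so mapping the header onto each event positionally should be viable:

```python
import csv
import io

# Header plus one sample event; the cs-username field is intentionally empty.
# Field names abbreviated from the FIELD_NAMES list above; values are invented.
sample = (
    "datatime\tc-ip\tcs-username\tcs-method\n"
    "2015-06-01 12:00:00\t10.0.0.1\t\tGET\n"
)

rows = list(csv.reader(io.StringIO(sample), delimiter="\t"))
header, event = rows[0], rows[1]

# Empty fields still occupy a column, so header and event line up one-to-one.
fields = dict(zip(header, event))
print(fields["cs-method"])    # GET
print(repr(fields["cs-username"]))  # '' - empty, but the column is present
```

This is only a sketch of the TSV mechanics, not of what the app does internally; it just confirms the file format itself supports header-based extraction even with unpopulated fields.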

jeremyarcher
Path Finder

Thomas, did this come up because of the log data you're getting, or because the data (W3C proxy logs) are there but the field extraction is failing? Since you mentioned a Cisco proxy, I'm assuming you're using Cisco CWS / Scansafe?

Almost all of the log data I get (using sourcetype = aws:s3) from Scansafe via AWS is:

#Fields: datatime   c-ip    cs(X-Forwarded-For) cs-username cs-method   cs-uri-scheme   cs-host cs-uri-port cs-uri-path cs-uri-query    cs(User-Agent)  cs(Content-Type)    cs-bytes    sc-bytes    sc-status   sc(Content-Type)    s-ip    x-ss-category   x-ss-last-rule-name x-ss-last-rule-action   x-ss-block-type x-ss-block-value    x-ss-external-ip    x-ss-referer-host

woodcock
Esteemed Legend

I don't see much beyond some extra "useless" settings, but try this:

[aws_s3]
FIELD_DELIMITER = \t
HEADER_FIELD_DELIMITER = \t
HEADER_FIELD_LINE_NUMBER = 1
INDEXED_EXTRACTIONS = tsv
KV_MODE = none
sourcetype = cws:proxy
TZ = EST
NO_BINARY_CHECK = true
BREAK_ONLY_BEFORE_DATE = true
SHOULD_LINEMERGE = false
category = Structured
description = Tab-separated value format. Set header and other settings in "Delimited Settings"
pulldown_type = true

http://docs.splunk.com/Documentation/Splunk/6.2.3/Data/Extractfieldsfromfileheadersatindextime


Thomas_Aneiro
Explorer

Thanks, but I have tried that and am not sure why it isn't working. I ended up just using a regex extraction in props.conf instead.

EXTRACT-datatime,c_ip,cs_X_Forwarded_For,cs_username,cs_method,cs_uri_scheme,cs_host,cs_uri_port,cs_uri_path,cs_uri_query,cs_user_agent,cs_content_type,cs_bytes,sc_bytes,sc_status,sc_content_type,s_ip,x_ss_category,x_ss_last_rule_name,x_ss_last_rule_action,x_ss_block_type,x_ss_block_value,x_ss_external_ip,x_ss_referer_host = ^(?P<datatime>[^\t]+)\t(?P<c_ip>[^\t]+)\t(?P<cs_X_Forwarded_For>[^\t]+)\t(?P<cs_username>[^\t]+)\t(?P<cs_method>[^\t]+)\t(?P<cs_uri_scheme>[^\t]+)\t(?P<cs_host>[^\t]+)\t(?P<cs_uri_port>[^\t]+)\t(?P<cs_uri_path>[^\t]+)\t(?P<cs_uri_query>[^\t]+)\t(?P<cs_user_agent>[^\t]+)\t(?P<cs_content_type>[^\t]+)\t(?P<cs_bytes>[^\t]+)\t(?P<sc_bytes>[^\t]+)\t(?P<sc_status>[^\t]+)\t(?P<sc_content_type>[^\t]+)\t(?P<s_ip>[^\t]+)\t(?P<x_ss_category>[^\t]+)\t(?P<x_ss_last_rule_name>[^\t]+)\t(?P<x_ss_last_rule_action>[^\t]+)\t(?P<x_ss_block_type>[^\t]+)\t(?P<x_ss_block_value>[^\t]+)\t(?P<x_ss_external_ip>[^\t]+)\t(?P<x_ss_referer_host>[^\t]+)$
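One caveat with a per-field regex like this: if a field can be empty (as the original question notes), groups written as `[^\t]+` require at least one character and the whole match fails on that event, while `[^\t]*` tolerates empty values. A quick Python sketch with a shortened, hypothetical four-field line illustrates the difference:

```python
import re

# Third field (think cs-username) is empty in this invented sample line.
line = "2015-06-01\t10.0.0.1\t\tGET"

# "+" requires at least one non-tab character per field.
strict = re.compile(r"^([^\t]+)\t([^\t]+)\t([^\t]+)\t([^\t]+)$")
# "*" allows a field to be empty.
lenient = re.compile(r"^([^\t]*)\t([^\t]*)\t([^\t]*)\t([^\t]*)$")

print(strict.match(line))             # None: "+" cannot match the empty field
print(lenient.match(line).groups())   # ('2015-06-01', '10.0.0.1', '', 'GET')
```

So if events with unpopulated fields come back with no extractions at all, switching the groups from `+` to `*` is worth trying.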


Ledion_Bitincka
Splunk Employee

Are you using Splunk or Hunk? If you're using Splunk, you need to place the configs from your original question on an indexer. The EXTRACT you've posted above performs search-time field extraction, lives on the search head, and should work in both Splunk and Hunk.
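To make the split concrete, a rough sketch of where each piece would live in a distributed deployment (file locations assumed; adjust the app context to your environment):

On the instance that first parses the data (indexer or heavy forwarder), e.g. $SPLUNK_HOME/etc/system/local/props.conf:

[cws:proxy]
INDEXED_EXTRACTIONS = tsv
FIELD_DELIMITER = \t
HEADER_FIELD_LINE_NUMBER = 1

On the search head, e.g. $SPLUNK_HOME/etc/system/local/props.conf:

[cws:proxy]
EXTRACT-cws_fields = <the search-time regex from the post above>

INDEXED_EXTRACTIONS is applied at parse/index time and has no effect if the stanza only exists on the search head, which would explain index-time settings appearing to "do nothing" while a search-time EXTRACT works.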
