Scripted Input field extractions not working like a csv one-shot

anewell · ‎07-19-2012

I have a use-case that requires a scripted input. I have built a scripted input app following the docs, but I'm having trouble getting the extractions to work properly. The data is indexed, but the fields are not extracted. The full dataset is roughly 100 lines. I'm using the pipe delimiter to hew closely to the example in the docs. My script emits a header line that is easily removed if not needed.

Frustratingly, if I format the data as a CSV file with a header line and comma delimiters, I can upload it as a one-shot, and Splunk will automatically recognize it and extract the fields as desired. My goal is for my scripted input to get the same extractions that a one-shot CSV upload gets.

I'm probably missing something simple, but I'm not seeing it:

inputs.conf
    [script://./bin/app.sh]
    disabled = false
    host = Servername
    index = devtest
    interval = 60
    source = app_users
    sourcetype = app_user_data

props.conf
    [app_user_data]
    SHOULD_LINEMERGE = false
    TRANSFORMS-app_users = app_extractions

transforms.conf
    [app_extractions]
    DELIMS = "|"
    FIELDS = "environment","user_name","full_name","login_time","last_touched","ip_address"

Sanitized output of the scripted input file "app.sh":

Environment|User Name|Full Name|Login Time|Last Touched|IP Address
production|alice_t|Alice Toklas|7/18/2012 10:05:30 PM|7/19/2012 11:51:12 AM|10.20.30.40
QA|bob_y|Bob Yeruncle|7/19/2012 4:58:14 AM|7/19/2012 11:48:56 AM|192.168.1.3
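
Since the field names are declared in FIELDS rather than read from the data, the header row would most likely be indexed as one more event alongside the data rows. A minimal sketch of dropping it inside the script, assuming the header is always the first line; `emit_users` and `strip_header` are hypothetical names, and `emit_users` only replays the sanitized sample above in place of whatever app.sh really runs:

```shell
#!/bin/sh
# Hypothetical stand-in for the real data source inside app.sh:
# prints the sanitized sample shown above, header line first.
emit_users() {
    printf '%s\n' \
        'Environment|User Name|Full Name|Login Time|Last Touched|IP Address' \
        'production|alice_t|Alice Toklas|7/18/2012 10:05:30 PM|7/19/2012 11:51:12 AM|10.20.30.40' \
        'QA|bob_y|Bob Yeruncle|7/19/2012 4:58:14 AM|7/19/2012 11:48:56 AM|192.168.1.3'
}

# Drop the first line (the header) and pass every data row through unchanged.
strip_header() {
    tail -n +2
}

emit_users | strip_header
```

Whether removing the header helps the extractions themselves is a separate question; this only keeps the column titles from showing up as an event.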

Platform is Splunk 4.3.3 running on CentOS 5.8, and my app is running locally on the indexer.

anewell · ‎07-20-2012

(moving from comment stream to a new answer for space and formatting)

GK - Confirmed, I am using REPORT, and the sourcetype is app_user_data. Looking at the learned transforms, I can see the generated config statement puts spaces between the FIELDS entries ("1", "2", ) where I did not have spaces ("1","2",). Just in case, I have edited mine to match, but that did not help. I've also added a few lines to props.conf per the learned file, and changed the props stanza from sourcetype to source based on my reading of the spec file. Currently, my configs are:

inputs.conf
    [script://./bin/app.sh]
    disabled = false
    host = Servername
    index = apptest
    interval = 60
    source = app_users
    sourcetype = app_user_data

props.conf
    [source::app_users]
    KV_MODE = none
    SHOULD_LINEMERGE = false
    REPORT-app_users = app_extractions
    pulldown_type = true

transforms.conf 
    [app_extractions]
    DELIMS = "|"
    FIELDS = "Environment", "User Name", "Full Name", "Login Time", "Last Touched", "IP Address"

I've been relying on the scripted input to stream data into Splunk, and I've been using "|" as the delimiter. My next approach will be to edit my script to output a true CSV file and then have Splunk consume that file. Not as elegant, alas.
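
For reference, a rough sketch of that conversion as a filter on the existing pipe-delimited output. `to_csv` is a hypothetical helper name, and this assumes no field ever contains a literal "|" or a line break:

```shell
#!/bin/sh
# Convert pipe-delimited lines on stdin to CSV on stdout.
# Every field is wrapped in double quotes, and any embedded double quote
# is doubled, which is the usual CSV escaping convention.
to_csv() {
    awk -F'|' -v OFS=',' '{
        for (i = 1; i <= NF; i++) {
            gsub(/"/, "\"\"", $i)
            $i = "\"" $i "\""
        }
        print
    }'
}

printf '%s\n' \
    'Environment|User Name|Full Name|Login Time|Last Touched|IP Address' \
    'QA|bob_y|Bob Yeruncle|7/19/2012 4:58:14 AM|7/19/2012 11:48:56 AM|192.168.1.3' \
    | to_csv
```

The header line is kept here on purpose, since the one-shot CSV upload relied on it for the field names.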

Ayn · ‎07-19-2012

You should not be using index-time extractions (TRANSFORMS) unless you know very well what you are doing and have a very good reason for doing it.

Use search-time extractions (REPORT) whenever possible. Simply switching TRANSFORMS for REPORT in props.conf will do the trick in your scenario.

gkanapathy · ‎07-19-2012

Looks right to me as well, provided you use REPORT and not TRANSFORMS (TRANSFORMS simply won't work with DELIMS/FIELDS). I suppose you can check inside $SPLUNK_HOME/etc/learned/local/ to see what props.conf and transforms.conf entries were generated for csv-8 and compare with what you have.

Also, can you please confirm that your scripted input is indexed with the specified sourcetype, app_user_data?
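
One way to do the comparison suggested above is to print a single stanza from each file and read them side by side. This is only a sketch: `print_stanza` is a hypothetical helper, and the stanza names in the usage comments come from this thread, not from anything verified on your system.

```shell
#!/bin/sh
# Print one stanza (header line plus its settings) from a .conf file.
#   $1 = stanza name without the square brackets
#   $2 = path to the .conf file
print_stanza() {
    awk -v want="[$1]" '
        /^\[/  { inside = ($0 == want) }  # each stanza header resets the flag
        inside { print }
    ' "$2"
}

# Example usage (paths and stanza names are assumptions; adjust to your install):
#   print_stanza csv-8 "$SPLUNK_HOME/etc/learned/local/props.conf"
#   print_stanza app_user_data /path/to/your/app/local/props.conf
```

Stanza names containing backslashes would need extra care, because awk interprets escape sequences in `-v` assignments.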

anewell · ‎07-19-2012

PS - Ayn, I'm using your Squid and Snort apps.. Thanks for writing them!

anewell · ‎07-19-2012

Sourcetype 'app_user_data' shows only the fields host, source, and sourcetype. By contrast, when I one-shot a CSV, I get sourcetype csv-8, with fields for the columns as defined in the header line.
Re: indexer, I slightly misspoke. I only meant to indicate that the data was not coming from a forwarder. Thanks again.

Ayn · ‎07-19-2012

One thing though, you mention that your app is "running locally on the indexer" - are you performing your searches directly on the indexer or do you have a separate Splunk instance acting as a search head? In the latter case, your field extractions should go on the search head, not the indexer.

Ayn · ‎07-19-2012

So you're getting data with sourcetype app_user_data, but you're not seeing fields like "environment" and "user_name" when you view that data?

Your setup looks OK to me, provided that the sourcetype stanza is matching correctly.

anewell · ‎07-19-2012

Hmm. Thank you for the clarification, but in this case it does not fully solve the problem.

    [app_user_data]
    SHOULD_LINEMERGE = false
    REPORT-app_users = app_extractions

(Edited props, stopped Splunk, cleaned event data, started)
