Splunk Search

Scripted input field extractions not working like a CSV one-shot

anewell
Path Finder

I have a use-case that requires a scripted input. I have built a scripted input app following the docs, but I'm having trouble getting the extractions to work properly. The data is indexed, but the fields are not extracted. The full dataset is roughly 100 lines. I'm using the pipe delimiter to hew closely to the example in the docs. My script emits a header line that is easily removed if not needed.

Frustratingly, if I format the data as a CSV file with a header line and comma delimiters, I can upload it as a one-shot, and Splunk will automatically recognize it and extract the fields as desired. My goal is for my scripted input to get the same extractions that a one-shot CSV upload gets.

I'm probably missing something simple, but I'm not seeing it:

inputs.conf
    [script://./bin/app.sh]
    disabled = false
    host = Servername
    index = devtest
    interval = 60
    source = app_users
    sourcetype = app_user_data

props.conf
    [app_user_data]
    SHOULD_LINEMERGE = false
    TRANSFORMS-app_users = app_extractions

transforms.conf
    [app_extractions]
    DELIMS = "|"
    FIELDS = "environment","user_name","full_name","login_time","last_touched","ip_address"

Sanitized output of the scripted input "app.sh":

Environment|User Name|Full Name|Login Time|Last Touched|IP Address
production|alice_t|Alice Toklas|7/18/2012 10:05:30 PM|7/19/2012 11:51:12 AM|10.20.30.40
QA|bob_y|Bob Yeruncle|7/19/2012 4:58:14 AM|7/19/2012 11:48:56 AM|192.168.1.3
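
For illustration, app.sh boils down to something like this (a simplified sketch, not the real script):

    #!/bin/sh
    # Simplified sketch of app.sh: print a header line, then one
    # pipe-delimited row per user (the real data comes from elsewhere).
    echo "Environment|User Name|Full Name|Login Time|Last Touched|IP Address"
    echo "production|alice_t|Alice Toklas|7/18/2012 10:05:30 PM|7/19/2012 11:51:12 AM|10.20.30.40"
    echo "QA|bob_y|Bob Yeruncle|7/19/2012 4:58:14 AM|7/19/2012 11:48:56 AM|192.168.1.3"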

Platform is Splunk 4.3.3 running on CentOS 5.8, and my app is running locally on the indexer.

anewell
Path Finder

(moving from comment stream to a new answer for space and formatting)

GK - Confirmed, I am using REPORT, and the sourcetype is app_user_data. Looking at the learned transforms, I can see the generated config puts spaces between the FIELDS entries ("1", "2", ) where I did not have spaces ("1","2",). Just in case, I have edited mine to match, but that did not help. I've also added a few lines to props.conf per the learned file, and adjusted the props stanza from sourcetype to source based on my reading of the spec file. Currently, my configs are:

inputs.conf
    [script://./bin/app.sh]
    disabled = false
    host = Servername
    index = apptest
    interval = 60
    source = app_users
    sourcetype = app_user_data

props.conf
    [source::app_users]
    KV_MODE = none
    SHOULD_LINEMERGE = false
    REPORT-app_users = app_extractions
    pulldown_type = true

transforms.conf 
    [app_extractions]
    DELIMS = "|"
    FIELDS = "Environment", "User Name", "Full Name", "Login Time", "Last Touched", "IP Address"

I've been relying on the scripted input to stream data into Splunk, using "|" as the delimiter. My next approach will be to edit my script to output a true CSV file and then have Splunk consume that file. Not as elegant, alas.
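
A sketch of what that monitor input might look like (untested; the path and index are illustrative):

inputs.conf
    # Hypothetical monitor stanza for the CSV-file fallback.
    # The pretrained csv sourcetype should get the same header-based
    # handling the one-shot upload received.
    [monitor:///opt/app/output/app_users.csv]
    index = apptest
    sourcetype = csv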

Ayn
Legend

You should not be using index-time extractions (TRANSFORMS) unless you know very well what you are doing and have a very good reason for doing it.

Use search-time extractions (REPORT) whenever possible. Simply switching TRANSFORMS for REPORT in props.conf should do the trick in your scenario. Since REPORT extractions happen at search time, your already-indexed events will pick them up without any reindexing.

gkanapathy
Splunk Employee

Looks right to me as well, provided you use REPORT and not TRANSFORMS (TRANSFORMS simply won't work with DELIMS/FIELDS). I suppose you can check inside $SPLUNK_HOME/etc/learned/local/ to see what props.conf and transforms.conf entries were generated for csv-8 and compare with what you have.

Also, can you please confirm that your scripted input data is indexed with the correct specified sourcetype, app_user_data?
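
For example, something along these lines would show the generated stanzas (paths assume a default install; the learned stanza names may differ):

    # Show the learned config generated for the one-shot CSV upload
    grep -A 6 '\[csv-8\]' "$SPLUNK_HOME/etc/learned/local/props.conf"
    cat "$SPLUNK_HOME/etc/learned/local/transforms.conf"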

anewell
Path Finder

PS - Ayn, I'm using your Squid and Snort apps. Thanks for writing them!

anewell
Path Finder

Sourcetype 'app_user_data' shows only the fields host, source, and sourcetype. By contrast, when I one-shot a CSV, I get sourcetype csv-8, with fields for the columns as defined in the header line.
re: indexer, I slightly misspoke. I only meant to indicate the data was not coming from a forwarder. Thanks again.

Ayn
Legend

One thing though, you mention that your app is "running locally on the indexer" - are you performing your searches directly on the indexer or do you have a separate Splunk instance acting as a search head? In the latter case, your field extractions should go on the search head, not the indexer.
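
If there is a separate search head, the extraction stanzas would live in an app there, e.g. (the app name is illustrative):

    # On the search head, not the indexer (app name is illustrative):
    $SPLUNK_HOME/etc/apps/yourapp/local/props.conf
    $SPLUNK_HOME/etc/apps/yourapp/local/transforms.conf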

Ayn
Legend

So you're getting data with sourcetype app_user_data, but you're not seeing fields like "environment" and "user_name" when you view that data?

Your setup looks OK to me, provided that the sourcetype stanza is matching correctly.

anewell
Path Finder

Hmm. Thank you for the clarification; in this case it does not fully solve the problem.

props.conf
    [app_user_data]
    SHOULD_LINEMERGE = false
    REPORT-app_users = app_extractions

(Edited props, stopped Splunk, cleaned event data, started.)
