I have a use-case that requires a scripted input. I have built a scripted input app following the docs, but I'm having trouble getting the extractions to work properly. The data is indexed, but the fields are not extracted. The full dataset is roughly 100 lines. I'm using the pipe delimiter to hew closely to the example in the docs. My script emits a header line that is easily removed if not needed.
Frustratingly, if I format the data as a CSV file with a header line and comma delimiters, I can upload it as a one-shot, and Splunk will automatically recognize it and extract the fields as desired. My goal is for my scripted input to get the same extractions that a one-shot CSV upload gets.
I'm probably missing something simple, but I'm not seeing it:
inputs.conf
[script://./bin/app.sh]
disabled = false
host = Servername
index = devtest
interval = 60
source = app_users
sourcetype = app_user_data
props.conf
[app_user_data]
SHOULD_LINEMERGE = false
TRANSFORMS-app_users = app_extractions
transforms.conf
[app_extractions]
DELIMS = "|"
FIELDS = "environment","user_name","full_name","login_time","last_touched","ip_address"
Sanitized output of the scripted input file "app.sh":
Environment|User Name|Full Name|Login Time|Last Touched|IP Address
production|alice_t|Alice Toklas|7/18/2012 10:05:30 PM|7/19/2012 11:51:12 AM|10.20.30.40
QA|bob_y|Bob Yeruncle|7/19/2012 4:58:14 AM|7/19/2012 11:48:56 AM|192.168.1.3
Platform is Splunk 4.3.3 running on CentOS 5.8, and my app is running locally on the indexer.
(moving from comment stream to a new answer for space and formatting)
GK - Confirmed, I am using REPORT, and the sourcetype is app_user_data. Looking at the learned transforms, I can see that the generated config statement puts spaces between the FIELDS entries ("1", "2", ) where I did not have spaces ("1","2",). Just in case, I have edited mine to match, but that did not help. I've also added a few lines to props.conf per the learned file, and changed the props stanza from sourcetype to source based on my reading of the spec file. Currently, my configs are:
inputs.conf
[script://./bin/app.sh]
disabled = false
host = Servername
index = apptest
interval = 60
source = app_users
sourcetype = app_user_data
props.conf
[source::app_users]
KV_MODE = none
SHOULD_LINEMERGE = false
REPORT-app_users = app_extractions
pulldown_type = true
transforms.conf
[app_extractions]
DELIMS = "|"
FIELDS = "Environment", "User Name", "Full Name", "Login Time", "Last Touched", "IP Address"
I've been relying on the scripted input to stream data into Splunk, and I've been using "|" as the delimiter. My next approach will be to edit my script to output a true CSV file and then have Splunk consume that file. Not as elegant, alas.
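For what it's worth, the CSV fallback mentioned above could be sketched roughly like this. It is only an illustration, not the actual app.sh: the function name pipe_to_csv is hypothetical, and the sample rows are the sanitized ones from the question. It uses Python's csv module so that any value containing a comma gets quoted rather than breaking the column layout.

```python
import csv
import io

def pipe_to_csv(pipe_text):
    """Convert pipe-delimited text (header line included) to CSV text.

    Hypothetical helper for illustration; not part of the original app.
    """
    reader = csv.reader(io.StringIO(pipe_text), delimiter="|")
    out = io.StringIO()
    # The csv writer quotes any value that itself contains a comma.
    writer = csv.writer(out, lineterminator="\n")
    writer.writerows(reader)
    return out.getvalue()

# Sample rows taken from the sanitized script output in the question.
sample = (
    "Environment|User Name|Full Name|Login Time|Last Touched|IP Address\n"
    "production|alice_t|Alice Toklas|7/18/2012 10:05:30 PM|"
    "7/19/2012 11:51:12 AM|10.20.30.40\n"
)
csv_text = pipe_to_csv(sample)
print(csv_text, end="")
```

How Splunk then picks up the resulting file (monitor input, one-shot, and so on) is a separate decision that this sketch does not cover.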
You should not be using index-time extractions (TRANSFORMS) unless you know very well what you are doing and have a very good reason for doing it.
Use search-time extractions (REPORT) whenever possible. Simply switching TRANSFORMS for REPORT in props.conf will do the trick in your scenario.
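To make the DELIMS/FIELDS idea concrete, here is a rough Python sketch of what a delimiter-based extraction does conceptually at search time: split the raw event on the delimiter and pair each value with the field name at the same position. This is not Splunk code, and the function name extract_fields is made up for illustration; the field names and the sample event come from the question.

```python
# Illustration only: mimics the idea behind DELIMS + FIELDS, not Splunk itself.
DELIM = "|"
FIELDS = [
    "environment", "user_name", "full_name",
    "login_time", "last_touched", "ip_address",
]

def extract_fields(raw_event, delim=DELIM, fields=FIELDS):
    """Pair each delimited value with the field name at the same position."""
    values = raw_event.rstrip("\n").split(delim)
    return dict(zip(fields, values))

# Sample event taken from the sanitized script output in the question.
event = (
    "production|alice_t|Alice Toklas|7/18/2012 10:05:30 PM|"
    "7/19/2012 11:51:12 AM|10.20.30.40"
)
extracted = extract_fields(event)
print(extracted["user_name"])
print(extracted["ip_address"])
```

Note that the raw event is left untouched and the fields are derived on the fly, which is the point of doing this at search time rather than at index time.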
Looks right to me as well, provided you use REPORT and not TRANSFORMS (TRANSFORMS simply won't work with DELIMS/FIELDS). I suppose you can check inside $SPLUNK_HOME/etc/learned/local/ to see what props.conf and transforms.conf entries were generated for csv-8 and compare with what you have.
Also, can you please confirm that your scripted input is indexed with the specified sourcetype, app_user_data?
PS - Ayn, I'm using your Squid and Snort apps. Thanks for writing them!
Sourcetype 'app_user_data' shows only the fields host, source, and sourcetype. By contrast, when I one-shot a CSV, I get sourcetype csv-8, with fields for the columns as defined in the header line.
re: indexer, I slightly misspoke. I only meant to indicate that the data was not coming from a forwarder. Thanks again.
One thing though, you mention that your app is "running locally on the indexer" - are you performing your searches directly on the indexer or do you have a separate Splunk instance acting as a search head? In the latter case, your field extractions should go on the search head, not the indexer.
So you're getting data with sourcetype app_user_data, but you're not seeing fields like "environment" and "user_name" when you view that data?
Your setup looks OK to me, provided that the sourcetype stanza is matching correctly.
Hmm. Thank you for the clarification, but in this case it does not fully solve the problem.
[app_user_data]
SHOULD_LINEMERGE = false
REPORT-app_users = app_extractions
(Edited props, stopped Splunk, cleaned event data, started)