Getting Data In

Splunk insisting on Auto-finding CSV fields?

Path Finder

Hello,

Splunk is insisting on trying to auto-find headers in a tab-delimited CSV file for which I have manually defined headers in a CONF file. I thought that putting this information in /etc/system/local would override the settings in /etc/apps/learned/, but it doesn't look like that's the case (see the btool sketch at the end of this post)...

Here are my CONF files for system-local:

inputs.conf

[monitor:///logs/strauss_splunk/bulksession]
sourcetype=csv
source=strauss_sessions
index=strauss_sessions
host=WHITNEY

[monitor:///logs/strauss_splunk/bulkurl]
sourcetype=csv
source=strauss_url
index=strauss_url
host=WHITNEY

[monitor:///inetpub/strauss_splunk/bulkhit]
sourcetype=csv
source=strauss_hits
index=strauss_hits
host=WHITNEY

props.conf

[source::strauss_url]
SHOULD_LINEMERGE=false
CHECK_FOR_HEADER=false
TRANSFORMS-STRAUSSTSV=STRAUSSTSV-1

[source::strauss_sessions]
SHOULD_LINEMERGE=false
CHECK_FOR_HEADER=false
TRANSFORMS-STRAUSSTSV=STRAUSSTSV-2

[source::strauss_hits]
SHOULD_LINEMERGE=false
CHECK_FOR_HEADER=false
TRANSFORMS-STRAUSSTSV=STRAUSSTSV-3

transforms.conf

[STRAUSSTSV-3]
DELIMS = "  "
FIELDS = "SESSION_KEY", "HIT_KEY", "ID", "SECURE"

[STRAUSSTSV-2]
DELIMS = "  "
FIELDS = "SESSION_KEY", "ADDRESS", "CANISTER"

[STRAUSSTSV-1]
DELIMS = "  "
FIELDS = "SESSION_KEY", "HIT_KEY", "NAME", "VALUE", "TIMESTAMP"

* * * * * * * * * * * * * * * * * * * * * * * *

...and this all looks good, right? But this is what the /etc/apps/learned/ CONF files end up containing afterwards:

* * * * * * * * * * * * * * * * * * * * * * * *

props.conf

[csv-2]
KV_MODE = none
REPORT-AutoHeader = AutoHeader-1
SHOULD_LINEMERGE = False
given_type = csv
pulldown_type = true

[csv-3]
KV_MODE = none
REPORT-AutoHeader = AutoHeader-2
SHOULD_LINEMERGE = False
given_type = csv
pulldown_type = true

[csv-4]
KV_MODE = none
REPORT-AutoHeader = AutoHeader-3
SHOULD_LINEMERGE = False
given_type = csv
pulldown_type = true

[csv-5]
KV_MODE = none
REPORT-AutoHeader = AutoHeader-4
SHOULD_LINEMERGE = False
given_type = csv
pulldown_type = true

[csv-6]
KV_MODE = none
REPORT-AutoHeader = AutoHeader-5
SHOULD_LINEMERGE = False
given_type = csv
pulldown_type = true

transforms.conf

[AutoHeader-1]
DELIMS = "  "
FIELDS = "58b0c3f3c517dd9ee90cf256800dae98", "27eec7a8d949b8afe03a11b47604633b", "B99004EA29E4FA783416FAC3F7AB87A5", "N"

[AutoHeader-2]
DELIMS = "  "
FIELDS = "0f4c4f0c76bb2898ccbcfa816cfbe49b", "cbb2438acf31acf8acefacb3ff2b59a9", "EA9AE62469C9E2DCE926B32A545675EA", "N"

[AutoHeader-3]
DELIMS = "  "
FIELDS = "8ed97b717ce4b20e561a9c5a033f925c", "44bdc568a0b0052fd2232239257ebc6c", "897FFA5C1C1F078517B1FF8DB392AC54", "Y"

[AutoHeader-4]
DELIMS = "  "
FIELDS = "58b0c3f3c517dd9ee90cf256800dae98", "63.194.158.158", "LSSN_20110419_WHITNEY.dat"

[AutoHeader-5]
DELIMS = "  "
FIELDS = "becf1cd6433bd8ddf2a3f4e9da3fe133", "20c184e3ce2396fbda9d5071c8b3344d", "login_username", "XXXXXXXXXXXXXXXX", "2011-04-19 07:00:20.000"

Any ideas??

Thank you so much
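
A quick way to check which props stanza actually wins for these sources is btool; a minimal sketch, assuming a 4.x-style CLI run from $SPLUNK_HOME/bin:

splunk btool props list --debug
# --debug prefixes each setting with the file it came from, so you can see
# whether system/local or apps/learned is supplying a given value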

1 Solution

Path Finder

Solved: I capitulated. Don't fight the beast. I ended up saying screw it, Splunk, you can auto-extract field names for me. But I wrote a regex rule that routes the header line to nullQueue to remove the first line.

See here:

http://www.splunk.com/support/forum:SplunkAdministration/4081
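
For reference, a minimal sketch of that approach, assuming the header row begins with the literal field name SESSION_KEY (the transform name strip_csv_header is made up for illustration):

props.conf

[source::strauss_url]
TRANSFORMS-null = strip_csv_header

transforms.conf

[strip_csv_header]
# drop any event that matches the header row
REGEX = ^SESSION_KEY\t
DEST_KEY = queue
FORMAT = nullQueue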

Contributor

You might want to add this to your props.conf

http://www.splunk.com/base/Documentation/latest/admin/Propsconf

LEARN_SOURCETYPE = [true|false]

  • Determines whether learning of known or unknown sourcetypes is enabled.
      • For known sourcetypes, refer to LEARN_MODEL.
      • For unknown sourcetypes, refer to the rule:: and delayedrule:: configuration (see below).
  • Setting this field to false disables CHECK_FOR_HEADER as well (see above).
  • Defaults to true.
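
For example, applied to one of the source stanzas from the question (a sketch; the same line in the [default] stanza would apply it globally):

[source::strauss_url]
LEARN_SOURCETYPE = false
# per the spec above, this also disables CHECK_FOR_HEADER for this source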

Splunk Employee

You would need to clean out the etc/apps/learned/props.conf file, and reindex the data.
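
A rough sketch of that cleanup, assuming a single test instance where wiping the indexes is acceptable (index names taken from the inputs.conf above):

splunk stop
# remove the auto-learned stanzas (path as given above; some versions keep them under etc/apps/learned/local/)
rm $SPLUNK_HOME/etc/apps/learned/props.conf $SPLUNK_HOME/etc/apps/learned/transforms.conf
# wipe the affected indexes so the files can be indexed again
splunk clean eventdata -index strauss_url -f
splunk clean eventdata -index strauss_sessions -f
splunk clean eventdata -index strauss_hits -f
# already-monitored files may also need their file-tracking records reset
# (e.g. splunk clean eventdata -index _thefishbucket) before they are re-read
splunk start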

Path Finder

This is a test box; I have about 10+ GB/day of this stuff to index (short half-life), and I'm cleaning the index every time (> splunk clean eventdata). I did get it to work with auto field extraction by inserting my own header line, but the issue there is that the header line is included in the event count, and if I have 27,804 events I don't want it to say 27,805.

Contributor

I think changes at this point would only apply to new events and not to events already indexed.

Path Finder

Thanks for the suggestions, but neither of those actually fixed it. The LEARN_SOURCETYPE setting did stop Splunk from trying to auto-define fields, but it didn't let my CONF files take over...

Contributor

A light bulb went off when I re-read your question: you will need to use DELIMS = "\t" for tab, not " ".
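
Applied to the first stanza from the question, that would look like this (the same change goes into STRAUSSTSV-2 and STRAUSSTSV-3):

[STRAUSSTSV-1]
DELIMS = "\t"
FIELDS = "SESSION_KEY", "HIT_KEY", "NAME", "VALUE", "TIMESTAMP"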

Path Finder

Thanks, I haven't got it to work yet, but I'll keep investigating. I think I might have to change some other things around relating to the sourcetypes.
