Getting Data In

Advice for when you have more than 100 automatically extracted fields?

SplunkTrust

It seems that if you have a lot of fields being extracted automatically, for example via INDEXED_EXTRACTIONS=csv or via automatic kv extraction, then beyond any fields that are explicitly mentioned in your search, Splunk 6.0 will only allow itself to automatically extract about 100 more fields.

This really prevents certain commands like fieldsummary or transpose from working properly.

Basically, if you have more than 100 fields and you don't know their names, and you need to get that list of field names or values in Splunk, you won't be able to using the Splunk search language. Whatever search language you use - fieldsummary, or stats first(*) as * | transpose - Splunk will ignore some of your extraction rules each time, and you'll always end up with an incomplete list of fields. Which ones it ignores seems somewhat random, so certain fields will appear and disappear from your results over time.
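
To make the symptom concrete, these are the two typical field-discovery searches I mean (index and sourcetype names here are placeholders); both come back with an incomplete field list once the auto-extraction cap kicks in:

```
index=yourindex sourcetype=yourcsv | fieldsummary

index=yourindex sourcetype=yourcsv | stats first(*) as * | transpose
```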

Does anyone know if anything can be set in limits.conf or in the search language to override this behavior?

I believe it is related to these other posts
http://answers.splunk.com/answers/82252/how-many-field-extract-in-splunk
http://answers.splunk.com/answers/117884/fields-not-automatically-extracting

Also, I've already tried setting maxcols in the [kv] stanza in limits.conf, and it has no effect here - probably because that key only affects fields generated by kv and autokv extraction, while these fields are generated via INDEXED_EXTRACTIONS=csv.

There is one very limited workaround I have found: if you explicitly mention all the fields in your search somehow, such as with an enormous fields clause, Splunk is forced to extract them, no matter how many there are. However, that doesn't help in situations where you really don't know in advance, or with certainty, what the fields will be.
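
A sketch of that workaround, with hypothetical field names standing in for the real ones - naming every field explicitly before the reporting command forces Splunk to extract all of them:

```
index=yourindex sourcetype=yourcsv
| fields field001, field002, field003, field004
| fieldsummary
```

The catch, as above, is that you have to already know the full field list to write the fields clause at all.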

In my particular case, there is a csv sourcetype where we don't know in advance what the fields are going to be, but we need the field list, and the search language itself is the only way we have to get it. Depending on various factors, there are hundreds of fields that might be present in the sourcetype, and in any one customer environment there will be about 100-140 fields present.

1 Solution

Engager

I am using Splunk 6.1.4. I had the same issue, tried modifying maxcols, and it did not work for me.

So I took a peek at $SPLUNK_HOME/etc/system/default/limits.conf and observed a couple more parameters to change.

I modified $SPLUNK_HOME/etc/system/local/limits.conf to contain the following:

[kv]
# when non-zero, the point at which kv should stop creating new columns
maxcols  = 1800
# maximum number of keys auto kv can generate
limit    = 500
maxchars = 102400  # truncate _raw to this size and then do auto KV

That seems to have worked for me.


Engager

I came across this issue as well.
On 6.2, our default/limits.conf had the limit set to 50; we increased this to 150 in our local/limits.conf, but after a Splunk restart that didn't seem to fix it.

As a workaround, I used

| fields - * | extract pairdelim=",", kvdelim="=", auto=f, limit=200, mv_add=t

to drop the automatically extracted fields and re-extract them all. This also has the advantage that multivalued fields are extracted properly, rather than only the first occurrence found.

Contributor

One always learns - this is an amazing command. Modifying limits.conf under [kv] with limit = 300 helped on my all-in-one lab box, but I cannot make it work on my SH-IDX deployment. For now this helped me a lot! Thx.



Motivator

+1 for you.

Despite being old, this answer is still quickly found through Google, and it addressed my problem. The alternative option I used, based on this answer, is to explicitly table out all expected fields instead.

So, with a limit of 100, rare_field will not show up with:

index=yourindex

This is true even if you change "Interesting Fields" to "All Fields" (instead of fields with >1% coverage). However, this search will return it where applicable:

index=yourindex
| table rare_field

If you can identify all of your important fields, this approach also works without changing limits.conf.

Cheers,
Jacob


SplunkTrust

Bugger....


SplunkTrust

Good point. I forgot to mention that, but no, they have no effect on this problem.


SplunkTrust

For completeness' sake, does fields * or fields + * do anything in this case?