Splunk Search

Why are not all field values extracted for long JSON files?

wu_weidong
Path Finder

Hi,

I am trying to ingest long JSON files into my Splunk index, where a single record can contain more than 10,000 characters. To prevent long records from getting truncated, I added "TRUNCATE = 0" to my props.conf, and now the entire record is ingested into the index. All events are forwarded and stored in the index, but I'm having problems with fields that appear towards the end of the JSON records.

I'm currently testing with 2 files:

  • File A has 382 records, of which 166 are long records. 
  • File B has 252 records, of which all are long records. 

All 634 events are returned with a simple search of the index, and I can see all fields in each event, regardless of how long the event is.

However, not all fields are extracted and directly searchable. For example, one of the fields is called "name", and it appears towards the end of each JSON record. On the "Interesting fields" pane, under "name", it shows only a count of 216 events from File A, and none of the remaining 166 + 252 long events in Files A and B. This is the same for other fields that appear towards the end of each JSON record, but fields towards the beginning of the record show all 634 events.

If I negate the 216 events, then these fields do not appear on the Fields pane at all.

Also, while I'm not able to directly search for "name=<name in File B>", I can still select the field from the event and "add to search", and all 252 events would be returned.

I'm not sure why these fields are not properly extracted even though they did not appear to be truncated. How can I extract them properly?
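One way to confirm that this is a search-time extraction limit rather than truncation is to compare automatic extraction against an explicit spath on the same field. This is a diagnostic sketch of my own; the index name is a placeholder, and note that spath has its own size limits in limits.conf:

index=myindex
| spath path=name output=name_spath
| stats count(name) AS autokv_count, count(name_spath) AS spath_count

If spath_count is higher than autokv_count, the events themselves are intact and automatic key/value extraction is stopping partway through the record.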

Thank you.

1 Solution

wu_weidong
Path Finder

Thanks for the suggestions! While the three posts didn't specifically solve my problem, they did lead me to look at the settings in limits.conf (post here), and I was able to extract all fields from my long JSON records by changing some of the settings.

I modified $SPLUNK_HOME/etc/system/local/limits.conf to

[kv]
limit = 0
indexed_kv_limit = 0
maxcols = 100000
maxchars = 1500000

Now I'm wondering if there's any issue with setting such high values. 
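For anyone wondering what these settings control: my understanding, worth verifying against the limits.conf spec for your Splunk version, is roughly as follows (the defaults in the comments are from memory):

[kv]
# maximum number of key/value pairs automatic extraction creates per event; 0 = unlimited (default 100)
limit = 0
# cap on indexed key/value extractions; 0 = unlimited
indexed_kv_limit = 0
# maximum number of fields/columns extracted; 0 = unlimited (default 512)
maxcols = 100000
# number of characters of _raw that automatic extraction scans (default 10240)
maxchars = 1500000

The maxchars default of roughly 10,240 characters would explain the original symptom: fields located past that point in a long record are never reached by automatic extraction, even though the full event is stored and displayed.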


cssmdi
Explorer

Hi all,
We have a similar problem. We read k8s logs coming from fluentd via HEC into Splunk. There is a message field in the JSON which can be a very long string. Using rex, it is possible to extract the field from the JSON, but without rex, this message field and all following fields in _raw stay undefined (isnull(...) is true).

I tested several settings, including /opt/splunk/etc/system/local/limits.conf with the following content:

[realtime]
indexed_realtime_use_by_default = true

[spath]
extract_all = true
#number of characters to read from an XML or JSON event when auto extracting
extraction_cutoff = 50000

[kv]
maxchars = 1500000
limit = 0
indexed_kv_limit = 0
maxcols = 100000

[rex]
match_limit = 500000

Any idea how to solve this?
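As an interim workaround while tuning these limits, the rex extraction you mention can be written along these lines. The pattern is a naive sketch for a field literally named message, the index and sourcetype names are placeholders, and it does not handle escaped quotes or nested objects:

index=k8s sourcetype=fluentd
| rex field=_raw "\"message\":\"(?<message>[^\"]*)\""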

Thanks

Matthias



isoutamo
SplunkTrust
Good to hear that it works now!

Since you increased those values, it means more resource usage (such as memory), so you should watch for any weird behavior. Basically, it shouldn't do anything special, as those limits are still quite reasonable.

PA1
Builder

I have a similar kind of issue, where we are ingesting logs from MuleSoft Cloud into Splunk Cloud via HEC. There are a few JSON payloads which are very heavy, close to 2 million bytes. We have set the truncate limit to 450,000 bytes instead of 0, since Splunk said keeping it at 0 is not recommended.

Since these heavy payloads are nested JSON, we are seeing line-breaking issues as well, along with truncation of events. Is this something that can be fixed by changing any settings?

Any help on this would be highly appreciated. @wu_weidong @isoutamo 


wu_weidong
Path Finder

@PA1 My JSON records are all flattened to a single line, e.g. {"name": "John.Smith", "phone": "1234567"}, and I have a "LINE_BREAKER = ([\r\n]+)" in my props.conf under the stanza for my sourcetype. Not sure if that helps.
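Put together, the props.conf stanza described above would look roughly like this; the sourcetype name is hypothetical, and SHOULD_LINEMERGE = false and KV_MODE = json are my assumptions rather than something stated in the thread:

[my_json_sourcetype]
SHOULD_LINEMERGE = false
LINE_BREAKER = ([\r\n]+)
TRUNCATE = 0
KV_MODE = json

With one JSON object per line, LINE_BREAKER splits events on the newlines and TRUNCATE = 0 keeps very long records whole.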
