Splunk Search

Why are not all field values extracted for long JSON files?

wu_weidong
Path Finder

Hi,

I am trying to ingest long JSON files into my Splunk index, where a single record can contain more than 10,000 characters. To prevent long records from getting truncated, I added "TRUNCATE = 0" to my props.conf, and now the entire record is ingested into the index. All events are forwarded and stored in the index, but I'm having problems with fields that appear towards the end of the JSON records.

I'm currently testing with 2 files:

  • File A has 382 records, of which 166 are long records. 
  • File B has 252 records, of which all are long records. 

All 634 events are returned with a simple search of the index, and I can see all fields in each event, regardless of how long the event is.

However, not all fields are extracted and directly searchable. For example, one of the fields is called "name", and it appears towards the end of each JSON record. On the "Interesting fields" pane, under "name", it shows only a count of 216 events from File A, and none of the remaining 166 + 252 long events in Files A and B. This is the same for other fields that appear towards the end of each JSON record, but fields towards the beginning of the record show all 634 events.

If I negate the 216 events, then these fields do not appear on the Fields pane at all.

Also, while I'm not able to directly search for "name=<name in File B>", I can still select the field from the event and "add to search", and all 252 events would be returned.

I'm not sure why these fields are not properly extracted even though they did not appear to be truncated. How can I extract them properly?
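One way to confirm that this is a search-time extraction limit rather than truncation is to compare automatic extraction against an explicit spath on the same field. This is a diagnostic sketch of my own; the index name is a placeholder, and note that spath has its own size limits in limits.conf:

index=myindex
| spath path=name output=name_spath
| stats count(name) AS autokv_count, count(name_spath) AS spath_count

If spath_count is higher than autokv_count, the events themselves are intact and automatic key/value extraction is stopping partway through the record.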

Thank you.

1 Solution

wu_weidong
Path Finder

Thanks for the suggestions! While the three posts didn't specifically solve my problem, they did lead me to look at the settings in limits.conf (post here), and I was able to extract all fields from my long JSON records by changing some of the settings.

I modified $SPLUNK_HOME/etc/system/local/limits.conf to

[kv]
limit = 0
indexed_kv_limit = 0
maxcols = 100000
maxchars = 1500000

Now I'm wondering if there's any issue with setting such high values. 
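For anyone wondering what these settings control: my understanding, worth verifying against the limits.conf spec for your Splunk version, is roughly as follows (the defaults in the comments are from memory):

[kv]
# maximum number of key/value pairs automatic extraction creates per event; 0 = unlimited (default 100)
limit = 0
# cap on indexed key/value extractions; 0 = unlimited
indexed_kv_limit = 0
# maximum number of fields/columns extracted; 0 = unlimited (default 512)
maxcols = 100000
# number of characters of _raw that automatic extraction scans (default 10240)
maxchars = 1500000

The maxchars default of roughly 10,240 characters would explain the original symptom: fields located past that point in a long record are never reached by automatic extraction, even though the full event is stored and displayed.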


cssmdi
Explorer

Hi all,
We have a similar problem. We read k8s logs coming from fluentd via HEC into Splunk. There is a message field in the JSON which can be a very long string. Using rex, it is possible to extract the field from the JSON, but without rex, this message field and all following fields in _raw stay undefined (isnull(...) is true).

I tested several settings, including /opt/splunk/etc/system/local/limits.conf with the following content:

[realtime]
indexed_realtime_use_by_default = true

[spath]
extract_all = true
#number of characters to read from an XML or JSON event when auto extracting
extraction_cutoff = 50000

[kv]
maxchars = 1500000
limit = 0
indexed_kv_limit = 0
maxcols = 100000

[rex]
match_limit = 500000

Any idea how to solve this?
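As an interim workaround while tuning these limits, the rex extraction you mention can be written along these lines. The pattern is a naive sketch for a field literally named message, the index and sourcetype names are placeholders, and it does not handle escaped quotes or nested objects:

index=k8s sourcetype=fluentd
| rex field=_raw "\"message\":\"(?<message>[^\"]*)\""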

Thanks

Matthias



isoutamo
SplunkTrust
Good to hear that it works now!

Since you increased those values, it means more resource usage (such as memory), so you should watch for any weird behavior. Basically, it shouldn't do anything special, as those limits are still quite reasonable.

PA1
Builder

I have a similar kind of issue, where we are ingesting logs from MuleSoft Cloud into Splunk Cloud via HEC. There are a few JSON payloads which are very heavy, close to 2 million bytes. We have set the truncate limit to 450,000 bytes instead of 0, since Splunk said keeping it at 0 is not recommended.

Since these heavy payloads are nested JSON, we are seeing line-breaking issues as well, along with truncation of events. Is this something that can be fixed by changing any settings?

Any help on this would be highly appreciated. @wu_weidong @isoutamo 


wu_weidong
Path Finder

@PA1 My JSON records are all flattened to a single line, e.g. {"name": "John.Smith", "phone": "1234567"}, and I have a "LINE_BREAKER = ([\r\n]+)" in my props.conf under the stanza for my sourcetype. Not sure if that helps.
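Put together, the props.conf stanza described above would look roughly like this; the sourcetype name is hypothetical, and SHOULD_LINEMERGE = false and KV_MODE = json are my assumptions rather than something stated in the thread:

[my_json_sourcetype]
SHOULD_LINEMERGE = false
LINE_BREAKER = ([\r\n]+)
TRUNCATE = 0
KV_MODE = json

With one JSON object per line, LINE_BREAKER splits events on the newlines and TRUNCATE = 0 keeps very long records whole.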
