Splunk Search

Why are not all field values extracted for long JSON files?

wu_weidong
Path Finder

Hi,

I am trying to ingest long JSON files into my Splunk index, where a record can contain more than 10,000 characters. To prevent long records from getting truncated, I added "TRUNCATE = 0" to my props.conf, and the entire record is now ingested into the index. All events are forwarded and stored in the index, but I'm having problems with fields that appear towards the end of the JSON records.
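Concretely, the only change was this stanza in props.conf (the sourcetype name here is a placeholder):

[my_json_sourcetype]
TRUNCATE = 0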

I'm currently testing with 2 files:

  • File A has 382 records, of which 166 are long records. 
  • File B has 252 records, all of which are long records. 

All 634 events are returned with a simple search of the index, and I can see all fields in each event, regardless of how long the event is.

However, not all fields are extracted and directly searchable. For example, one of the fields is called "name", and it appears towards the end of each JSON record. In the "Interesting fields" pane, "name" shows a count of only 216 events (the short records from File A), and none of the remaining 166 + 252 long events from Files A and B. The same happens for other fields that appear towards the end of each JSON record, while fields towards the beginning of the record show all 634 events.

If I negate the 216 events, then these fields do not appear on the Fields pane at all.
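For reference, the same counts can be checked with fieldsummary (the index name here is a placeholder):

index=my_json | fieldsummary | search field=name

which reports 216 events for "name" rather than all 634.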

Also, while I'm not able to directly search for "name=<name in File B>", I can still select the field from the event and "add to search", and all 252 events would be returned.

I'm not sure why these fields are not properly extracted even though they did not appear to be truncated. How can I extract them properly?

Thank you.


cssmdi
Explorer

Hi all
We have a similar problem. We read k8s logs coming from fluentd via HEC into Splunk. There is a message field in the JSON which can be a very long string. Using rex it is possible to extract the field from the JSON, but without that, the message and all following fields in _raw stay undefined (isnull(...) is true).

I tested several settings, including /opt/splunk/etc/system/local/limits.conf with the following content:

[realtime]
indexed_realtime_use_by_default = true

[spath]
extract_all = true
#number of characters to read from an XML or JSON event when auto extracting
extraction_cutoff = 50000

[kv]
maxchars = 1500000
limit = 0
indexed_kv_limit = 0
maxcols = 100000

[rex]
match_limit = 500000

Any idea how to solve this?

Thanks

Matthias

wu_weidong
Path Finder

Thanks for the suggestions! While the 3 posts didn't specifically solve my problem, they did lead me to look at the settings in limits.conf (post here), and I was able to extract all fields from my long JSON records by changing some of the settings.

I modified $SPLUNK_HOME/etc/system/local/limits.conf to

[kv]
limit = 0
indexed_kv_limit = 0
maxcols = 100000
maxchars = 1500000

Now I'm wondering if there's any issue with setting such high values. 
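For anyone checking their own setup, btool shows which limits.conf values are actually in effect and which file they come from:

$SPLUNK_HOME/bin/splunk btool limits list kv --debug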

hettervik
Builder

We also had some inconsistencies with these field extractions. We figured out that we needed to push the new limits configuration to the indexers as well as the search head. Pushing it only to the search head works if you have a centralizing command before the spath field extraction, but not for streaming field extractions.
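If the indexers are clustered, one way to push it is a small app from the cluster manager (the app name here is made up; on older versions the directory is master-apps instead of manager-apps). Put the stanza in

$SPLUNK_HOME/etc/manager-apps/push_kv_limits/local/limits.conf

[kv]
limit = 0
indexed_kv_limit = 0
maxcols = 100000
maxchars = 1500000

and then run "$SPLUNK_HOME/bin/splunk apply cluster-bundle" on the manager so the peers pick it up.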


verbal_666
Builder

Hi.
limits.conf on the Indexers or simply on the SearchHead(s)? Or better on both?

EDIT: better on the Indexers' side, since "limit" applies at search time from the SH to the Indexer peers, and 100 is the default limit 👍

I also had this "problem" with a ~150-field JSON, and a simple

[kv]
limit = 0
indexed_kv_limit = 0
maxcols = 512
maxchars = 102400

solved it on the Indexer(s) side 👍
Thanks for the trick 👍


isoutamo
SplunkTrust
Good to hear that it works now!

Since you increased those values, it means more resource usage (e.g. memory), so you should watch for any weird behavior. Basically it shouldn't do anything special, as those limits are still quite reasonable.

Roy_9
Motivator

I have a similar kind of issue where we are ingesting logs from MuleSoft Cloud to Splunk Cloud via HEC. There are a few JSON payloads which are quite heavy, close to 2 million bytes. We have set the truncate limit to 450,000 bytes instead of 0, since Splunk said it is not recommended to keep it at 0.

Since these heavy payloads are nested JSON, we are seeing line-breaking issues as well as truncation of events. Is this something that can be fixed by changing any settings?

Any help on this would be highly appreciated. @wu_weidong @isoutamo 


wu_weidong
Path Finder

@Roy_9 My JSON records are all flattened to a single line, e.g. {"name": "John.Smith", "phone": "1234567"}, and I have a "LINE_BREAKER = ([\r\n]+)" in my props.conf under the stanza for my sourcetype. Not sure if that helps.
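Roughly, the full stanza looks like this (sourcetype name is a placeholder; the lines marked as assumptions are typical companions for single-line JSON, not settings quoted in this thread):

[my_json_sourcetype]
LINE_BREAKER = ([\r\n]+)
TRUNCATE = 0
# assumptions, not quoted above
SHOULD_LINEMERGE = false
KV_MODE = json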
