Splunk Search

Large JSON payloads don't always perform field extractions

tchamp
Explorer

I have some rather large JSON payloads being sent to Splunk; I've seen payloads around 1 MB in size. It took me a while to get field extraction to work most of the time. The main thing was to create a new source type that mimics _json (or json_no_timestamp) and set TRUNCATE = 0 (which might not be the best approach). Field extraction has been working quite well.
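
For reference, a minimal sketch of what such a source type might look like in props.conf. The stanza name is a placeholder, and the settings other than TRUNCATE are assumptions based on how the stock _json source type is usually defined, not a copy of the actual configuration:

```ini
# props.conf -- hypothetical stanza name, for illustration only
[json_large_payload]
# Assumed to mirror the built-in _json source type
INDEXED_EXTRACTIONS = json
KV_MODE = none
# 0 disables line truncation entirely; a large finite value
# (for example 2000000) may be a safer choice
TRUNCATE = 0
```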

I then duplicated that source type and set up a couple of regexes to transform some of the data. Field extraction stopped working with the new source type, but only for the large payloads. When I switch back to the original source type, field extraction works again. Note that the data sent in this test is not actually transformed by the regexes, since the fields they target don't exist in this particular set of test data. If I send a smaller payload, field extraction works properly, even when the regexes do transform the data.

Can anyone suggest something I could look at, or explain why including a regex in the source type that doesn't transform any data might stop field extraction from working?


livehybrid
SplunkTrust

Hi @tchamp 

There are a few limits settings that may be affecting this; they are in limits.conf.

The first is [kv]/maxchars - see  https://docs.splunk.com/Documentation/Splunk/latest/Admin/limitsconf#:~:text=time.%0A*%20Default%3A%...

This limits automatic key/value extraction at search time; the default is 10,240 characters.

The other is [rex]/depth_limit - see https://docs.splunk.com/Documentation/Splunk/latest/Admin/limitsconf#:~:text=pattern.%0A*%20Default%... - however, I am not sure whether this applies to REGEX in transforms or only to the rex SPL command; someone else may be able to confirm.

Then there is [spath]/extraction_cutoff - see https://docs.splunk.com/Documentation/Splunk/latest/Admin/limitsconf#:~:text=extracted.%0A*%20Defaul... - this setting applies extraction only to the first 5000 characters, but again it may be specific to spath.
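
As a rough sketch, raising the two search-time limits above could look like the following. The values are purely illustrative (sized for payloads of about 1 MB) and the file location is an assumption; larger limits can make searches slower, so treat this as a starting point for testing rather than a recommendation:

```ini
# limits.conf in a local directory on the search head
# (for example $SPLUNK_HOME/etc/system/local/) -- illustrative values only

[kv]
# Automatic key/value extraction is limited to this many characters (default 10240)
maxchars = 2000000

[spath]
# Extraction is applied only up to this cutoff (default 5000)
extraction_cutoff = 2000000
```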

Finally, in transforms.conf you have LOOKAHEAD, which defaults to 4096 - see https://docs.splunk.com/Documentation/Splunk/9.4.1/Admin/Transformsconf#:~:text=etc%20as%20above.-,L... - which is the maximum number of characters from the start of your SOURCE_KEY that REGEX will search. You may need to increase it if your expression does not match anything within the first 4096 characters. I'd probably start by checking this one!
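
A minimal sketch of an index-time transform with a larger LOOKAHEAD. I don't know what your actual regexes look like, so the stanza names, the "secret" field, the regex and the LOOKAHEAD value here are all hypothetical:

```ini
# transforms.conf -- hypothetical masking transform
[mask_example_field]
SOURCE_KEY = _raw
# Capture everything before and after the value of a made-up "secret" field
REGEX = (?s)^(.*"secret"\s*:\s*")[^"]*(".*)$
FORMAT = $1####$2
DEST_KEY = _raw
# Let REGEX search well beyond the default 4096 characters
LOOKAHEAD = 2000000

# props.conf -- attach the transform to the (hypothetical) source type
[json_large_payload_transformed]
TRANSFORMS-mask = mask_example_field
```

If the large payloads start extracting again with a higher LOOKAHEAD, that would point to this limit; if not, the limits.conf settings are the next thing to check.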

Please let me know how you get on and consider adding karma to this or any other answer if it has helped.
Regards

Will

 


yuanliu
SplunkTrust

Have you looked at extraction_cutoff in limits.conf?
