How to Drop Events Larger Than 10,000 Bytes Before Indexing?

yashb
Engager

Hi everyone,

I'm working on a use case where I need to drop events that are larger than 10,000 bytes before they get indexed in Splunk.

I know about the TRUNCATE setting in props.conf, which limits how much of an event is indexed, but it doesn't actually prevent or drop the event — it just truncates it. My goal is to completely drop large events to avoid ingesting them at all.

So far, I haven’t found a built-in way to drop events purely based on size using transforms.conf or regex routing. I'm wondering:

  • Is there any supported way to do this natively in Splunk?

  • Can this be done using a Heavy Forwarder or a scripted/modular input?

  • Has anyone solved this with a custom ingestion pipeline or pre-filter logic?

Any guidance or examples would be greatly appreciated!

PickleRick
SplunkTrust

Both @livehybrid's and @richgalloway's solutions are OK, but the real question is what problem you are actually trying to solve. It's unlikely that events 8k or 9k characters long are perfectly "ok" while an event that hits the 10k limit is suddenly "worthless" and should be dropped. A hard size threshold doesn't seem like a reasonable way to differentiate between types of data. I'd be hard pressed to find a scenario where this makes more sense than checking the data syntactically.

BTW, Splunk operates on characters, not bytes. So while TRUNCATE does cut at "about" the given size in bytes, the len() function returns the number of code points, not bytes (and not even characters! The two can differ in scripts that use composite characters).
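
To make the bytes vs. code points distinction concrete, here is a small Python sketch. It is plain Python, not Splunk: it only assumes that a code-point count behaves like Python's len() on a string and that the data is UTF-8 encoded. The sizes() helper is a name made up for this example.

```python
import unicodedata


def sizes(event):
    """Return (code points, UTF-8 bytes) for an event string."""
    return len(event), len(event.encode("utf-8"))


ascii_event = "a" * 10
# Precomposed U+00E9: one code point, two bytes in UTF-8.
precomposed = "\u00e9" * 10
# NFD splits each character into 'e' + combining accent U+0301:
# two code points and three bytes per visible character.
decomposed = unicodedata.normalize("NFD", precomposed)

for label, event in [
    ("ascii", ascii_event),
    ("precomposed", precomposed),
    ("decomposed", decomposed),
]:
    code_points, utf8_bytes = sizes(event)
    print(f"{label}: {code_points} code points, {utf8_bytes} bytes")
```

All three strings show ten visible characters, yet the code point and byte counts diverge as soon as the text is not plain ASCII, so a "10,000 byte" limit and a "10,000 character" limit are not the same rule.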

livehybrid
SplunkTrust

Hi @yashb 

I've used INGEST_EVAL to achieve this for a customer previously, although as @richgalloway says, you may be able to achieve this with Ingest Actions too.

Here is the sample props/transforms for INGEST_EVAL

== props.conf ==
[yourSourcetype]
TRANSFORMS-dropBigEvents = dropBigEvents

== transforms.conf ==
[dropBigEvents]
INGEST_EVAL = queue=if(len(_raw)>10000,"nullQueue",queue)

You could also achieve this with a regex match. I think that would be resource intensive, so personally I would use the INGEST_EVAL route, but I'm including the regex version for completeness.

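[dropBigEvents]
# (?s) lets "." match newlines so multi-line events are counted in full
REGEX = (?s)^.{10001,}
# LOOKAHEAD defaults to 4096 characters; raise it so the regex can see past the limit
LOOKAHEAD = 10001
DEST_KEY = queue
FORMAT = nullQueue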
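
One thing to watch with a length regex: in PCRE-style engines "." does not match a newline unless the (?s) flag is set, so a multi-line event can be longer than the limit and still not match. The sketch below shows the effect using Python's re module, which is an assumption standing in for Splunk's regex engine. LIMIT and matches() are names made up for the example, and the limit is shrunk from 10000 to 20 to keep it readable.

```python
import re

LIMIT = 20  # small stand-in for 10000

single_line = "x" * 25
# 25 characters in total, but no single line reaches LIMIT.
multi_line = "x" * 10 + "\n" + "x" * 14

plain = re.compile(r"^.{%d,}" % LIMIT)
dotall = re.compile(r"(?s)^.{%d,}" % LIMIT)


def matches(pattern, event):
    """True if the pattern flags the event as too long."""
    return pattern.search(event) is not None


print("plain, single line:", matches(plain, single_line))
print("plain, multi line: ", matches(plain, multi_line))
print("(?s),  multi line: ", matches(dotall, multi_line))
```

The plain pattern catches the long single-line event but misses the multi-line one, while the (?s) variant catches both.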

richgalloway
SplunkTrust

You can do that with Ingest Actions in either an intermediate HF or the indexers.

Go to Settings->Ingest Actions and click the New Ruleset button.  Select the sourcetype to filter and then choose "Filter using Eval Expression" from the Add Rule dropdown.  Enter "len(_raw) > 10000" as the Eval Expression and click Apply to see the effect.  When you're happy with the set-up, click Save.
