Why is there a Regex Character Limitation?

jadengoho
Builder

I have a very long regex (about 12,000 characters) made up of different hostname and IP address combinations.

When I run the regex, it fails with: Regex: regular expression is too large

error.png

From my testing, the regex can hold at most 8190 characters.

In the image, I repeated the letter "a" 8190 times and the search runs, but adding one more letter triggers the error.

search.png

Can somebody explain why this is happening and how I can run my regex properly?

richgalloway
SplunkTrust

For reasons known only to those who wrote the code, Splunk can't handle a regular expression longer than 8190 characters. The workaround is to make the regex short enough to fit within that limit. Sometimes a single rex command can be split into multiple smaller rex commands.
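When the long regex is a top-level alternation like `(a|b|c)`, one way to split it is to pack the branches greedily into smaller groups that each stay under the limit. This is a minimal sketch, assuming each branch can be evaluated independently (no shared anchors or lookarounds across branches) and treating 8190 as the limit reported in this thread. `split_alternation` and `MAX_REGEX_LEN` are hypothetical names, not Splunk settings. Also note that splitting only preserves "any branch matches" if the resulting groups are treated as alternatives, i.e. an event is kept or routed when any one group matches.

```python
import re

# 8190 is the limit reported in this thread, not a documented constant.
MAX_REGEX_LEN = 8190


def split_alternation(branches, max_len=MAX_REGEX_LEN):
    """Greedily pack branches into '(b1|b2|...)' groups, each at most max_len chars."""
    groups = []
    current = []
    current_len = 2  # length of the surrounding parentheses
    for branch in branches:
        if len(branch) + 2 > max_len:
            raise ValueError(f"branch cannot fit in one group: {branch[:40]}")
        # A branch costs its own length plus a '|' if the group already has members.
        extra = len(branch) + (1 if current else 0)
        if current and current_len + extra > max_len:
            # Close the current group and start a new one with this branch.
            groups.append("(" + "|".join(current) + ")")
            current = [branch]
            current_len = 2 + len(branch)
        else:
            current.append(branch)
            current_len += extra
    if current:
        groups.append("(" + "|".join(current) + ")")
    return groups


if __name__ == "__main__":
    branches = ["cat.*dog", "rat.*count", "computer.*calculator", "computer.*device.*v2"]
    # A tiny max_len just to force a split in this demo.
    for group in split_alternation(branches, max_len=40):
        re.compile(group)  # each group is still a valid regex on its own
        print(len(group), group)
```

With `max_len=40` the demo prints three groups: `(cat.*dog|rat.*count)` (21 characters), `(computer.*calculator)` and `(computer.*device.*v2)` (22 each). With the default limit, the same four branches fit in a single group.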

jadengoho
Builder

Hi @richgalloway 

We managed to shorten the regex from 14,000 to 11,000 characters, which is still over the limit.

Is there any limits configuration we can tweak to override this regex limitation?

isoutamo
SplunkTrust
Usually that kind of tweak can be done with parameters in limits.conf, but I couldn't find a suitable one for this.
@cpetterborg, do you have any idea about this?

Out of curiosity, how do you manage that regex? Even much shorter ones are already hard to update.

jadengoho
Builder

We have 20,000+ combinations of words/phrases; when a combination is present in a log event, the event should be routed to the proper index.

Example:

An event containing CAT that also has DOG is routed to the sample1 index.
An event containing RAT that also has COUNT is routed to the sample1 index.

In transforms.conf:
REGEX = (cat.*dog|rat.*count|computer.*calculator|computer.*device.*v2)

https://goolge/sites/cat/page/dog
https://goolge/sites/rat/page/count
https://goolge/sites/computer/page/calculator
https://goolge/sites/computer/page/device/machine/v2
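To sanity-check the REGEX above against these sample URLs, here is a minimal sketch that uses Python's `re` module as a stand-in for Splunk's PCRE-based engine (an assumption; these simple patterns should behave the same in both). It also checks that splitting the alternation into two smaller regexes, one per stanza, routes exactly the same URLs. The last URL is a hypothetical non-matching event added for contrast.

```python
import re

# The single combined REGEX from transforms.conf above.
combined = re.compile(r"(cat.*dog|rat.*count|computer.*calculator|computer.*device.*v2)")

# The same branches split into two smaller regexes, one per transforms stanza.
split = [
    re.compile(r"(cat.*dog|rat.*count)"),
    re.compile(r"(computer.*calculator|computer.*device.*v2)"),
]

urls = [
    "https://goolge/sites/cat/page/dog",
    "https://goolge/sites/rat/page/count",
    "https://goolge/sites/computer/page/calculator",
    "https://goolge/sites/computer/page/device/machine/v2",
    "https://goolge/sites/dog/page/cat",  # hypothetical: "cat.*dog" needs cat BEFORE dog
]


def routed_by_combined(url):
    # re.search matches anywhere in the text, like an unanchored REGEX.
    return combined.search(url) is not None


def routed_by_split(url):
    # The event is routed if any of the smaller regexes matches.
    return any(p.search(url) for p in split)


for url in urls:
    print(routed_by_combined(url), routed_by_split(url), url)
```

The first four URLs are routed by both versions and the last one by neither, so the split pair behaves like the single combined regex for these inputs.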

I've tried every way I can think of to compress the regex, but this is the best I can do.

mtulett_splunk
Splunk Employee

In case this was never resolved, or for others who are interested, the solution here is to split the regex across multiple transforms stanzas so that each stanza's REGEX stays under 8190 characters, like so:

props.conf:

[my_sourcetype]
TRANSFORMS-index_routing = ruleset1, ruleset2

transforms.conf:

[ruleset1]
REGEX = (cat.*dog|rat.*count)
FORMAT = sample1
DEST_KEY = _MetaData:Index

[ruleset2]
REGEX = (computer.*calculator|computer.*device.*v2)
FORMAT = sample1
DEST_KEY = _MetaData:Index
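With thousands of branches, writing these stanzas by hand gets error-prone. Below is a minimal sketch that renders both files from a list of regex groups that have already been split. The sourcetype `my_sourcetype`, the stanza prefix `ruleset`, and the index `sample1` mirror the example above; `render_routing_conf` is a hypothetical helper, not a Splunk tool.

```python
def render_routing_conf(sourcetype, groups, index, prefix="ruleset"):
    """Return (props_conf, transforms_conf) text routing any matching event to index."""
    names = [f"{prefix}{i}" for i in range(1, len(groups) + 1)]

    # One TRANSFORMS- class in props.conf that lists every ruleset stanza.
    props = f"[{sourcetype}]\nTRANSFORMS-index_routing = {', '.join(names)}\n"

    # One transforms.conf stanza per regex group, all writing the same index.
    stanzas = []
    for name, regex in zip(names, groups):
        stanzas.append(
            f"[{name}]\n"
            f"REGEX = {regex}\n"
            f"FORMAT = {index}\n"
            f"DEST_KEY = _MetaData:Index\n"
        )
    transforms = "\n".join(stanzas)
    return props, transforms


if __name__ == "__main__":
    groups = ["(cat.*dog|rat.*count)", "(computer.*calculator|computer.*device.*v2)"]
    props, transforms = render_routing_conf("my_sourcetype", groups, "sample1")
    print("props.conf:\n" + props)
    print("transforms.conf:\n" + transforms)
```

The helper does not check lengths, so each entry in `groups` must already be under the limit. Since every stanza writes the same index here, the order of the rulesets does not affect the result.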

I would also argue that a different approach is warranted in this specific case: a regex this large will cause high CPU overhead during ingestion, especially if the source is high-volume.
