Splunk Search

Why is there a regex character limit?

jadengoho
Builder

I have a very long regex (about 12,000 characters). It consists of different hostname and IP address combinations.

Now when I run the regex, it shows: Regex: regular expression is too large.

 

error.png

From my testing, the regex can only accommodate 8190 characters.

In the image you can see I use the letter "a" 8190 times, but if I add one more letter it shows the error.

search.png

Can somebody explain why this is happening and how I can run my regex properly?

 
 
 

 

 

 


richgalloway
SplunkTrust

For reasons known only to those who wrote the code, Splunk can't handle a regular expression longer than 8190 characters.  The workaround is to make the regex short enough to fit into 8190 characters.  Sometimes a single rex command can be split into multiple smaller rex commands.
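As a rough illustration of that splitting, assuming the regex is a flat alternation of independent patterns (such as a list of hostnames and IP addresses), the alternatives can be packed greedily into several regexes that each stay within the limit. The names `split_alternation` and `MAX_REGEX_LEN` are hypothetical, and 8190 is simply the limit observed in this thread.

```python
MAX_REGEX_LEN = 8190  # limit observed in this thread, not a documented setting


def split_alternation(alternatives, max_len=MAX_REGEX_LEN):
    """Greedily pack alternatives into regexes of the form (a|b|c),
    each at most max_len characters long."""
    regexes = []
    current = []
    current_len = 2  # the enclosing parentheses
    for alt in alternatives:
        # A single alternative must fit on its own: "(" + alt + ")"
        if len(alt) + 2 > max_len:
            raise ValueError(f"alternative too long to fit: {alt[:30]}...")
        # Adding to a non-empty group costs one extra "|" separator.
        extra = len(alt) + (1 if current else 0)
        if current_len + extra > max_len:
            # Close the current group and start a new one with this alternative.
            regexes.append("(" + "|".join(current) + ")")
            current, current_len = [], 2
            extra = len(alt)
        current.append(alt)
        current_len += extra
    if current:
        regexes.append("(" + "|".join(current) + ")")
    return regexes


# Tiny limit used here only to make the split visible.
parts = split_alternation(
    ["cat.*dog", "rat.*count", "computer.*calculator"], max_len=25
)
print(parts)  # ['(cat.*dog|rat.*count)', '(computer.*calculator)']
```

How the resulting pieces are recombined depends on where the regex is used, and this only works when the alternatives are independent of each other at the top level.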

---
If this reply helps you, Karma would be appreciated.

jadengoho
Builder

Hi @richgalloway 

We tried to shorten the regex from 14,000 to 11,000 characters.

Is there any limits configuration we can tweak to override this regex limitation?


isoutamo
SplunkTrust
Usually that kind of tweak can be done with parameters in limits.conf, but I could not find any suitable one for this.
@cpetterborg, do you have any ideas on this?

Out of curiosity, how do you manage that regex? Even much shorter ones are usually hard to maintain.


jadengoho
Builder

 

We have 20,000+ combinations of words/phrases that must be present in the logs for them to be routed to the proper index.

Example:

 

CAT should have DOG - routed to sample1 index
RAT should have COUNT - routed to sample1 index

In transforms.conf:
REGEX = (cat.*dog|rat.*count|computer.*calculator|computer.*device.*v2)

https://goolge/sites/cat/page/dog
https://goolge/sites/rat/page/count
https://goolge/sites/computer/page/calculator
https://goolge/sites/computer/page/device/machine/v2

 

I've tried every way I can think of to compress the regex, but that is the best I can do.

 


mtulett_splunk
Splunk Employee

In case this was never resolved, or for others who are interested, the solution here is to use multiple transforms stanzas so that each regex stays under 8190 characters, like so:

props.conf:

[my_sourcetype]
TRANSFORMS-index_routing = ruleset1, ruleset2

transforms.conf:

[ruleset1]
REGEX = (cat.*dog|rat.*count)
FORMAT = sample1
DEST_KEY = _MetaData:Index

[ruleset2]
REGEX = (computer.*calculator|computer.*device.*v2)
FORMAT = sample1
DEST_KEY = _MetaData:Index
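
With thousands of combinations, writing these stanzas by hand gets tedious. A minimal sketch of generating them, assuming the regexes have already been split so that each fits the limit. The function `render_routing_conf` and the `rulesetN` naming are hypothetical; the keys are the ones used in the stanzas above.

```python
def render_routing_conf(sourcetype, index, regexes, max_len=8190):
    """Return (props_text, transforms_text) with one transforms stanza per regex."""
    names = []
    stanzas = []
    for i, regex in enumerate(regexes, start=1):
        # Refuse to emit a stanza that would hit the limit seen in this thread.
        if len(regex) > max_len:
            raise ValueError(f"regex {i} is {len(regex)} characters, over {max_len}")
        name = f"ruleset{i}"
        names.append(name)
        stanzas.append(
            f"[{name}]\n"
            f"REGEX = {regex}\n"
            f"FORMAT = {index}\n"
            f"DEST_KEY = _MetaData:Index\n"
        )
    # props.conf lists every ruleset under a single TRANSFORMS class.
    props = f"[{sourcetype}]\nTRANSFORMS-index_routing = {', '.join(names)}\n"
    return props, "\n".join(stanzas)


props, transforms = render_routing_conf(
    "my_sourcetype",
    "sample1",
    ["(cat.*dog|rat.*count)", "(computer.*calculator|computer.*device.*v2)"],
)
print(props)
print(transforms)
```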

I would also argue that in this specific case a different approach should be used, as a regex this large will cause high CPU overhead during ingestion, especially if the source is high-volume.
