I have a very long regex (12,000 characters). It consists of different hostname and IP address combinations.
When I run it, I get this error: Regex: regular expression is too large.
From my testing, a regex can be at most 8190 characters long.
In the image you can see I used the letter "a" 8190 times; if I add one more letter, the error appears.
Can somebody explain why this is happening and how I can run my regex properly?
For reasons known only to those who wrote the code, Splunk can't handle a regular expression longer than 8190 characters. The workaround is to make the regex short enough to fit into 8190 characters. Sometimes a single rex command can be split into multiple smaller rex commands.
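Here is a rough sketch of that split, done offline in Python rather than inside Splunk. It assumes the long regex is a single top-level alternation such as (a|b|c), and that no branch contains a top-level | of its own. Under those assumptions, the branches can be regrouped into several smaller alternations, each under the limit, and an event matches the original regex exactly when it matches at least one of the smaller ones. The helper name chunk_alternation and the constant MAX_REGEX_LEN are hypothetical, and the 8190 figure is the limit observed in this thread.

```python
import re

# Limit observed in this thread for a single Splunk regex (assumption, not from official docs).
MAX_REGEX_LEN = 8190


def chunk_alternation(alternatives, max_len=MAX_REGEX_LEN):
    """Regroup alternation branches into patterns like (a|b|c), each at most max_len characters."""
    chunks, current, current_len = [], [], 2  # 2 accounts for the surrounding "(" and ")"
    for alt in alternatives:
        # Adding a branch costs its length plus one "|" separator if the group is non-empty.
        added = len(alt) + (1 if current else 0)
        if current and current_len + added > max_len:
            # The current group is full: close it and start a new one with this branch.
            chunks.append("(" + "|".join(current) + ")")
            current, current_len = [], 2
            added = len(alt)
        if current_len + added > max_len:
            raise ValueError(f"branch too long for one regex: {alt[:40]}...")
        current.append(alt)
        current_len += added
    if current:
        chunks.append("(" + "|".join(current) + ")")
    return chunks


if __name__ == "__main__":
    patterns = ["cat.*dog", "rat.*count", "computer.*calculator", "computer.*device.*v2"]
    big = "(" + "|".join(patterns) + ")"
    small = chunk_alternation(patterns, max_len=30)  # tiny limit just to force a split
    event = "https://goolge/sites/rat/page/count"
    # The event matches the big regex if and only if it matches at least one smaller regex.
    print(bool(re.search(big, event)), any(re.search(p, event) for p in small))
```

Each resulting pattern could then go into its own rex command or transforms stanza. Greedy packing keeps the number of pieces small, but it does not try to find the minimum.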
We tried to shorten the regex from 14,000 to 11,000 characters.
Is there a limits configuration setting we can tweak to override this regex limitation?
@cpetterborg found this, which could help you: https://community.splunk.com/t5/Archive/Is-there-a-limit-on-searchable-characters-in-an-event/m-p/35...
We have 20,000+ word/phrase combinations that can appear in the logs, and events containing them must be routed to the proper index.
Example:
An event with CAT followed by DOG is routed to the sample1 index.
An event with RAT followed by COUNT is routed to the sample1 index.
In transforms.conf:
REGEX = (cat.*dog|rat.*count|computer.*calculator|computer.*device.*v2)
Sample events:
https://goolge/sites/cat/page/dog
https://goolge/sites/rat/page/count
https://goolge/sites/computer/page/calculator
https://goolge/sites/computer/page/device/machine/v2
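One way to sanity-check which events such a REGEX would route is to test it offline against the sample events. The sketch below uses Python's re module as a stand-in for Splunk's PCRE engine. For a simple pattern like this one, the two should behave the same, but that is an assumption. The function name routed_to_sample1 is hypothetical. The check also shows that the pattern as written is case-sensitive, so an uppercase "CAT ... DOG" would not match unless a flag such as (?i) were added.

```python
import re

# REGEX copied from the transforms.conf stanza above.
REGEX = r"(cat.*dog|rat.*count|computer.*calculator|computer.*device.*v2)"

SAMPLE_EVENTS = [
    "https://goolge/sites/cat/page/dog",
    "https://goolge/sites/rat/page/count",
    "https://goolge/sites/computer/page/calculator",
    "https://goolge/sites/computer/page/device/machine/v2",
]


def routed_to_sample1(event: str) -> bool:
    """Return True if the event matches REGEX anywhere (search, not full match)."""
    return re.search(REGEX, event) is not None


if __name__ == "__main__":
    for event in SAMPLE_EVENTS:
        print(routed_to_sample1(event), event)
    # Case matters: this uppercase variant does not match the lowercase pattern.
    print(routed_to_sample1("https://goolge/sites/CAT/page/DOG"))
```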
I've tried every way I can think of to compress the regex, but this is the best I can do.
In case this was never resolved, or for others who are interested: the solution is to split the regex across multiple transforms stanzas so that each stanza's REGEX stays under 8190 characters, like so:
props.conf:
[my_sourcetype]
TRANSFORMS-index_routing = ruleset1, ruleset2
transforms.conf:
[ruleset1]
REGEX = (cat.*dog|rat.*count)
FORMAT = sample1
DEST_KEY = _MetaData:Index
[ruleset2]
REGEX = (computer.*calculator|computer.*device.*v2)
FORMAT = sample1
DEST_KEY = _MetaData:Index
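With 20,000+ combinations, you may need many stanzas, so generating them with a script is less error-prone than writing them by hand. Below is a minimal Python sketch. It assumes the regexes have already been split into pieces under the limit, and it reproduces the layout shown above. The function name render_index_routing, the "ruleset" prefix, and the MAX_REGEX_LEN constant are hypothetical. The sketch also assumes the limit counts characters, which is the same as bytes for ASCII patterns.

```python
# Limit observed in this thread for a single Splunk regex (assumption, not from official docs).
MAX_REGEX_LEN = 8190


def render_index_routing(sourcetype, regexes, index, prefix="ruleset"):
    """Return (props_conf, transforms_conf) text that routes matches of any regex to `index`."""
    names = [f"{prefix}{i}" for i in range(1, len(regexes) + 1)]
    props = f"[{sourcetype}]\nTRANSFORMS-index_routing = {', '.join(names)}\n"
    stanzas = []
    for name, regex in zip(names, regexes):
        # Refuse to emit a stanza that would hit the "regular expression is too large" error.
        if len(regex) > MAX_REGEX_LEN:
            raise ValueError(f"{name}: regex is {len(regex)} characters, over {MAX_REGEX_LEN}")
        stanzas.append(
            f"[{name}]\nREGEX = {regex}\nFORMAT = {index}\nDEST_KEY = _MetaData:Index\n"
        )
    # A blank line between stanzas keeps the generated file readable.
    return props, "\n".join(stanzas)


if __name__ == "__main__":
    props_conf, transforms_conf = render_index_routing(
        "my_sourcetype",
        ["(cat.*dog|rat.*count)", "(computer.*calculator|computer.*device.*v2)"],
        "sample1",
    )
    print(props_conf)
    print(transforms_conf)
```

The stanza names are generated sequentially, so if other TRANSFORMS stanzas already exist, choose a prefix that cannot collide with them.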
I would also argue that a different approach should be used in this specific case, since a regex this large will cause high CPU overhead during ingestion, especially if the source is high-volume.