Splunk Search

Can I define string buckets with regular expressions (regex)?

manus
Communicator

Hello,

Here is the data format:
00:00:01 subject=A.A

00:00:01 subject=B.A

00:00:01 subject=A.A.A

00:00:01 subject=A.B.A

...

I would like to count the events in buckets that I define with regular expressions.
For example, here I would like to define the following buckets:

A\.A.*
A\.[B-Z].*
B.*
[C-Z].*

and count the events in each bucket.

It looks like rangemap only works with numeric fields.

Bucketdir doesn't seem to let me define my buckets with regular expressions.

Second question, just in case: is there a smart function that creates clever buckets based on how the events are distributed across the tree defined by the subject string?
By clever, I mean a function that groups a broad part of the tree with few events into one bucket (e.g. [C-Z].*) but splits out a narrow part (e.g. A\.A\.A.*) because it contains more events. In the end all buckets are roughly equal in size, which gives a very useful visual representation of where the events sit in the tree, with drill-down into some parts of it.

In case somebody wonders, or for tagging and search purposes:
I'm trying to do this to get a good picture of the distribution of TIBCO RV multicast data.

1 Solution

martin_mueller
SplunkTrust

As for smart clustering, you can always write a custom search command in Python that does exactly what you need. Look at etc/apps/search/bin/pyrangemap.py for an outdated but easy-to-understand example.
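To make the clustering idea more concrete, here is a minimal sketch of the core logic such a command could wrap. It is not taken from pyrangemap.py and it leaves out the Splunk plumbing entirely. The function name adaptive_buckets and the parameter target_buckets are made up for illustration. It assumes the subjects are dot-separated strings, starts with one bucket per first token, and greedily splits the largest bucket one level deeper until there are enough buckets.

```python
def adaptive_buckets(subjects, target_buckets=4):
    """Greedy prefix bucketing (illustrative sketch, hypothetical helper).

    Starts with one bucket per first token, then repeatedly splits the
    largest bucket one level deeper until at least `target_buckets`
    buckets exist or no bucket can be split any further.
    Returns a dict mapping the bucket prefix to its event count.
    """
    buckets = {}
    for subject in subjects:
        tokens = tuple(subject.split("."))
        buckets.setdefault(tokens[:1], []).append(tokens)

    while len(buckets) < target_buckets:
        # A bucket is worth splitting only if going one level deeper
        # actually separates its members into more than one group.
        # Simplification: a bucket whose members all share the same next
        # token is never split, even if they diverge further down.
        candidates = [
            prefix
            for prefix, members in buckets.items()
            if len({m[: len(prefix) + 1] for m in members}) > 1
        ]
        if not candidates:
            break
        largest = max(candidates, key=lambda prefix: len(buckets[prefix]))
        members = buckets.pop(largest)
        for m in members:
            # A member equal to the prefix itself stays under that prefix.
            buckets.setdefault(m[: len(largest) + 1], []).append(m)

    return {".".join(prefix): len(members) for prefix, members in buckets.items()}


if __name__ == "__main__":
    sample = ["A.A", "B.A", "A.A.A", "A.B.A"]
    print(adaptive_buckets(sample, target_buckets=3))
```

With the four sample subjects from the question and target_buckets=3, the A branch is split into A.A and A.B while B stays whole. Merging sparse sibling branches (the [C-Z].* case) is not covered by this sketch and would need an extra pass.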

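As for your regex-based bucketing, you can do that natively, roughly like this (assuming the field is extracted as subject, and using the buckets from your question):

your search | eval mybucket = case(match(subject, "^A\.A"), "A.A", match(subject, "^A\.[B-Z]"), "A.[B-Z]", match(subject, "^B"), "B", match(subject, "^[C-Z]"), "C-Z") | stats count by mybucket

case() returns the value for the first condition that matches, so order the expressions from most to least specific. match() does not anchor the regex by itself, hence the leading ^. Events that match none of the expressions get no mybucket value and drop out of the count.

If you use stats, you'll get just the count by mybucket as the result; if you use eventstats instead, you'll get the count field added to each search result according to its value of mybucket.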
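If you want to sanity-check your expressions outside Splunk before putting them into the search, the first-match-wins behaviour of case(match(...)) can be mimicked in a few lines of Python. This is only a sketch: the BUCKETS list, the labels, and the helper names bucket_for and count_by_bucket are made up for illustration, and Python's re module is close to, but not identical to, the regex engine Splunk uses.

```python
import re
from collections import Counter

# (pattern, label) pairs in priority order; the first match wins,
# which mirrors how case() evaluates its conditions.
# Patterns are anchored with ^ because re.search, like match() in
# eval, finds the pattern anywhere in the string otherwise.
BUCKETS = [
    (r"^A\.A", "A.A"),
    (r"^A\.[B-Z]", "A.[B-Z]"),
    (r"^B", "B"),
    (r"^[C-Z]", "C-Z"),
]


def bucket_for(subject):
    """Return the label of the first matching bucket, or None."""
    for pattern, label in BUCKETS:
        if re.search(pattern, subject):
            return label
    return None


def count_by_bucket(subjects):
    """Count subjects per bucket, skipping those that match nothing."""
    labels = (bucket_for(s) for s in subjects)
    return Counter(label for label in labels if label is not None)


if __name__ == "__main__":
    sample = ["A.A", "B.A", "A.A.A", "A.B.A"]
    for label, count in count_by_bucket(sample).items():
        print(label, count)
```

Subjects that match none of the patterns are skipped here, which corresponds to events without a mybucket value being left out of stats count by mybucket.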
