As for smart clustering, you can always write a custom search command in Python that does exactly what you need. Look at etc/apps/search/bin/pyrangemap.py for an outdated but easy-to-understand example.
As for your regex-based bucketing, you can do that natively in SPL. Add one match()/label pair to case() for each regular expression; the final true() pair is a catch-all for events that match none of them:
your search
| eval mybucket = case(match(myfield, "myexpression1"), "mybucket1", match(myfield, "myexpression2"), "mybucket2", true(), "other")
| stats count by mybucket
With stats, the result is just the count per mybucket value. With eventstats, every search result is kept and gets a count field added according to its value of mybucket.
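To make the stats vs. eventstats difference concrete, here is a small Python model of the search above. It is not Splunk code, only a sketch of the semantics: case() returns the label of the first match() that succeeds (match() is an unanchored regex test, like re.search), stats collapses events to one row per bucket, and eventstats keeps every event and annotates it. The rule list, the "other" default and the function names are hypothetical. Note the catch-all: without true(), "other", case() yields null for unmatched events, and stats ... by mybucket leaves those events out.

```python
import re
from collections import Counter

# Hypothetical bucket rules: (regex, label), checked in order like the pairs in SPL case().
RULES = [
    (r"^ERROR", "errors"),
    (r"timeout", "timeouts"),
]
DEFAULT = "other"  # plays the role of the true(), "other" catch-all


def bucket_for(value, rules=RULES, default=DEFAULT):
    """Return the label of the first rule whose regex matches (first match wins)."""
    for pattern, label in rules:
        if re.search(pattern, value):  # unanchored, like SPL match()
            return label
    return default


def stats_count(events, field):
    """Model of '| stats count by mybucket': one row per bucket, sorted by bucket."""
    counts = Counter(bucket_for(e[field]) for e in events)
    return [{"mybucket": b, "count": c} for b, c in sorted(counts.items())]


def eventstats_count(events, field):
    """Model of '| eventstats count by mybucket': all events kept, count added to each."""
    labelled = [dict(e, mybucket=bucket_for(e[field])) for e in events]
    counts = Counter(e["mybucket"] for e in labelled)
    return [dict(e, count=counts[e["mybucket"]]) for e in labelled]


if __name__ == "__main__":
    events = [
        {"msg": "ERROR disk full"},
        {"msg": "request timeout"},
        {"msg": "ERROR timeout"},  # matches both rules; the first one wins
        {"msg": "ok"},
    ]
    print(stats_count(events, "msg"))       # 3 rows, one per bucket
    print(eventstats_count(events, "msg"))  # 4 rows, each with a count field
```

The ordering of the rules matters exactly as it does in case(): an event matching several expressions lands in the first bucket listed.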