<?xml version="1.0" encoding="UTF-8"?>
<rss xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#" xmlns:taxo="http://purl.org/rss/1.0/modules/taxonomy/" version="2.0">
  <channel>
    <title>topic Re: Can I define string buckets with regular expressions (regex)? in Splunk Search</title>
    <link>https://community.splunk.com/t5/Splunk-Search/Can-I-define-string-buckets-with-regular-expressions-regex/m-p/193980#M55907</link>
    <description>&lt;P&gt;As for smart clustering, you can always write a Python custom search command that does exactly what you need. Look at etc/apps/search/bin/pyrangemap.py for an outdated but easy to understand example.&lt;/P&gt;

&lt;P&gt;As for your regex-based bucketing, you can do that natively roughly like this (pseudosplunk):&lt;/P&gt;

&lt;PRE&gt;&lt;CODE&gt;your search | eval mybucket = case(match(myfield, "myexpression1"), "mybucket1", match(myfield, "myexpression2"), "mybucket2", true(), "other") | stats count by mybucket
&lt;/CODE&gt;&lt;/PRE&gt;

&lt;P&gt;With &lt;CODE&gt;stats&lt;/CODE&gt; you get just the count per mybucket as the result; if you use &lt;CODE&gt;eventstats&lt;/CODE&gt; instead, the count field is added to each search result according to its value of mybucket.&lt;/P&gt;</description>
    <pubDate>Fri, 21 Mar 2014 15:12:12 GMT</pubDate>
    <dc:creator>martin_mueller</dc:creator>
    <dc:date>2014-03-21T15:12:12Z</dc:date>
    <item>
      <title>Can I define string buckets with regular expressions (regex)?</title>
      <link>https://community.splunk.com/t5/Splunk-Search/Can-I-define-string-buckets-with-regular-expressions-regex/m-p/193979#M55906</link>
      <description>&lt;P&gt;Hello,&lt;/P&gt;

&lt;P&gt;Here is the data format: &lt;BR /&gt;
00:00:01 subject=A.A&lt;BR /&gt;&lt;BR /&gt;
00:00:01 subject=B.A&lt;BR /&gt;&lt;BR /&gt;
00:00:01 subject=A.A.A&lt;BR /&gt;&lt;BR /&gt;
00:00:01 subject=A.B.A&lt;BR /&gt;&lt;BR /&gt;
...&lt;/P&gt;

&lt;P&gt;I would like to count events in buckets that I define with regular expressions.&lt;BR /&gt;
For example, I would like to define the following buckets:&lt;/P&gt;

&lt;PRE&gt;&lt;CODE&gt;A\.A.*
A\.[B-Z].*
B.*
[C-Z].*
&lt;/CODE&gt;&lt;/PRE&gt;

&lt;P&gt;and count the events in each bucket.&lt;BR /&gt;&lt;BR /&gt;
It looks like rangemap only works with numeric fields.&lt;BR /&gt;&lt;BR /&gt;
The bucketdir command doesn't seem to let me define buckets with regular expressions either.&lt;/P&gt;

&lt;P&gt;Second question, just in case: is there a smart function that creates buckets based on how the events are distributed across the tree defined by the subject string?&lt;BR /&gt;
By smart, I mean a function that groups a broad part of the tree with few events into one bucket (e.g. &lt;CODE&gt;[C-Z].*&lt;/CODE&gt;) but splits out a narrow part (e.g. &lt;CODE&gt;A\.A\.A.*&lt;/CODE&gt;) because it contains more events. In the end, all buckets are roughly equal in size, which gives a very useful visual representation of where the events sit in the tree, with drill-down in the busier parts.&lt;/P&gt;

&lt;P&gt;In case anybody wonders, or for tagging and search purposes:&lt;BR /&gt;
I'm trying to do this to get a good picture of the distribution of TIBCO RV multicast data.&lt;/P&gt;</description>
      <pubDate>Wed, 19 Mar 2014 11:56:26 GMT</pubDate>
      <guid>https://community.splunk.com/t5/Splunk-Search/Can-I-define-string-buckets-with-regular-expressions-regex/m-p/193979#M55906</guid>
      <dc:creator>manus</dc:creator>
      <dc:date>2014-03-19T11:56:26Z</dc:date>
    </item>
    <item>
      <title>Re: Can I define string buckets with regular expressions (regex)?</title>
      <link>https://community.splunk.com/t5/Splunk-Search/Can-I-define-string-buckets-with-regular-expressions-regex/m-p/193980#M55907</link>
      <description>&lt;P&gt;As for smart clustering, you can always write a Python custom search command that does exactly what you need. Look at etc/apps/search/bin/pyrangemap.py for an outdated but easy to understand example.&lt;/P&gt;

&lt;P&gt;As for your regex-based bucketing, you can do that natively roughly like this (pseudosplunk):&lt;/P&gt;

&lt;PRE&gt;&lt;CODE&gt;your search | eval mybucket = case(match(myfield, "myexpression1"), "mybucket1", match(myfield, "myexpression2"), "mybucket2", true(), "other") | stats count by mybucket
&lt;/CODE&gt;&lt;/PRE&gt;
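
&lt;P&gt;To make that concrete, here is a minimal sketch for the four buckets from the question. It assumes the field is extracted as &lt;CODE&gt;subject&lt;/CODE&gt;. The bucket labels, the &lt;CODE&gt;n&lt;/CODE&gt; helper field and the first three lines, which only fake the four sample events with &lt;CODE&gt;makeresults&lt;/CODE&gt; (needs a reasonably recent Splunk version), are made up for illustration. Replace those three lines with your real search:&lt;/P&gt;

&lt;PRE&gt;&lt;CODE&gt;| makeresults count=4
| streamstats count as n
| eval subject = case(n=1, "A.A", n=2, "B.A", n=3, "A.A.A", n=4, "A.B.A")
| eval mybucket = case(match(subject, "^A\.A"), "A.A", match(subject, "^A\.[B-Z]"), "A.[B-Z]", match(subject, "^B"), "B", match(subject, "^[C-Z]"), "[C-Z]", true(), "other")
| stats count by mybucket
&lt;/CODE&gt;&lt;/PRE&gt;

&lt;P&gt;Two things to keep in mind: &lt;CODE&gt;match&lt;/CODE&gt; succeeds if the regex matches anywhere in the value, so the expressions are anchored with &lt;CODE&gt;^&lt;/CODE&gt; and the trailing &lt;CODE&gt;.*&lt;/CODE&gt; from the question is not needed. And &lt;CODE&gt;case&lt;/CODE&gt; returns the value for the first condition that is true, so put the more specific expressions first if your buckets overlap. On the four sample events this should give a count of 2 for A.A, 1 for A.[B-Z] and 1 for B.&lt;/P&gt;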

&lt;P&gt;With &lt;CODE&gt;stats&lt;/CODE&gt; you get just the count per mybucket as the result; if you use &lt;CODE&gt;eventstats&lt;/CODE&gt; instead, the count field is added to each search result according to its value of mybucket.&lt;/P&gt;</description>
      <pubDate>Fri, 21 Mar 2014 15:12:12 GMT</pubDate>
      <guid>https://community.splunk.com/t5/Splunk-Search/Can-I-define-string-buckets-with-regular-expressions-regex/m-p/193980#M55907</guid>
      <dc:creator>martin_mueller</dc:creator>
      <dc:date>2014-03-21T15:12:12Z</dc:date>
    </item>
  </channel>
</rss>

