Solved: How to add a column to count for consecutive occur...

phamxuantung · ‎09-06-2022

Hello,

I have a monthly report that produce a table like this

Violation list

Employee	month
A	8-2022
B	8-2022

I want to add a counter to count for consecutive occurrences. For example:

In September report (9-2022)

- employee A violated, his counter increase to 2

- employee B don't violate, don;t show up in report

- employee C violated, first time show up on report, his counter is 1

In October report (10-2022)

- employee A violated again, his counter increase to 3

- employee B violated again, but don't show up on September report, his counter reset to 1.

- employee C violated again, his counter increase to 2

I want the end table to look like this

Employee	month	Counter
A	10-2022	3
B	10-2022	1
C	10-2022	2

Since the report was a outputlookup csv, I don't think I can use streamstat, Can anyone suggest a way to do this.

yuanliu · ‎10-07-2022

It feels such a long time ago I couldn't remember how iteration was done😛 OK, mvexpand saved the day. Also, the previous attempt to survive year end was invalid. That has to be corrected, too.

Again, this is a very literal interpretation of your requirements:

| eval _time = strptime("01-".month, "%d-%m-%Y") ``` Use first day as marker ```
| stats min(_time) as min_month max(_time) as max_month values(_time) as month by Employee
| where max_month == relative_time(now(), "-0mon@mon") ``` only if Employee in current month ```
| eval nmonth = strftime(now(), "%m"), nyear = strftime(now(), "%Y")
| eval nmin = strftime(min_month, "%m"), nmax = strftime(max_month, "%m")
| eval mrange = mvrange(0, nmax - nmin + 1) ``` previous definition will not work across year ```
| mvexpand mrange
| eval rmonth = relative_time(min_month, "+".tostring(mrange)."mon") ``` this survives year boundary ```
| eval outmonth = if(isnull(mvfind(strftime(month, "%m"), strftime(rmonth, "%m"))), rmonth, null()) ``` when Employee has no record ```
| stats max(outmonth) as maxout by Employee month max_month ``` last month when Employee has no record ```
| eval maxout = if(isnull(maxout), 0, maxout)
| where month > maxout
| rename max_month as _time
| stats count by Employee _time

Employee	_time	count
A	2022-10	4
B	2022-10	2
D	2022-10	1

Here is the data emulation for the test.

| makeresults
| eval _raw="Employee,month
A,7-2022
D,7-2022
A,8-2022
C,8-2022
D,8-2022
A,9-2022
B,9-2022
C,9-2022
A,10-2022
B,10-2022
D,10-2022"
| multikv forceheader=1

As you can see, D had records in 7, 8, but was missing from 9.

Employee	month
A	7-2022
D	7-2022
A	8-2022
C	8-2022
D	8-2022
A	9-2022
B	9-2022
C	9-2022
A	10-2022
B	10-2022
D	10-2022

View solution in original post

yuanliu · ‎09-07-2022

Assuming the search time is current month, here is a literal implementation of your requirements:

| eval _time = strptime("01-".month, "%d-%m-%Y") ``` Use first day as marker ```
| stats min(_time) as min_month max(_time) as max_month values(_time) as month by Employee
| where max_month == relative_time(now(), "-0mon@mon") ``` only if Employee in current month ```
| eval nmonth = strftime(now(), "%m"), nyear = strftime(now(), "%Y")
| eval nmin = strftime(min_month, "%m"), nmax = strftime(max_month, "%m")
| eval mrange = mvrange(nmin, nmax + 1)
| eval nmonths = mvmap(mrange, if(mrange > nmonth, strptime(nyear - 1 . "-" . mrange . "-01", "%Y-%m-%d"), strptime(nyear . "-" . mrange . "-01", "%Y-%m-%d")))
```    ^ calculate calendar months in range ```
| eval outmonth = max(mvmap(nmonths, mvappend(outmonth, if(IN(nmonths, month), null(), nmonths)))) ``` last month when Employee had no record ```
| eval month = mvmap(month, mvappend(lastmonths, if(month < outmonth, null(), month))) ``` only count last months when Employee has record ```
| eval count = mvcount(month)
| rename max_month as _time
| table Employee _time count

Using modified sample data, the output is

Employee	_time	count
A	2022-09	3
B	2022-09	1
C	2022-09	2

(Because current month is 09, I moved all data back one month.)

Below is simulated data used in the test. To make things more interesting, I added an employee D that only occurred in the first two months but not the current month's report.

| makeresults
| eval _raw="Employee,month
A,7-2022
B,7-2022
D,7-2022
A,8-2022
C,8-2022
D,8-2022
A,9-2022
B,9-2022
C,9-2022"
| multikv forceheader=1

Employee	month
A	7-2022
B	7-2022
D	7-2022
A	8-2022
C	8-2022
D	8-2022
A	9-2022
B	9-2022
C	9-2022

yuanliu · ‎10-07-2022

For the record, the original code (for Splunk 8 and later) will not survive year end despite a convoluted attempt. Here is the corrected code using relative_time with mvrange.

| eval _time = strptime("01-".month, "%d-%m-%Y") ``` Use first day as marker ```
| stats min(_time) as min_month max(_time) as max_month values(_time) as month by Employee
| where max_month == relative_time(now(), "-0mon@mon") ``` only if Employee in current month ```
| eval nmonth = strftime(now(), "%m"), nyear = strftime(now(), "%Y")
| eval nmin = strftime(min_month, "%m"), nmax = strftime(max_month, "%m")
| eval mrange = mvrange(0, nmax - nmin + 1) ``` previous definition will not work across year ```
| eval nmonths = mvmap(mrange, relative_time(min_month, "+".tostring(mrange)."mon")) ``` also much less convoluted ```
```    ^ calculate calendar months in range ```
| eval outmonth = max(mvmap(nmonths, mvappend(outmonth, if(IN(nmonths, month), null(), nmonths)))) ``` last month when Employee had no record ```
| eval month = mvmap(month, mvappend(lastmonths, if(month < outmonth, null(), month))) ``` only count last months when Employee has record ```
| eval count = mvcount(month)
| rename max_month as _time
| table Employee _time count

phamxuantung · ‎10-07-2022

Sorry for the late reply,

My Splunk version is quite old (7.2.6) and it doesn't support mvmap. Do you have another function that could work?

yuanliu · ‎10-07-2022

It feels such a long time ago I couldn't remember how iteration was done😛 OK, mvexpand saved the day. Also, the previous attempt to survive year end was invalid. That has to be corrected, too.

Again, this is a very literal interpretation of your requirements:

| eval _time = strptime("01-".month, "%d-%m-%Y") ``` Use first day as marker ```
| stats min(_time) as min_month max(_time) as max_month values(_time) as month by Employee
| where max_month == relative_time(now(), "-0mon@mon") ``` only if Employee in current month ```
| eval nmonth = strftime(now(), "%m"), nyear = strftime(now(), "%Y")
| eval nmin = strftime(min_month, "%m"), nmax = strftime(max_month, "%m")
| eval mrange = mvrange(0, nmax - nmin + 1) ``` previous definition will not work across year ```
| mvexpand mrange
| eval rmonth = relative_time(min_month, "+".tostring(mrange)."mon") ``` this survives year boundary ```
| eval outmonth = if(isnull(mvfind(strftime(month, "%m"), strftime(rmonth, "%m"))), rmonth, null()) ``` when Employee has no record ```
| stats max(outmonth) as maxout by Employee month max_month ``` last month when Employee has no record ```
| eval maxout = if(isnull(maxout), 0, maxout)
| where month > maxout
| rename max_month as _time
| stats count by Employee _time

Employee	_time	count
A	2022-10	4
B	2022-10	2
D	2022-10	1

Here is the data emulation for the test.

| makeresults
| eval _raw="Employee,month
A,7-2022
D,7-2022
A,8-2022
C,8-2022
D,8-2022
A,9-2022
B,9-2022
C,9-2022
A,10-2022
B,10-2022
D,10-2022"
| multikv forceheader=1

As you can see, D had records in 7, 8, but was missing from 9.

Employee	month
A	7-2022
D	7-2022
A	8-2022
C	8-2022
D	8-2022
A	9-2022
B	9-2022
C	9-2022
A	10-2022
B	10-2022
D	10-2022

yuanliu · ‎10-10-2022

Just want to note that the second effort to survive year-end was still incorrect.☹️ I must use modulus in mrange:

| eval mrange = mvrange(0, (nmax - nmin + 13) % 12) ``` previous two definitions will not work across year ```

Put together,

| eval _time = strptime("01-".month, "%d-%m-%Y") ``` Use first day as marker ```
| stats min(_time) as min_month max(_time) as max_month values(_time) as month by Employee
| where max_month == relative_time(now(), "-0mon@mon") ``` only if Employee in current month ```
| eval nmin = strftime(min_month, "%m"), nmax = strftime(max_month, "%m")
| eval mrange = mvrange(0, (nmax - nmin + 13) % 12) ``` previous 2 definitions will not work across year ```
| mvexpand mrange
| eval rmonth = relative_time(min_month, "+".tostring(mrange)."mon") ``` this survives year boundary ```
| eval outmonth = if(isnull(mvfind(strftime(month, "%m"), strftime(rmonth, "%m"))), rmonth, null()) ``` when Employee has no record ```
| stats max(outmonth) as maxout by Employee month max_month ``` last month when Employee has no record ```
| eval maxout = if(isnull(maxout), 0, maxout)
| where month > maxout
| rename max_month as _time
| stats count by Employee _time

Or, if anyone prefers mvmap (Splunk 8 or later) over mvexpand,

| eval _time = strptime("01-".month, "%d-%m-%Y") ``` Use first day as marker ```
| stats min(_time) as min_month max(_time) as max_month values(_time) as month by Employee
| where max_month == relative_time(now(), "-0mon@mon") ``` only if Employee in current month ```
| eval nmin = strftime(min_month, "%m"), nmax = strftime(max_month, "%m")
| eval mrange = mvrange(0, (nmax - nmin + 13) % 12) ``` previous 2 definitions will not work across year ```
| eval nmonths = mvmap(mrange, relative_time(min_month, "+".tostring(mrange)."mon")) ``` also much less convoluted ```
```    ^ calculate calendar months in range ```
| eval outmonth = max(mvmap(nmonths, mvappend(outmonth, if(IN(nmonths, month), null(), nmonths)))) ``` last month when Employee had no record ```
| eval month = mvmap(month, mvappend(lastmonths, if(month < outmonth, null(), month))) ``` only count last months when Employee has record ```
| eval count = mvcount(month)
| rename max_month as _time
| table Employee _time count

phamxuantung · ‎10-09-2022

Thank you, this is working as intended and fit what I need.

gcusello · ‎09-06-2022

Hi @phamxuantung,

did you explored the "delta" command(https://www.splunk.com/en_us/blog/tips-and-tricks/search-commands-delta.html#:~:text=Like%20accum%2C.... or https://docs.splunk.com/Documentation/Splunk/9.0.1/SearchReference/Delta)? probably solves your need.

Could you share your search?

Ciao.

Giuseppe

ITWhisperer · ‎09-06-2022

Assuming you are replacing the contents of the csv every month with outputlookup, you could do something like this

<your report without final outputlookup>
| lookup your.csv Employee
| eval Counter=if(isnull(Counter),1,Counter+1)
| outputlookup your.csv

How to add a column to count for consecutive occurrences?

count

stats

table

Splunk Mobile: Your Brand-New Home Screen

Introducing Value Insights (Beta): Understand the Business Impact your organization ...

Enterprise Security (ES) Essentials 8.3 is Now GA — Smarter Detections, Faster ...

Are you a member of the Splunk Community?