Splunk Search

How do we keep the null results from stats count by field so that they match up in the table?

tareddy
Explorer

I am trying to obtain the DailyTransactions and WeeklyTransactions. The following is my query ->

index=INDEXA sourcetype=*XYZ* earliest=-7d@d latest=@d "service=randomservice"
| stats first(_time) as _time,  values(A), values(B), values(C), values(D), values(ProductType) AS ProductType, values(E) AS Decision by productId 
| search (ProductType=ProductTypeA OR ProductType=ProductTypeB OR ProductType=ProductTypeC OR ProductType=ProductTypeD) 
| stats values(count) as WeeklyTransactions by ProductType Decision
| appendcols 
    [search index=INDEXA sourcetype=*XYZ* earliest=-1d@d latest=@d "service=randomservice"
    | stats first(_time) as _time,  values(A), values(B), values(C), values(D), values(ProductType) AS ProductType, values(E) AS Decision by productId 
    | search (ProductType=ProductTypeA OR ProductType=ProductTypeB OR ProductType=ProductTypeC OR ProductType=ProductTypeD)
    | stats count as DailyTransactions by ProductType Decision ]

This is the result I am expecting to see ->

ProductType    Decision    WeeklyTransactions   DailyTransactions
ProductTypeA   Decision1   10                   3
ProductTypeA   Decision2   25                   0
ProductTypeB   Decision2   13                   4
ProductTypeC   Decision3   85                   20

This is the result I'm obtaining on running my query ->

ProductType    Decision    WeeklyTransactions   DailyTransactions
ProductTypeA   Decision1   10                   3
ProductTypeA   Decision2   25                   4
ProductTypeB   Decision2   13                   20
ProductTypeC   Decision3   85
1 Solution

Richfez
SplunkTrust

One could, if one were so disposed, use eventstats to get your total over the whole period, then use relative_time() to filter out the items older than the window you actually want.

So for instance, here's an example on my firewall data.

index=fw earliest=-7d@d latest=@d direction=inbound OR direction=outbound
| eventstats count AS TotalCount 
| eval ButIReallyOnlyWantToday = relative_time(now(), "-1d@d") 
| where _time>ButIReallyOnlyWantToday 
| stats count AS Count, max(TotalCount) AS Total
  1. First line just gets me data going back 7 days.
  2. Line 2 gives me a "TotalCount" that's a count of all the events.
  3. In line 3 I create a new field called ButIReallyOnlyWantToday using relative_time(), setting that value to the epoch time for the start of yesterday (there's a quick way to sanity-check that value after this list).
  4. The fourth line does the filtering, saying "OK, show me only the events where the _time is within that past day".
  5. The last line then counts those as Count, and takes the largest value of TotalCount as the Total. You could take the average, max, min - it doesn't matter because they're all the same. You just do NOT want to aggregate it with sum() or whatever.
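
If you want to sanity-check what that cutoff in line 3 resolves to, a quick throwaway search does the trick (the field names here are just for illustration):

| makeresults
| eval cutoff = relative_time(now(), "-1d@d")
| eval readable = strftime(cutoff, "%Y-%m-%d %H:%M:%S")

That should show the start of yesterday, which is exactly what the where clause compares _time against.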

So for your specific problem... There's a lot of extra detail and you might have some logic off, but I think you can modify the above to suit. I'll try below. Keep in mind, you can always (and indeed it's a good idea!) run the search command by command, trying to understand each step and what it does. So start with the first few lines, and add line by line watching how it changes. That is really useful to help catch simple mistakes!

Before I do that, looking at your search - you do a stats first (line 2) and THEN filter out some of those results? That may be what you need, but usually I'd filter first, then aggregate (it makes the whole pipeline more efficient) - roughly like the sketch below. Still, I don't know your data, so if that's right, that's fine with me. You could carefully compare both ways and see. Also, you seem to take a list of ProductTypes with values() and then count the list items. Maybe that's what you want, maybe not; dc() (distinct count) may be a better choice, but I'm not sure exactly what you are doing in that section, so don't pay any attention to me unless I speak sense.
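
For what I mean by filtering earlier, here's a rough sketch of just the idea (this assumes ProductType is extracted at search time in your events, which I can't verify from here):

index=INDEXA sourcetype=*XYZ* earliest=-7d@d latest=@d "service=randomservice"
    (ProductType=ProductTypeA OR ProductType=ProductTypeB OR ProductType=ProductTypeC OR ProductType=ProductTypeD)
| stats first(_time) as _time, values(A), values(B), values(C), values(D), values(ProductType) AS ProductType, values(E) AS Decision by productId

That way the ProductType filter happens at the index, before stats ever sees the events, instead of after the aggregation.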

Anyway, so, my try:

search index=INDEXA sourcetype=*XYZ* earliest=-7d@d latest=@d "service=randomservice"
| stats first(_time) as _time,  values(A), values(B), values(C), values(D), values(ProductType) AS ProductType, values(E) AS Decision by productId 
| search (ProductType=ProductTypeA OR ProductType=ProductTypeB OR ProductType=ProductTypeC OR ProductType=ProductTypeD)
| eventstats count as WeeklyTransactions by ProductType Decision
| eval ButIReallyOnlyWantToday = relative_time(now(), "-1d@d") 
| where _time>ButIReallyOnlyWantToday 
| stats count AS DailyTransactions, max(WeeklyTransactions) AS WeeklyTransactions by ProductType Decision

That may do it. If not, could you try it line by line and post back with the first step that seems to do the wrong thing? That's a complex enough search that it could take a little bit of work to get it just right!
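
One more thought, since your original question was about keeping the zero rows: the where above throws away every row for a ProductType/Decision pair that had nothing today, so those pairs drop out of the final table entirely. A variant that counts conditionally instead of filtering would keep them - again just a sketch, with the same caveat about whether counting by productId is the right unit:

search index=INDEXA sourcetype=*XYZ* earliest=-7d@d latest=@d "service=randomservice"
| stats first(_time) as _time, values(ProductType) AS ProductType, values(E) AS Decision by productId
| search (ProductType=ProductTypeA OR ProductType=ProductTypeB OR ProductType=ProductTypeC OR ProductType=ProductTypeD)
| eval isToday = if(_time > relative_time(now(), "-1d@d"), 1, 0)
| stats count AS WeeklyTransactions, sum(isToday) AS DailyTransactions by ProductType Decision

Here isToday is just a made-up flag field: 1 if that productId's first _time falls in the last day, 0 otherwise, so sum(isToday) gives you today's count and the pairs with zero still show up.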

Happy Splunking!
Rich


tareddy
Explorer

Thanks for the detailed response, rich7177 - really appreciate the time and effort you put into it.


cmerriman
Super Champion

Appendcols does not match rows on any fields; it just pastes the columns from the two searches side by side in row order. You would need something like | join type=left ProductType Decision instead.
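
Something like this, roughly (a sketch only - trimmed to the fields that matter for the final table, with your values(count) swapped for a plain count; the fillnull at the end is one way to turn the missing daily values into the zeros you wanted):

index=INDEXA sourcetype=*XYZ* earliest=-7d@d latest=@d "service=randomservice"
| stats first(_time) as _time, values(ProductType) AS ProductType, values(E) AS Decision by productId
| search (ProductType=ProductTypeA OR ProductType=ProductTypeB OR ProductType=ProductTypeC OR ProductType=ProductTypeD)
| stats count as WeeklyTransactions by ProductType Decision
| join type=left ProductType Decision
    [ search index=INDEXA sourcetype=*XYZ* earliest=-1d@d latest=@d "service=randomservice"
    | stats first(_time) as _time, values(ProductType) AS ProductType, values(E) AS Decision by productId
    | search (ProductType=ProductTypeA OR ProductType=ProductTypeB OR ProductType=ProductTypeC OR ProductType=ProductTypeD)
    | stats count as DailyTransactions by ProductType Decision ]
| fillnull value=0 DailyTransactions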
