Re: Fillnull in outer join

pjtbasu · ‎01-16-2020

Hi,

I require a table containing count of specific service compared between 2 time ranges.

table 1 (time - now)
servicename | count
aaa 2
bbb 3
ccc 4

table 2 (time - previous time with timerange)
servicename | count
bbb 2
ddd 2
ccc 4

After search expectation -

servicename | countnow| oldcount | delta
aaa 2 0 2
bbb 3 2 1
ccc 4 4 0 (added just for understanding, not expected in the actual output, as I've added delta!=0 in my query at the end)
ddd 0 2 -2

Current output with my query (Outer Join works fine but I can't see the difference value, which is the actual requirement) -
aaa 2 (null value is coming, and delta value is coming as null as well)
bbb 3 2 1
ccc 4 4 0 (Coming up in search, even though it should not!)
ddd 2 0

My query is like this -
index=* myquery earliest=-15m latest=now |stats count as CurrentCount by servicename|join type=outer servicename [search index=* myquery earliest=1578492000 latest=1578492900|stats count as OldCount by servicename]
|eval DiffValue= CurrentCount - OldCount |table servicename,OldCount,CurrentCount,DiffValue|search Diffvalue!=0

Don't want to change whole lot in the query as the table output is 1000 rows and outer join works fine. Could you please help around difference calculation part and fillnull part?

woodcock · ‎01-19-2020

NEVER USE join. Try this:

index="*" AND "myquery" (earliest=-15m latest=now OR earliest=1578492000 latest=1578492000)
| stats count AS debugCount count(eval(_time >= relative_time(now(), "-15m") AND time<=now())) AS CurrentCount, count(eval(_time >= 1578492000 AND time<=1578492000)) AS OldCount BY servicename
| eval DiffValue= CurrentCount - OldCount 
| where Diffvalue!=0
| table servicename, OldCount, CurrentCount, DiffValue

niketn · ‎01-17-2020

@acfecondo75 trust me on Splunk Answers every community contributor's prime focus is the support for the poster. We do tend to give hints/ performance optimization tips rather than spoon feeding solutions. If you notice closely @kamlesh_vaghela's approach is different than yours. Also Kamlesh has posted his answer as a comment for @pjtbasu to try and confirm. Which means his comment can not be accepted as Answer, but your's can. However, the best solution approach is by @to4kawa which tried to avoid subsearch completely and would perform better as well with map-reduce commands.

Unlike other online forums, Splunk Answers thrives on positive sentiments hence down-voting is rarely used by community members unless the suggested solution is detrimental/irreversible for Splunk stack. Refer to the Voting etiquette docs for Splunk Answers Community participation.

Having said all these, let me up-vote both of your comments to get you on same page to keep assisting folks in need. Without our active participation and positive mindset the community will not thrive... so keep Splunking and Happy Weekend.

There has to be upvote for @to4kawa 's answer once he fixes outer join approach as that is the best way to perform such kind of correlation. While there is no silver bullet for correlation, this is a good Splunk Documentation for Event Grouping and correlation: https://docs.splunk.com/Documentation/Splunk/latest/Search/Abouteventcorrelation

____________________________________________
| makeresults | eval message= "Happy Splunking!!!"

acfecondo75 · ‎01-17-2020

after reading your response, I agree that i was hasty to downvote. I switched to an upvote and removed the comment.

to4kawa · ‎01-16-2020

index=myquery (earliest=-15m latest=now) OR (earliest=1578492000 latest=1578492900)
| eval date_label=case(_time >= relative_time(_time,"-15m") , "current" ,_time <= 1578492900, "old")
| stats count(eval(date_label="current")) as CurrentCount count(eval(date_label="old")) as OldCount by servicename 
| eval DiffValue= CurrentCount - OldCount 
| where DiffValue!="0"
| table servicename,OldCount,CurrentCount,DiffValue

Hi, @pjtbasu
your request:

Don't want to change whole lot in the query

I see, but your query 's join is too slow and need fillnull .
How about my recommend?
The current and past logs are searched once and tabulated for each label.
It's fast.

to4kawa · ‎01-17-2020

Main search:

index= myquery earliest=-15m latest=now 
|stats count as CurrentCount by servicename

Result:

servicename,count
service_A,20
service_B,21
service_C,19

Sub search result:

 servicename,count
 service_A,17
 service_C,30

I think the reason @pjtbasu used join type=outer is to handle the case where the sub-search results in a different service name than the main search, as in the case above.

Before giving this answer, I placed about 10 service names on a trial basis in two days and counted them for each 15 minutes with a difference of 12 hours.

As a result, depending on the service name, I have confirmed that 0 was properly displayed.
sorry. I could not retry. I forget the query. but

| makeresults 
| eval _raw="No,value,service
1,1,A
2, ,B
3,1,C"
| multikv forceheader=1
| stats count(eval(value=1)) as count by service

This result outputs "0", maybe my answer is OK.
But how about it actually? I would like @pjtbasu to check that.

niketn · ‎01-19-2020

In Simple words join type outer when query before join command runs for last 15min and query after join runs for specific timestamp, all the services from last 15 min must be present in the result.

I looked at the code and the question and does not seem like the two are in sync 🙂 Seems like the requirement is for full outer join not just outer join. Full outer join is not available through the join command. Your solution will still work however needs to be tweaked as the requirement is not to filter out results where counts are same in current and past, but to have 0 values when the service is not available. So | where DiffValue!=0 does not seem to be required. Rest all is fine and actually optimal.

PS: I was confused between the requirement of outer join (which in Splunk is left join) and full outer join (which seems to be requirement and handled in your code)

____________________________________________
| makeresults | eval message= "Happy Splunking!!!"

to4kawa · ‎01-19-2020

full outer join , I haven't know till now. I see.
Thanks @niketnilay

niketn · ‎01-17-2020

@to4kawa, this is the right approach to pull required data from index and use stats for correlation. However, original query had left/outer join which means you were supposed to retain all the values from first time range which means you would need to do either values(date_label) as uniqueLables or dc(date_label) as uniqueLables and filter all values with uniqueLables for current and non-zero values from old. Which means finally | fillnull CurrentCount will be required even in your query! 🙂

____________________________________________
| makeresults | eval message= "Happy Splunking!!!"

kamlesh_vaghela · ‎01-16-2020

@pjtbasu

I have made some changes in your search. Can you please try this?

index=myquery earliest=-15m latest=now 
| stats count as CurrentCount by servicename 
| append 
    [ search index=myquery earliest=1578492000 latest=1578492900 
    | stats count as OldCount by servicename] 
| stats values(CurrentCount) as CurrentCount values(OldCount) as OldCount by servicename
| fillnull value=0
| eval DiffValue= CurrentCount - OldCount 
| where DiffValue!="0"
| table servicename,OldCount,CurrentCount,DiffValue

pjtbasu · ‎01-19-2020

@kamlesh_vagela Your search did the job! Thank you!

Thanks @niketnilay & @acfecondo75 for your inputs and end of the day, the knowledge and willingness to help, of this community is remarkable. You guys rock!!

niketn · ‎01-19-2020

@pjtbasu, as stated earlier did you get to try @to4kawa 's approach of avoiding subsearch? That would be the best solution for your use case.

____________________________________________
| makeresults | eval message= "Happy Splunking!!!"

kamlesh_vaghela · ‎01-19-2020

Yes @pjtbasu 🙂 you are right.. please accept and close it.

acfecondo75 · ‎01-16-2020

Hi @pjtbasu,

In the query you provided, your subsearch earliest and latest time are exactly the same value. Not sure if this was copy paste error here or if that's how it was when you ran the search. But if it's the latter, that's most likely why you're not getting a count from your join. As far as fixing the null issue, just add the following line after the subsearch but before the eval for DiffValue:

| fillnull value=0 OldCount

Then the reason the 0s are still showing is because you have a typo in the last line of your search. You wrote Diffvalue!=0 instead of DiffValue!=0 (capitalize the V so the field names match).

Fillnull in outer join

Join the Splunk Community Slack to learn, troubleshoot, and make connections with fellow Splunk practitioners in real time!

Join Splunk User Groups to connect and learn in-person by region or remotely by topic or industry.

Best Practices: Splunk auto adjust pipeline queue

Laser Bananas and Edge Hubs: Exploring Operational Technology (OT) Data Through a ...

Event Series: Mastering AI Tokenomics and Splunk Agent Observability

Join the Conversation

Fillnull in outer join

Join the Splunk Community Slack to learn, troubleshoot, and make connections with fellow Splunk practitioners in real time!

Join Splunk User Groups to connect and learn in-person by region or remotely by topic or industry.

Best Practices: Splunk auto adjust pipeline queue

Laser Bananas and Edge Hubs: Exploring Operational Technology (OT) Data Through a ...

Event Series: Mastering AI Tokenomics and Splunk Agent Observability