Dashboards & Visualizations

Right outer join for realtime search on dashboard?

sc0tt
Builder

I have a real-time search that spans a 5-minute window to count the number of users currently accessing the system. The search is used to load a table module on a live dashboard. If there are no users for a particular app, then the app is no longer displayed in the table results. What I need is to somehow create a right outer join for a real-time search.

| stats dc(user_id) as user_count by host app

How can I display a count of 0 if there are no matching events? I've tried using a lookup table and subsearches, but it doesn't seem that this will work in a real-time search.

Any help would be greatly appreciated.

UPDATE 1: Expanding comments that do not fit below.

In the end, I modified my solution a bit because I needed an additional BY field. Using your guide I came up with the following which seems to work.

| eval expander=if(count==1,split(app+":"+host+",app1:host1,app2:host2,app3:host3",","),null())
| mvexpand expander
| eval expander_app = mvindex(split(expander, ":"),0)
| eval expander_host = mvindex(split(expander, ":"),-1)
| rename expander_app as app, expander_host as host
| stats dc(user_id) as user_count by app host
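The pack-and-unpack trick above can be sketched outside Splunk as well. This is a Python stand-in (not SPL) for the Update 1 pipeline: each expander value packs the two BY fields as "app:host", and the split/mvindex lines pull them back apart. `str.split` and list indexing stand in for SPL's `split()` and `mvindex()`; the app and host names are placeholders from the example.

```python
# Simulate: split(app+":"+host+",app1:host1,...", ",") then mvexpand,
# then mvindex(split(expander, ":"), 0) and mvindex(..., -1).
pairs = "appA:hostA,app1:host1,app2:host2,app3:host3".split(",")  # split(..., ",")
rows = []
for expander in pairs:                       # mvexpand expander
    parts = expander.split(":")              # split(expander, ":")
    rows.append({"app": parts[0],            # mvindex(parts, 0)  -> app
                 "host": parts[-1]})         # mvindex(parts, -1) -> host
print(rows[1])   # {'app': 'app1', 'host': 'host1'}
```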

Many thanks again!

UPDATE 2: It doesn't seem that this will work in a real-time search. When performing a historical search using your example below, I receive counts. If I change it to a real-time search, I get a count of 1 for all hosts. By the way, I'm using Splunk 6.

index=_internal
| streamstats count
| eval expander=if(count==1,split(host+",host1,host2,host3",","),null())
| mvexpand expander
| streamstats count
| eval user_id=if((count>1 AND count<5),null(),user_id)
| rename expander as host
| stats c as user_count by host

UPDATE 3: Working solution. It appears that the issue is with the mvexpand: null values will not expand in a real-time search. Keeping with the same example, I needed to modify the expander line and to create a new host field using eval rather than simply renaming. This now seems to work as expected.

index=_internal
| streamstats count
| eval expander=if(count==1,split(host+",host1,host2,host3",","),"")
| mvexpand expander
| streamstats count
| eval user_id=if((count>1 AND count<5),null(),user_id)
| eval host=if((count>1 AND count<5),expander,host)
| stats dc(user_id) as user_count c(host) as host_count by host
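The essence of the Update 3 fix can be sketched in Python (a simulation, not SPL): with expander set to "" on every row instead of null(), each ordinary row still survives mvexpand (an empty string is one value, not a missing field), and the conditional eval only rewrites host on the three cloned rows, so the "" never clobbers a real host the way a blanket rename would. Host and user names here are placeholders.

```python
# Simulate the second streamstats count plus the two conditional evals:
#   eval user_id=if((count>1 AND count<5),null(),user_id)
#   eval host=if((count>1 AND count<5),expander,host)
def fix_host(rows):
    for i, row in enumerate(rows, start=1):   # second streamstats count
        if 1 < i < 5:                         # the three cloned rows (2..4)
            row["user_id"] = None             # drop the inherited user_id
            row["host"] = row["expander"]     # take host from expander
        # every other row keeps its original host; its expander "" is ignored
    return rows

rows = [{"host": "hostA", "user_id": "u1", "expander": "hostA"},
        {"host": "hostA", "user_id": "u1", "expander": "host1"},
        {"host": "hostA", "user_id": "u1", "expander": "host2"},
        {"host": "hostA", "user_id": "u1", "expander": "host3"},
        {"host": "hostB", "user_id": "u2", "expander": ""}]
fix_host(rows)
print([r["host"] for r in rows])   # ['hostA', 'host1', 'host2', 'host3', 'hostB']
```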

1 Solution

sideview
SplunkTrust

You won't be able to do a join in a real-time search, or an append either.

In normal historical searches the easiest way is to use the append command and glue on a few extra hosts so you know they'll always be there. For example: foo | append [| stats count | fields - count | eval host=split("host1,host2,host3",",") | mvexpand host]

However that can't work here - in a realtime search you unfortunately can't use append (or join).

If your hosts were arrayed across the top, i.e. if each host name was its own column in the table, then you could do | table * host1 host2 host3 | fillnull value="0" host1 host2 host3 at the end and get it done.

And you might be able to explore the same sort of idea, but using xyseries + table + fillnull + untable, or even just transpose + table + fillnull + transpose.

But what fun is that? Here's a search that uses several obscure tricks, is deranged, and will make for fun reading.

We basically hijack one of the real events in the set and we abuse its own machinery so as to clone new events out of that one event. It'll be fun. We will "append without using append".

It's going to be crazy though, so let's go slowly and walk through all the steps. First, let's say your base search is just index=_internal, and let's say that you always want "host1", "host2", and "host3" to appear in the table even if the user count is zero.

Here's the whole search:

index=_internal 
| streamstats count 
| eval expander=if(count==1,split(host+",host1,host2,host3",","),null()) 
| mvexpand expander 
| streamstats count 
| eval user_id=if((count>1 AND count<5),null(),user_id)
| rename expander as host 
| stats dc(user_id) as user_count by host 

Now let's break it down piece by piece.

index=_internal is of course just a sample. Your search terms will be different.

Next is | eval expander=if(count==1,split(host+",host1,host2,host3",","),null()). This eval command creates a field called expander on just the row that streamstats has numbered with a 1. We glue together some strings as the value of this field, but then we use the split() function so that the expander field on that row begins life as a multivalue field with four values. The first value is the actual host on the original event. The other three values are our "host1", "host2", "host3" - the hosts that we want to always be represented in the chart even if they are absent from the raw data.

Now we have | mvexpand expander. Since the expander field is only defined on that one row, the only effect on the whole set of search results is to turn that single row into 4 rows. Since one of our multivalue values was actually the original host, mvexpand will have recreated the cloned row verbatim, and the net effect is that we've magically added three rows that have expander values of "host1", "host2", and "host3" respectively. Hey, we created a bunch of new rows and we didn't have to use append!
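To make the mvexpand step concrete, here is a minimal Python simulation (not SPL) of what it does to a result set where only one row carries the multivalue expander field. Rows are dicts, and a list stands in for a Splunk multivalue field; the host and user values are made up for illustration.

```python
def mvexpand(rows, field):
    """Clone each row once per value of a multivalue (list) field;
    rows without the field pass through untouched, like mvexpand."""
    out = []
    for row in rows:
        values = row.get(field)
        if isinstance(values, list):
            for v in values:
                clone = dict(row)
                clone[field] = v          # each clone gets one value
                out.append(clone)
        else:
            out.append(row)
    return out

rows = [
    {"host": "hostA", "user_id": "u1",
     "expander": ["hostA", "host1", "host2", "host3"]},  # the count==1 row
    {"host": "hostA", "user_id": "u2"},                  # an ordinary row
]
expanded = mvexpand(rows, "expander")
print(len(expanded))   # 5: the first row became four rows, the second passed through
print([r["expander"] for r in expanded[:4]])   # ['hostA', 'host1', 'host2', 'host3']
```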

Still some more hard parts though. We need to actually whack away any user_id fields that might be in our newly cloned rows. We have no idea which row we cloned, and it might have a user_id that would add one to all three of our user_counts at the end. To do this we need those priceless little index numbers from streamstats count, but since the ones we used before will all say count=1 for all four of our rows, we need another clean streamstats count.

OK. Next up is | eval user_id=if((count>1 AND count<5),null(),user_id). Here for just the rows where count is 2, 3, or 4, we null out the user_id field.

Now we just | rename expander as host, which would normally be dangerous since it clobbers the host field. However, a) expander is only present on these four rows so it only has an effect there, and b) one of the four rows is that identical copy of the original uncloned row, and we ensured that its host value actually is the original value. So we clobber the host value with itself.

Last but not least, we give our augmented search result rows to the stats command: | stats dc(user_id) as user_count by host will happily give rows for "host1", "host2", and "host3", although of course the lack of user_id will make the dc(user_id) values zero for those rows.
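The whole walkthrough above can be sketched end to end in Python (a stand-in for the SPL, not the real thing): number the rows, expand row 1 into four rows, renumber, null out user_id on the three clones, take host from expander, then compute dc(user_id) by host. The sample hosts and user ids are made up.

```python
def run_pipeline(rows, always_hosts=("host1", "host2", "host3")):
    # streamstats count: 1-based running row number
    for i, row in enumerate(rows, start=1):
        row["count"] = i
    # eval expander=if(count==1, split(...), null()) followed by mvexpand:
    # only row 1 expands, into its own host plus the three fixed hosts
    expanded = []
    for row in rows:
        if row["count"] == 1:
            for value in [row["host"], *always_hosts]:
                clone = dict(row)
                clone["expander"] = value
                expanded.append(clone)
        else:
            expanded.append(row)
    # second streamstats count, then null user_id on rows 2..4 (the clones)
    # and rename expander as host (only the four clones carry expander)
    for i, row in enumerate(expanded, start=1):
        row["count"] = i
        if 1 < i < 5:
            row["user_id"] = None
        if "expander" in row:
            row["host"] = row["expander"]
    # stats dc(user_id) as user_count by host
    users = {}
    for row in expanded:
        users.setdefault(row["host"], set())
        if row.get("user_id") is not None:
            users[row["host"]].add(row["user_id"])
    return {host: len(ids) for host, ids in users.items()}

events = [{"host": "hostA", "user_id": "u1"},
          {"host": "hostA", "user_id": "u2"}]
print(run_pipeline(events))   # {'hostA': 2, 'host1': 0, 'host2': 0, 'host3': 0}
```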

When you get used to this kind of nonsense, it's not particularly difficult. I will absolutely grant that this search and the entire approach behind it are CRAZY. But it's the only way I could think of to "append without append".



sc0tt
Builder

Thank you for providing such a creative solution with a detailed explanation! I was able to follow along and get it working exactly as you described. You are correct, this is a CRAZY approach--but it works!

My only additional question is that does this have any impact on performance with a real-time search?

I updated my original question to include additional comments that would not fit here.

Update: I may have spoken too soon. It doesn't seem that this will work with a real-time search. When I run it as a real-time search, it only gives me a count of 1. Any ideas?
