Reporting

Time from subquery lost when joining to lookup table

mcantrell
Explorer

I have a report which I'm trying to enhance to use a lookup report. The existing query is doing a simple agg to count by date and identifier:

type="REST" resource="Order" status="FINISHED" | bucket span=1d _time | stats count by client_id, _time


_time client_id count
6/7/12 12:00:00.000 AM 20006 123
6/8/12 12:00:00.000 AM 20006 117
6/7/12 12:00:00.000 AM 20008 36
6/7/12 12:00:00.000 AM 20009 101

I'm using a lookup table to enhance the data but I also want to see records from the lookup table which had no events. Something like this:


_time client_id Name count
6/7/12 12:00:00.000 AM 20006 Client A 123
6/8/12 12:00:00.000 AM 20006 Client A 117
6/7/12 12:00:00.000 AM 20008 Client B 36
6/8/12 12:00:00.000 AM 20008 Client B 0
6/7/12 12:00:00.000 AM 20009 Client C 101
6/8/12 12:00:00.000 AM 20009 Client C 0
6/7/12 12:00:00.000 AM 20010 Client D 0
6/8/12 12:00:00.000 AM 20010 Client D 0

I thought I'd be able to accomplish this by using inputlookup and left joining to the results from the original query

| inputlookup client_lookup.csv | JOIN type=left client_id [type="REST" resource="Order" status="FINISHED" | bucket span=1d _time | stats count by client_id, _time]

When I do this, the date from the subquery is lost from the output:


client_id Name count
20006 Client A 123
20008 Client B 36
20009 Client C 101
20010 Client D

I obviously have my head stuck in RDBMS land! I'm sure there's probably a better way of doing this or I'm missing something obvious but I'm a little stuck at the moment. Any help would be greatly appreciated.

Tags (2)
0 Karma

sowings
Splunk Employee
Splunk Employee

Assuming a client_lookup.csv like:

client_id,Name
20006,Client A

You could do something like:

type="REST" resource="Order" status="FINISHED" | bucket span=1d _time 
| stats count by client_id, _time
| lookup client_lookup.csv client_id OUTPUT Name

No join needed.

0 Karma

mcantrell
Explorer

OK.. I think I have it now. There is a max option to the join command which defaults to 1. I adjusted it to match the number of days I was searching for and the all of the records are showing up.

0 Karma

mcantrell
Explorer

Instead of:

*sent client_id Name count *
6/7/12 12:00:00.000 AM 20006 Client A 123
6/8/12 12:00:00.000 AM 20006 Client A 117
6/7/12 12:00:00.000 AM 20008 Client B 36
6/8/12 12:00:00.000 AM 20008 Client B 0
6/7/12 12:00:00.000 AM 20009 Client C 101
6/8/12 12:00:00.000 AM 20009 Client C 0
6/7/12 12:00:00.000 AM 20010 Client D

6/8/12 12:00:00.000 AM 20010 Client D

It appears that it probably doesn't work exactly like it would in SQL. Maybe there's a better approach to accomplish the same end.

0 Karma

mcantrell
Explorer

Hmmmm, renaming the field helps a little. The date is available now but something is weird in how it's joining the records. It's finding the first match for each client and then dumping the other dates.

Example:

*sent client_id Name count *
6/7/12 12:00:00.000 AM 20006 Client A 123
6/7/12 12:00:00.000 AM 20008 Client B 36
6/7/12 12:00:00.000 AM 20009 Client C 101
6/7/12 12:00:00.000 AM 20010 Client D

0 Karma

gkanapathy
Splunk Employee
Splunk Employee

Join is really not preferred, but if you want to mess with your original query, then rename the _time field to something not beginning with an _, then name it back after the join.

mcantrell
Explorer

The reason for the join is so that I also display the clients which don't have any matching events. Without the left join, they drop off the report.

Ultimately, I want to draw attention to clients which have low or no activity. I realize this is sort of the reverse of what splunk is meant to do but I've never been shy of driving a square peg through a round hole 😉

0 Karma
Career Survey
First 500 qualified respondents will receive a $20 gift card! Tell us about your professional Splunk journey.
Get Updates on the Splunk Community!

Tech Talk Recap | Mastering Threat Hunting

Mastering Threat HuntingDive into the world of threat hunting, exploring the key differences between ...

Observability for AI Applications: Troubleshooting Latency

If you’re working with proprietary company data, you’re probably going to have a locally hosted LLM or many ...

Splunk AI Assistant for SPL vs. ChatGPT: Which One is Better?

In the age of AI, every tool promises to make our lives easier. From summarizing content to writing code, ...