Splunk Search

Join type and extra data?

hollybross1219
Path Finder

Here is my query (time range is YTD):

(splunk_server=indexer* index=wsi_tax_summary sourcetype=stash capability=109* tax_year=2019 ein=* intuit_offeringid=* 
partnerId!=*test* partnerId=*) 
| timechart span=1d dc(intuit_tid) as 19attempts 
| streamstats sum(19attempts) as 19attempts
| eval time=strftime(_time,"%m-%d") 
| join type=left time 
   [ inputlookup TY18_Splunk_total_data.csv 
    | where capability="109X" 
    | stats sum(attempts) as 18attempts by _time 
    | streamstats sum(18attempts) as 18attempts
    | eval time=strftime(strptime(_time,"%m/%d/%Y"), "%m-%d")
    | fields time 18attempts]
| fields time 19attempts 18attempts
| rename 19attempts as "TY19"
| rename 18attempts as "TY18"

I understand a left join to mean that if the results from my subsearch don't match with the main search, it won't be included. If I run the query above, I get data in TY18 column from 01-02 thru 01-09 (below).

alt text

I didn't expect data against those dates, so I copied the subsearch and ran it in a separate search window, and I can see (as I expected) there's no data from 01-02 thru 01-09 (below).
alt text

Am I not understanding something about join type? What's happening here?

0 Karma

to4kawa
Ultra Champion

different year's %m-%d values is populated. maybe 2020/01/02 to 2020/01/09

0 Karma

hollybross1219
Path Finder

So I think it might be a problem with the data in the lookup, but I don't understand what's wrong. When I do this basic query:

| inputlookup TY18_Splunk_total_data.csv 
| where capability="109X"

I can see data against dates 1/2/2019 - 1/9/2019
alt text

Aggregating this data, I lose the data against these dates for some reason, and I don't understand why. I ran this query which generated these results. I'm at a loss...

| inputlookup TY18_Splunk_total_data.csv 
| where capability="109X"
| stats sum(attempts) by _time

alt text

0 Karma

to4kawa
Ultra Champion

| inputlookup TY18_Splunk_total_data.csv
| where capability="109X"
| stats sum(attempts) by _time
| sort 0 _time

try and check the result.

and when you usestreamstats , you should sort it.
Splunk fields is ascii order(1,10,2,20 ... ) and result order same at sometime.

0 Karma

kmugglet
Communicator

Hi,

You should be able to get the same results without using join

(splunk_server=indexer* index=wsi_tax_summary sourcetype=stash capability=109* tax_year=2019 ein=* intuit_offeringid=* 
    partnerId!=*test* partnerId=*) 
| timechart span=1d dc(intuit_tid) as 19attempts 
| streamstats sum(19attempts) as 19attempts 
| eval time=strftime(_time,"%m-%d") 
| append 
    [ inputlookup TY18_Splunk_total_data.csv 
    | where capability="109X" 
    | stats sum(attempts) as 18attempts by _time 
    | streamstats sum(18attempts) as 18attempts 
    | eval time=strftime(strptime(_time,"%m/%d/%Y"), "%m-%d") 
    | fields time 18attempts] 
| stats values(19attempts) AS TY19, values(18attempts) AS TY18 by time 

Cheers, Keith

0 Karma

hollybross1219
Path Finder

Hi @kmugglet, thanks for the suggestion. I tried this and it didn't work :(. Join is what I'm looking for because I'm comparing days (represented by the row) across different years (represented by the two columns).

0 Karma

kmugglet
Communicator

Also you should be able to replace this line

[ inputlookup TY18_Splunk_total_data.csv WHERE capability=="109X" 

Sometimes it works, sometimes it doesn't - Yay Splunk

0 Karma
Get Updates on the Splunk Community!

Routing logs with Splunk OTel Collector for Kubernetes

The Splunk Distribution of the OpenTelemetry (OTel) Collector is a product that provides a way to ingest ...

Welcome to the Splunk Community!

(view in My Videos) We're so glad you're here! The Splunk Community is place to connect, learn, give back, and ...

Tech Talk | Elevating Digital Service Excellence: The Synergy of Splunk RUM & APM

Elevating Digital Service Excellence: The Synergy of Real User Monitoring and Application Performance ...