Re: Join type and extra data?

hollybross1219 · ‎04-30-2020

Here is my query (time range is YTD):

(splunk_server=indexer* index=wsi_tax_summary sourcetype=stash capability=109* tax_year=2019 ein=* intuit_offeringid=* 
partnerId!=*test* partnerId=*) 
| timechart span=1d dc(intuit_tid) as 19attempts 
| streamstats sum(19attempts) as 19attempts
| eval time=strftime(_time,"%m-%d") 
| join type=left time 
   [ inputlookup TY18_Splunk_total_data.csv 
    | where capability="109X" 
    | stats sum(attempts) as 18attempts by _time 
    | streamstats sum(18attempts) as 18attempts
    | eval time=strftime(strptime(_time,"%m/%d/%Y"), "%m-%d")
    | fields time 18attempts]
| fields time 19attempts 18attempts
| rename 19attempts as "TY19"
| rename 18attempts as "TY18"

I understand a left join to mean that if the results from my subsearch don't match with the main search, it won't be included. If I run the query above, I get data in TY18 column from 01-02 thru 01-09 (below).

alt text

I didn't expect data against those dates, so I copied the subsearch and ran it in a separate search window, and I can see (as I expected) there's no data from 01-02 thru 01-09 (below).
alt text

Am I not understanding something about join type? What's happening here?

to4kawa · ‎05-01-2020

different year's %m-%d values is populated. maybe 2020/01/02 to 2020/01/09

hollybross1219 · ‎05-01-2020

So I think it might be a problem with the data in the lookup, but I don't understand what's wrong. When I do this basic query:

| inputlookup TY18_Splunk_total_data.csv 
| where capability="109X"

I can see data against dates 1/2/2019 - 1/9/2019
alt text

Aggregating this data, I lose the data against these dates for some reason, and I don't understand why. I ran this query which generated these results. I'm at a loss...

| inputlookup TY18_Splunk_total_data.csv 
| where capability="109X"
| stats sum(attempts) by _time

alt text

to4kawa · ‎05-01-2020

| inputlookup TY18_Splunk_total_data.csv
| where capability="109X"
| stats sum(attempts) by _time
| sort 0 _time

try and check the result.

and when you usestreamstats , you should sort it.
Splunk fields is ascii order(1,10,2,20 ... ) and result order same at sometime.

kmugglet · ‎04-30-2020

Hi,

You should be able to get the same results without using join

(splunk_server=indexer* index=wsi_tax_summary sourcetype=stash capability=109* tax_year=2019 ein=* intuit_offeringid=* 
    partnerId!=*test* partnerId=*) 
| timechart span=1d dc(intuit_tid) as 19attempts 
| streamstats sum(19attempts) as 19attempts 
| eval time=strftime(_time,"%m-%d") 
| append 
    [ inputlookup TY18_Splunk_total_data.csv 
    | where capability="109X" 
    | stats sum(attempts) as 18attempts by _time 
    | streamstats sum(18attempts) as 18attempts 
    | eval time=strftime(strptime(_time,"%m/%d/%Y"), "%m-%d") 
    | fields time 18attempts] 
| stats values(19attempts) AS TY19, values(18attempts) AS TY18 by time

Cheers, Keith

hollybross1219 · ‎05-01-2020

Hi @kmugglet, thanks for the suggestion. I tried this and it didn't work :(. Join is what I'm looking for because I'm comparing days (represented by the row) across different years (represented by the two columns).

kmugglet · ‎04-30-2020

Also you should be able to replace this line

[ inputlookup TY18_Splunk_total_data.csv WHERE capability=="109X"

Sometimes it works, sometimes it doesn't - Yay Splunk

Join type and extra data?

Bridging the Gap: Splunk Helps Students Move from Classroom to Career

Preparing your Splunk Environment for OpenSSL3

Unleash Unified Security and Observability with Splunk Cloud Platform