Splunk Search

Join type and extra data?

hollybross1219
Path Finder

Here is my query (time range is YTD):

(splunk_server=indexer* index=wsi_tax_summary sourcetype=stash capability=109* tax_year=2019 ein=* intuit_offeringid=* 
partnerId!=*test* partnerId=*) 
| timechart span=1d dc(intuit_tid) as 19attempts 
| streamstats sum(19attempts) as 19attempts
| eval time=strftime(_time,"%m-%d") 
| join type=left time 
   [ inputlookup TY18_Splunk_total_data.csv 
    | where capability="109X" 
    | stats sum(attempts) as 18attempts by _time 
    | streamstats sum(18attempts) as 18attempts
    | eval time=strftime(strptime(_time,"%m/%d/%Y"), "%m-%d")
    | fields time 18attempts]
| fields time 19attempts 18attempts
| rename 19attempts as "TY19"
| rename 18attempts as "TY18"

I understand a left join to mean that if the results from my subsearch don't match with the main search, it won't be included. If I run the query above, I get data in TY18 column from 01-02 thru 01-09 (below).

alt text

I didn't expect data against those dates, so I copied the subsearch and ran it in a separate search window, and I can see (as I expected) there's no data from 01-02 thru 01-09 (below).
alt text

Am I not understanding something about join type? What's happening here?

0 Karma

to4kawa
Ultra Champion

different year's %m-%d values is populated. maybe 2020/01/02 to 2020/01/09

0 Karma

hollybross1219
Path Finder

So I think it might be a problem with the data in the lookup, but I don't understand what's wrong. When I do this basic query:

| inputlookup TY18_Splunk_total_data.csv 
| where capability="109X"

I can see data against dates 1/2/2019 - 1/9/2019
alt text

Aggregating this data, I lose the data against these dates for some reason, and I don't understand why. I ran this query which generated these results. I'm at a loss...

| inputlookup TY18_Splunk_total_data.csv 
| where capability="109X"
| stats sum(attempts) by _time

alt text

0 Karma

to4kawa
Ultra Champion

| inputlookup TY18_Splunk_total_data.csv
| where capability="109X"
| stats sum(attempts) by _time
| sort 0 _time

try and check the result.

and when you usestreamstats , you should sort it.
Splunk fields is ascii order(1,10,2,20 ... ) and result order same at sometime.

0 Karma

kmugglet
Communicator

Hi,

You should be able to get the same results without using join

(splunk_server=indexer* index=wsi_tax_summary sourcetype=stash capability=109* tax_year=2019 ein=* intuit_offeringid=* 
    partnerId!=*test* partnerId=*) 
| timechart span=1d dc(intuit_tid) as 19attempts 
| streamstats sum(19attempts) as 19attempts 
| eval time=strftime(_time,"%m-%d") 
| append 
    [ inputlookup TY18_Splunk_total_data.csv 
    | where capability="109X" 
    | stats sum(attempts) as 18attempts by _time 
    | streamstats sum(18attempts) as 18attempts 
    | eval time=strftime(strptime(_time,"%m/%d/%Y"), "%m-%d") 
    | fields time 18attempts] 
| stats values(19attempts) AS TY19, values(18attempts) AS TY18 by time 

Cheers, Keith

0 Karma

hollybross1219
Path Finder

Hi @kmugglet, thanks for the suggestion. I tried this and it didn't work :(. Join is what I'm looking for because I'm comparing days (represented by the row) across different years (represented by the two columns).

0 Karma

kmugglet
Communicator

Also you should be able to replace this line

[ inputlookup TY18_Splunk_total_data.csv WHERE capability=="109X" 

Sometimes it works, sometimes it doesn't - Yay Splunk

0 Karma
Get Updates on the Splunk Community!

Earn a $35 Gift Card for Answering our Splunk Admins & App Developer Survey

Survey for Splunk Admins and App Developers is open now! | Earn a $35 gift card!      Hello there,  Splunk ...

Continuing Innovation & New Integrations Unlock Full Stack Observability For Your ...

You’ve probably heard the latest about AppDynamics joining the Splunk Observability portfolio, deepening our ...

Monitoring Amazon Elastic Kubernetes Service (EKS)

As we’ve seen, integrating Kubernetes environments with Splunk Observability Cloud is a quick and easy way to ...