Splunk Search
Highlighted

How to compare fields over multiple sourcetypes without 'join', 'append' or use of subsearches?

SplunkTrust
SplunkTrust

Hello everyone,

Now, this one bugs me for some time and this question got my attention back to this topic.
How can one compare fields over multiple source types without the use of join, append or any other subsearch?

I know, there some use cases where one has to use either of the above commands. But I don't want to and I don't need to, so what can be done?

cheers, MuS

Highlighted

Re: How to compare fields over multiple sourcetypes without 'join', 'append' or use of subsearches?

SplunkTrust
SplunkTrust

join or append are the most intuitive direction to go in, but it's not very efficient and it's pretty cumbersome.

I will show what can be done by using a run everywhere example. This will report on one field user which is available in both sourcetypes and some others group or run_time which is only available in one sourcetype.

The use case here is to show which user, group and runtime we have per datehour.

Let's start with the obvious one: Have you met chart?

index=_internal earliest=-2h@h latest=-0h@h sourcetype=splunkd OR sourcetype=scheduler | chart values(user) AS user values(group) AS group values(run_time) AS run_time by date_hour

The next obvious will be: Have you met stats?

index=_internal earliest=-2h@h latest=-0h@h sourcetype=splunkd OR sourcetype=scheduler | stats values(user) AS user values(group) AS group values(run_time) AS run_time by date_hour

Now move on for some fancy stats stuff?

index=_internal earliest=-2h@h latest=-0h@h sourcetype=splunkd OR sourcetype=scheduler 
| eval hour-{sourcetype}=date_hour 
| stats values(hour-*) AS * values(user) AS user values(group) AS group values(run_time) AS run_time 
| mvexpand splunkd  
| mvexpand scheduler 
| where splunkd=scheduler 
| rename splunkd AS date_hour 
| fields - scheduler

the eval on hour-{sourcetype}=date_hour will create a new field which looks in this case like hour-splunkd and hour-scheduler which will be used later in the where clause....

or another nice run everywhere example:

index=_internal sourcetype="splunkd" OR sourcetype="splunk_web_access" | streamstats count by status, idx, sourcetype | stats values(idx) AS idx, values(status) AS status, values(sourcetype) AS sourcetype | mvexpand status | eval Status = if(match(idx,status), "MATCH", "NO MATCH") | table status, idx, Status

So why all this hustle and not use join instead?

  • Because it is not neccesary to join all search just because you have different sources.
  • Because you can use stats or chart for it.
  • Because it will be faster.
  • Because it is fun to challenge basic Splunk commands and do some tricks with them 🙂

Just for the record, all the above example run on my laptop and Splunk 6 for about 2.5 seconds while the join to get the same result take about 4.5 seconds:

index=_internal sourcetype=splunkd earliest=-2h@h latest=-0h@h |  join date_hour [ search index=_internal sourcetype=scheduler earliest=-2h@h latest=-0h@h ] | stats values(date_hour) AS date_hour values(user) AS user values(group) AS group 

One more thing: I know that in some use cases one is forced to use join or append, but before that - just give stats a chance .....

cheers, MuS

View solution in original post

Highlighted

Re: How to compare fields over multiple sourcetypes without 'join', 'append' or use of subsearches?

Legend

This answer is awesome @MuS

Highlighted

Re: How to compare fields over multiple sourcetypes without 'join', 'append' or use of subsearches?

SplunkTrust
SplunkTrust

now it is correct 🙂

Highlighted

Re: How to compare fields over multiple sourcetypes without 'join', 'append' or use of subsearches?

SplunkTrust
SplunkTrust

HeHe that is funny, is the 'raw' text it says just give stats a chance but in the answer it is append() ?!???

For sure it should also say ...give stats a chance 😉

Highlighted

Re: How to compare fields over multiple sourcetypes without 'join', 'append' or use of subsearches?

Path Finder

Agree this post is great. I'm slowly digesting it but I'm having a little trouble with how to extend to a three source scenario where pairs of the data match but not all three at once. For example...

User: Name, City
Activity: Id, Name, Action, Target
Product: Id, Title, Type

I want to select Name, City, Action, Title, Type where User.Name=Activity.Name and Activity.Target=Product.Id

Can this be done with stats by extending one of the examples above?

Ultimately what I'm trying to do is to show information about the Top 10 Products for a specific City and Action.

0 Karma
Highlighted

Re: How to compare fields over multiple sourcetypes without 'join', 'append' or use of subsearches?

SplunkTrust
SplunkTrust

You could use eventstats to not-join the City from User to the Name from Activity, then stats by Id to not-join the Activity with the Product.

Highlighted

Re: How to compare fields over multiple sourcetypes without 'join', 'append' or use of subsearches?

Communicator

append0 a chance ....
append0?

0 Karma
Highlighted

Re: How to compare fields over multiple sourcetypes without 'join', 'append' or use of subsearches?

SplunkTrust
SplunkTrust

I've occasionally found eventstats to be particularly helpful as well.

Highlighted

Re: How to compare fields over multiple sourcetypes without 'join', 'append' or use of subsearches?

Motivator

@MuS -- Where you said, "the eval on hour-{sourcetype}=date_hour will create a new field which looks in this case like hour-splunkd and hour-scheduler which will be used later in the where clause...." I am wondering about that, because in the where clause it says, where splunkd=scheduler so I don't understand that explanation.

Another thing puzzling me is the *values(hour-) AS *** in the stats command. I don't understand at all what that is doing.