Solved: Re: looking for 'faster' way to join to indexes (s...

loganseth · ‎01-09-2023

splunk receives 2 different stream data sets on a single hec (json).

set 1 has call records
set 2 has call status/disposition

so if i want call detail information from set 1 on calls that meet criteria in set 2, i have to join the records.

i used to use 'join' but read several articles about other ways and came across this method which I like, but really feels so slow/heavy

index="myindex" resource="somefilter"
| stats values(*) as * by guid
| search column="terminated"

because we have millions of rows to search from and i'm just looking for a few.

I tried adding my search criteria higher up, like this:

index="myindex" resource="somefilter" column="terminated"
| stats values(*) as * by guid

but then the other columns come back empty (I presume because it filtered them out, so nothing to join).

So looking for another/faster/better way to:
1. get data from set 2 with criteria X
2. bring back matches of that data from set 1.

Always many thanks for the education!

bowesmana · ‎01-09-2023

The subsearch @PickleRick talks about would look something like this

index="myindex" resource="somefilter" [ 
  search index="myindex" resource="somefilter" column="terminated"
  | stats count by guid
  | fields guid ]
| stats values(*) as * by guid

so your subsearch is creating the list of guids that are terminated and then passing that list as guid constraints to the outer search by effectively adding (guid=A OR guid=B OR guid=C...) to the outer search.

Make your subsearch as concise as possible, i.e. use as many filter criteria as you can (sourcetype etc) to only select the call status records you want.

View solution in original post

PickleRick · ‎01-09-2023

Hard to say with such imprecise description 😉

But if you're looking for just a few guids, it might be one of those cases when subsearch makes sense.

Otherwise you might look into one of three methods of making your searches faster - https://docs.splunk.com/Documentation/SplunkCloud/latest/Knowledge/Aboutsummaryindexing#Comparing_su...

loganseth · ‎01-09-2023

yea, for sure was trying to provide detail, but rip.

ok, what does "one of those cases when subsearch makes sense" mean?

like, "one of those cases when subsearch makes sense...to use join or append" or something?

ty!

PickleRick · ‎01-09-2023

No. I mean that you use a subsearch to generate a set of conditions which will later apply to the main search.

Join is bad for different reasons but subsearches are usually also best avoided. But there are sometimes use cases when they can be reasonably used.

See https://docs.splunk.com/Documentation/Splunk/latest/Search/Aboutsubsearches

loganseth · ‎01-09-2023

gotcha. yes, i'm working on 'avoiding' join for sure, and thought that stats values(*) thing was here to save me, but it's really intensive, still, too, for this particular data set.

for this particular search, i have some set of data in that index i want, which is like 200 rows.

so i just created a report for the 200 and sent to a lookup

then created a new report and used the other data search and | lookup and while it's substantially faster.

that is

| stats values(*) as * by id

took 1600 seconds

and

| index="mystuff"
| lookup filename.csv id OUTPUT col1 col2

took about 180s.

again, where I 'filtered down' the things i wanted to match by saving them into filename.csv first.

i mean, is that a hack? idk.

appreciate your wisdom always, sir.

bowesmana · ‎01-09-2023

The subsearch @PickleRick talks about would look something like this

index="myindex" resource="somefilter" [ 
  search index="myindex" resource="somefilter" column="terminated"
  | stats count by guid
  | fields guid ]
| stats values(*) as * by guid

so your subsearch is creating the list of guids that are terminated and then passing that list as guid constraints to the outer search by effectively adding (guid=A OR guid=B OR guid=C...) to the outer search.

Make your subsearch as concise as possible, i.e. use as many filter criteria as you can (sourcetype etc) to only select the call status records you want.

loganseth · ‎01-10-2023

HOLY SPLUNK BUCKETS, BATMAN!

This ran REALLY WELL.

I have never read/heard anyone explain it like that:

"effectively adding (guid=A OR guid=B OR guid=C...) to the outer search."

do you have any links where I can read more?

The explanation sounds like an inner SQL statement (which I'm most familiar with), like:

SELECT something
FROM somewhere
WHERE (
SELECT guid FROM somewhereelse WHERE status="terminated"
)

Seriously, this is fantastic. Days of google searches explained in a simple example.

Legends you guys.

bowesmana · ‎01-10-2023

There is some documentation on subsearches

https://docs.splunk.com/Documentation/Splunk/latest/Search/Aboutsubsearches

and also, some documentation on how to format the output of a subsearch for some more 'interesting' requirements

https://docs.splunk.com/Documentation/Splunk/latest/Search/Changetheformatofsubsearchresults

and details on the all important 'format' command

https://docs.splunk.com/Documentation/Splunk/latest/SearchReference/Format

When working out what the output from a subsearch looks like, you can run the subsearch on it's own and tag

| format

to the end and you cam what will be passed to the outer. So, you can run

index="myindex" resource="somefilter" column="terminated"
  | stats count by guid
  | fields guid
  | format

to see how your subsearch will work.

NOTE: subsearches are run BEFORE the outer search in almost all cases and they are not a solution to all problems. There are limitations to subsearch result sets, by default 10,000 results and 60 seconds runtime.

Imagine adding (guid=1 OR guid=2 OR ... guid=10000) to the outer search. That can sometimes slow down the outer search, so use them wisely.

PickleRick · ‎01-10-2023

Also remember that even while you might get your subsearch well within the 10k events limit, running a search with several thousand conditions might not be the best idea.

loganseth · ‎01-11-2023

Got it. So limit criteria to 999 conditions. 😉

Is there a 'faster' way to join to indexes ?

report acceleration

saved search

.conf25 Community Recap

Splunk App Developers | .conf25 Recap & What’s Next

Congratulations to the 2025-2026 SplunkTrust!

Are you a member of the Splunk Community?