Reporting

Is there a 'faster' way to join to indexes ?

loganseth
Path Finder

splunk receives 2 different stream data sets on a single hec (json).

  • set 1 has call records
  • set 2 has call status/disposition

so if i want call detail information from set 1 on calls that meet criteria in set 2, i have to join the records.

i used to use 'join' but read several articles about other ways and came across this method which I like, but really feels so slow/heavy

 

 

index="myindex" resource="somefilter"
| stats values(*) as * by guid
| search column="terminated"

 

 

because we have millions of rows to search from and i'm just looking for a few.

I tried adding my search criteria higher up, like this:

 

 

index="myindex" resource="somefilter" column="terminated"
| stats values(*) as * by guid

 

 


but then the other columns come back empty (I presume because it filtered them out, so nothing to join).

So looking for another/faster/better way to:
1. get data from set 2 with criteria X
2. bring back matches of that data from set 1.

Always many thanks for the education!

Labels (2)
0 Karma
1 Solution

bowesmana
SplunkTrust
SplunkTrust

The subsearch @PickleRick talks about would look something like this

index="myindex" resource="somefilter" [ 
  search index="myindex" resource="somefilter" column="terminated"
  | stats count by guid
  | fields guid ]
| stats values(*) as * by guid

so your subsearch is creating the list of guids that are terminated and then passing that list as guid constraints to the outer search by effectively adding (guid=A OR guid=B OR guid=C...) to the outer search.

Make your subsearch as concise as possible, i.e. use as many filter criteria as you can (sourcetype etc) to only select the call status records you want.

 

View solution in original post

PickleRick
SplunkTrust
SplunkTrust

Hard to say with such imprecise description 😉

But if you're looking for just a few guids, it might be one of those cases when subsearch makes sense.

Otherwise you might look into one of three methods of making your searches faster - https://docs.splunk.com/Documentation/SplunkCloud/latest/Knowledge/Aboutsummaryindexing#Comparing_su...

0 Karma

loganseth
Path Finder

yea, for sure was trying to provide detail, but rip.

ok, what does "one of those cases when subsearch makes sense" mean?

like, "one of those cases when subsearch makes sense...to use join or append" or something?

ty!

0 Karma

PickleRick
SplunkTrust
SplunkTrust

No. I mean that you use a subsearch to generate a set of conditions which will later apply to the main search.

Join is bad for different reasons but subsearches are usually also best avoided. But there are sometimes use cases when they can be reasonably used.

See https://docs.splunk.com/Documentation/Splunk/latest/Search/Aboutsubsearches

 

loganseth
Path Finder

gotcha.  yes, i'm working on 'avoiding' join for sure, and thought that stats values(*) thing was here to save me, but it's really intensive, still, too, for this particular data set.

for this particular search, i have some set of data in that index i want, which is like 200 rows.

so i just created a report for the 200 and sent to a lookup

then created  a new report and used the other data search and | lookup and while it's substantially faster.

that is

| stats values(*) as * by id

took 1600 seconds

and

| index="mystuff"
| lookup filename.csv id OUTPUT col1 col2

took about 180s.

again, where I 'filtered down' the things i wanted to match by saving them into filename.csv first.

i mean, is that a hack? idk. 

appreciate your wisdom always, sir.

 

0 Karma

bowesmana
SplunkTrust
SplunkTrust

The subsearch @PickleRick talks about would look something like this

index="myindex" resource="somefilter" [ 
  search index="myindex" resource="somefilter" column="terminated"
  | stats count by guid
  | fields guid ]
| stats values(*) as * by guid

so your subsearch is creating the list of guids that are terminated and then passing that list as guid constraints to the outer search by effectively adding (guid=A OR guid=B OR guid=C...) to the outer search.

Make your subsearch as concise as possible, i.e. use as many filter criteria as you can (sourcetype etc) to only select the call status records you want.

 

loganseth
Path Finder

HOLY SPLUNK BUCKETS, BATMAN!

This ran REALLY WELL.


I have never read/heard anyone explain it like that:

"effectively adding (guid=A OR guid=B OR guid=C...) to the outer search."

do you have any links where I can read more?

The explanation sounds like an inner SQL statement (which I'm most familiar with), like:

SELECT something
FROM somewhere
WHERE (
SELECT guid FROM somewhereelse WHERE status="terminated"
)


Seriously, this is fantastic.  Days of google searches explained in a simple example.

Legends you guys.

0 Karma

bowesmana
SplunkTrust
SplunkTrust

There is some documentation on subsearches 

https://docs.splunk.com/Documentation/Splunk/latest/Search/Aboutsubsearches

and also, some documentation on how to format the output of a subsearch for some more 'interesting' requirements

https://docs.splunk.com/Documentation/Splunk/latest/Search/Changetheformatofsubsearchresults

and details on the all important 'format' command

https://docs.splunk.com/Documentation/Splunk/latest/SearchReference/Format

When working out what the output from a subsearch looks like, you can run the subsearch on it's own and tag 

| format

to the end and you cam what will be passed to the outer. So, you can run

index="myindex" resource="somefilter" column="terminated"
  | stats count by guid
  | fields guid
  | format

to see how your subsearch will work.

NOTE: subsearches are run BEFORE the outer search in almost all cases and they are not a solution to all problems. There are limitations to subsearch result sets, by default 10,000 results and 60 seconds runtime.

Imagine adding (guid=1 OR guid=2 OR ... guid=10000) to the outer search. That can sometimes slow down the outer search, so use them wisely. 

 

0 Karma

PickleRick
SplunkTrust
SplunkTrust

Also remember that even while you might get your subsearch well within the 10k events limit, running a search with several thousand conditions might not be the best idea.

loganseth
Path Finder

Got it.  So limit criteria to 999 conditions. 😉

0 Karma
Get Updates on the Splunk Community!

Enterprise Security Content Update (ESCU) | New Releases

In November, the Splunk Threat Research Team had one release of new security content via the Enterprise ...

Index This | Divide 100 by half. What do you get?

November 2024 Edition Hayyy Splunk Education Enthusiasts and the Eternally Curious!  We’re back with this ...

Stay Connected: Your Guide to December Tech Talks, Office Hours, and Webinars!

❄️ Celebrate the season with our December lineup of Community Office Hours, Tech Talks, and Webinars! ...