splunk receives 2 different stream data sets on a single hec (json).
so if i want call detail information from set 1 on calls that meet criteria in set 2, i have to join the records.
i used to use 'join' but read several articles about other ways and came across this method which I like, but really feels so slow/heavy
index="myindex" resource="somefilter"
| stats values(*) as * by guid
| search column="terminated"
because we have millions of rows to search from and i'm just looking for a few.
I tried adding my search criteria higher up, like this:
index="myindex" resource="somefilter" column="terminated"
| stats values(*) as * by guid
but then the other columns come back empty (I presume because it filtered them out, so nothing to join).
So looking for another/faster/better way to:
1. get data from set 2 with criteria X
2. bring back matches of that data from set 1.
Always many thanks for the education!
The subsearch @PickleRick talks about would look something like this
index="myindex" resource="somefilter" [
search index="myindex" resource="somefilter" column="terminated"
| stats count by guid
| fields guid ]
| stats values(*) as * by guid
so your subsearch is creating the list of guids that are terminated and then passing that list as guid constraints to the outer search by effectively adding (guid=A OR guid=B OR guid=C...) to the outer search.
Make your subsearch as concise as possible, i.e. use as many filter criteria as you can (sourcetype etc) to only select the call status records you want.
Hard to say with such imprecise description 😉
But if you're looking for just a few guids, it might be one of those cases when subsearch makes sense.
Otherwise you might look into one of three methods of making your searches faster - https://docs.splunk.com/Documentation/SplunkCloud/latest/Knowledge/Aboutsummaryindexing#Comparing_su...
yea, for sure was trying to provide detail, but rip.
ok, what does "one of those cases when subsearch makes sense" mean?
like, "one of those cases when subsearch makes sense...to use join or append" or something?
ty!
No. I mean that you use a subsearch to generate a set of conditions which will later apply to the main search.
Join is bad for different reasons but subsearches are usually also best avoided. But there are sometimes use cases when they can be reasonably used.
See https://docs.splunk.com/Documentation/Splunk/latest/Search/Aboutsubsearches
gotcha. yes, i'm working on 'avoiding' join for sure, and thought that stats values(*) thing was here to save me, but it's really intensive, still, too, for this particular data set.
for this particular search, i have some set of data in that index i want, which is like 200 rows.
so i just created a report for the 200 and sent to a lookup
then created a new report and used the other data search and | lookup and while it's substantially faster.
that is
| stats values(*) as * by id
took 1600 seconds
and
| index="mystuff"
| lookup filename.csv id OUTPUT col1 col2
took about 180s.
again, where I 'filtered down' the things i wanted to match by saving them into filename.csv first.
i mean, is that a hack? idk.
appreciate your wisdom always, sir.
The subsearch @PickleRick talks about would look something like this
index="myindex" resource="somefilter" [
search index="myindex" resource="somefilter" column="terminated"
| stats count by guid
| fields guid ]
| stats values(*) as * by guid
so your subsearch is creating the list of guids that are terminated and then passing that list as guid constraints to the outer search by effectively adding (guid=A OR guid=B OR guid=C...) to the outer search.
Make your subsearch as concise as possible, i.e. use as many filter criteria as you can (sourcetype etc) to only select the call status records you want.
HOLY SPLUNK BUCKETS, BATMAN!
This ran REALLY WELL.
I have never read/heard anyone explain it like that:
"effectively adding (guid=A OR guid=B OR guid=C...) to the outer search."
do you have any links where I can read more?
The explanation sounds like an inner SQL statement (which I'm most familiar with), like:
SELECT something
FROM somewhere
WHERE (
SELECT guid FROM somewhereelse WHERE status="terminated"
)
Seriously, this is fantastic. Days of google searches explained in a simple example.
Legends you guys.
There is some documentation on subsearches
https://docs.splunk.com/Documentation/Splunk/latest/Search/Aboutsubsearches
and also, some documentation on how to format the output of a subsearch for some more 'interesting' requirements
https://docs.splunk.com/Documentation/Splunk/latest/Search/Changetheformatofsubsearchresults
and details on the all important 'format' command
https://docs.splunk.com/Documentation/Splunk/latest/SearchReference/Format
When working out what the output from a subsearch looks like, you can run the subsearch on it's own and tag
| format
to the end and you cam what will be passed to the outer. So, you can run
index="myindex" resource="somefilter" column="terminated"
| stats count by guid
| fields guid
| format
to see how your subsearch will work.
NOTE: subsearches are run BEFORE the outer search in almost all cases and they are not a solution to all problems. There are limitations to subsearch result sets, by default 10,000 results and 60 seconds runtime.
Imagine adding (guid=1 OR guid=2 OR ... guid=10000) to the outer search. That can sometimes slow down the outer search, so use them wisely.
Also remember that even while you might get your subsearch well within the 10k events limit, running a search with several thousand conditions might not be the best idea.
Got it. So limit criteria to 999 conditions. 😉