This is not so much a question, but I don't see a solutions.splunk.com, so I will post this solution here, improve it with any feedback, and post the improved pattern as an answer.
Anti-pattern: map is a subsearch, so mapped searches are limited in the results that they can return.
Pattern: loopable lookup table. Create a lookup table that will provide a single successive value with each run.
1) Create a table of values that you would like to loop on, with columns for the loopable value and a status. Status needs to be something that can be compared against. For example, all destination ports visited in a time period:
| tstats summariesonly=t first(All_Traffic.dest_port) as port from datamodel=Network_Traffic.All_Traffic where All_Traffic.session_id!=0 All_Traffic.dest_zone=external by All_Traffic.dest_port | eval status="A" | fields + port status | outputlookup test_loop_ports.csv
2) Do some work using the minimum value from the loopable table and update the minimum value's status.
| tstats summariesonly=t count from datamodel=Network_Traffic.All_Traffic where [|inputlookup test_loop_ports.csv where NOT status=done | stats min(port) as All_Traffic.dest_port] by All_Traffic.dest_port | outputlookup append=t test_port_count.csv | eval status="done", port='All_Traffic.dest_port' | append [|inputlookup test_loop_ports.csv] | stats max(status) as status by port | outputlookup test_loop_ports.csv
How will this look?
Let's say that the first query returns ports 0, 53, 80, and 443, so test_loop_ports.csv looks like:
port status
0    A
53   A
80   A
443  A
On the first run of the second query, I get a new test_port_count.csv and a modified test_loop_ports.csv:
port status
0    done
53   A
80   A
443  A

port count
0    10532
On the second run of the second query, port 0 has a status of "done", so the new minimum port is 53. After this run, our tables look like:
port status
0    done
53   done
80   A
443  A

port count
0    10532
53   96103
And so on, until all statuses are done:
port status
0    done
53   done
80   done
443  done

port count
0    10532
53   96103
80   909
443  1201
The two problems that I have with this process are that
I like your style. You may not realize this, but multisearch does not use subsearches and so is not limited by their limits. So I would try something like this instead, so that it can run everything at the same time and not depend on a lookup file. I tested it for 2 "loops", so it should run just fine for many more than that. In this contortion, map is used only for its templating function, which allows us to set any number of f# fields and values that are then passed as variables to each lower search. The only catch is that you really need to know how many passes in advance (or maybe you can take this approach and extend it generically).
| makeresults | eval looper=mvrange(1,3) | mvexpand looper | streamstats count AS foo | eval looper = "f" . looper | xyseries _time looper foo | map search="|multisearch [ search index=_* $f1$ | eval run=\"1:$f1$\" ] [ search index=_* $f2$ | eval run=\"2:$f2$\" ]"
Actually, now that I re-think this, I am sure that the multisearch being inside of the map will cause the entirety of the merged results to be limited by the limit on map overall. So this will not work. It will have to be 2 parts. The first part sets the search details like this:
| makeresults | eval port=mvrange(1,3) | mvexpand port | fields - _time | outputlookup eraseme.csv
Then the second part runs them all at the same time like this:
| multisearch [ search index=_* [| inputlookup eraseme.csv | head 1 | tail 1 | return $port] | eval run="1" ] [ search index=_* [| inputlookup eraseme.csv | head 2 | tail 1 | return $port] | eval run="2" ]
But then again, if this is what we are after, then why is a vanilla subsearch like this insufficient?
index=_* [| inputlookup eraseme.csv | table port]
Whoa. multisearch invalidates my entire premise. The templating of map seemed nice until I understood that it was a subsearch.
I just got to the point where I understood your original response, and then it changed. I had figured that I could give part one a fixed number of loopers by doing something like:
|mysearch resulting in portCount, port
| append [|makeresults
| eval destport=mvrange(1,21)
| mvexpand destport
| streamstats count as portCount by destport
| eval looper = "f" . destport]
|sort 20 -portCount
Then I could multisearch on the 20 values.
But your second approach with just a straight-up multisearch is easier to understand. I could set up my searches as:
|multisearch [ bigger search [|inputlookup test_loop_ports.csv | streamstats count as row | eventstats count as totalcount | head 1 | tail 1 | where row<=totalcount | eval port=coalesce(port,"NA") | fields + port] ] [ bigger search [|inputlookup test_loop_ports.csv | streamstats count as row | eventstats count as totalcount | head 2 | tail 1 | where row<=totalcount | eval port=coalesce(port,"NA") | fields + port] ]
(or even make a get_csvrow macro to be a little cleaner about it)
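For reference, such a get_csvrow macro might be defined in macros.conf roughly like this (the macro name, arguments, and the hard-coded port/"NA" handling are my assumptions, not an existing macro):

```ini
[get_csvrow(2)]
args = lookup, rownumber
definition = inputlookup $lookup$ | streamstats count as row | eventstats count as totalcount | head $rownumber$ | tail 1 | where row<=totalcount | eval port=coalesce(port,"NA") | fields + port
```

Each multisearch clause could then call it as `get_csvrow(test_loop_ports.csv, 1)`, `get_csvrow(test_loop_ports.csv, 2)`, and so on, instead of repeating the whole head/tail pipeline.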
And then include enough multisearch clauses to exceed the number of ports that I could imagine showing up.
As to the final question, what is driving me is the need to do kmeans clustering by port. If I just feed kmeans all of the ports, then I do not get separate kmeans clustering for each port. This may apply to other outlier/clustering-type searches as well.
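For what it is worth, a sketch of that per-port clustering driven by map might look like this (the index name, the bytes_in/bytes_out fields, and k=3 are all placeholder assumptions, and map's subsearch result limits still apply to each pass):

```spl
| inputlookup test_loop_ports.csv
| fields port
| map maxsearches=30 search="search index=netfw dest_port=$port$ | kmeans k=3 bytes_in bytes_out | eval port=$port$"
```

Each iteration clusters only one port's events, so the cluster centroids are computed per port rather than across all ports at once.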
Of course, you are correct! Thank you for the new command to use and the head|tail pattern.
Beware the blind use of the head/tail. Unless you know FOR SURE exactly how many rows there are, you will need better mojo, because, for example, using a 10-row lookup, |head 10|tail 1, which is the last valid row, looks no different than |head 20|tail 1, which would be a duplicated run.
That is where the macro comes in handy:
|inputlookup test_loop_ports.csv | eventstats count as totalcount | head $rownumber$ | tail 1 | where $rownumber$<=totalcount | fields + port | append [|makeresults | eval port="NA"] | stats max(port) as port
Turns out that multisearch doesn't work for me. Multisearch only accepts streaming commands, and I think that my eventstats commands do not comply, so I get:
Error in 'multisearch' command:
Multisearch subsearches may only contain purely streaming operations
(subsearch 1 contains a non-streaming command.)
Fortunately, I can use the head|tail thing and just schedule 30 searches, accepting that 10 or so may have no results when I search for port="NA".
You should be able to put your non-streaming command at the very end, outside of your multisearches, since you are surely doing the sameish thing. If you need to do a differentish thing, then you can tag each multisearch dataset uniquely and differentiate on the outside at the end.
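That shape might look something like this (the index, ports, and final stats are placeholders):

```spl
| multisearch
    [ search index=_* dest_port=53 | eval run="53" ]
    [ search index=_* dest_port=80 | eval run="80" ]
| stats count as events by run
```

Each clause stays purely streaming (eval just tags the dataset), and the single non-streaming command runs once over the merged results.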
Unfortunately, the initial part of my search uses tstats on the Network_Traffic datamodel, and Splunk says that this is non-streaming.