This is not so much a question as a solution. Since there is no solutions.splunk.com, I will post the approach here, improve it with any feedback, and post the improved pattern as an answer.
Anti-pattern: map runs its searches as subsearches, so mapped searches are limited in the number of results they can return.
example: https://answers.splunk.com/answers/589791/can-i-do-kmeans-by-a-for-a-column-by-a-column.html
Pattern: a loopable lookup table, i.e. a lookup table that yields a single successive value on each run.
How-to:
1) Create a table of the values that you would like to loop over, with columns for the loopable value and a status. The status needs to be something that can be compared against. For example, all destination ports visited in a time period:
| tstats summariesonly=t first(All_Traffic.dest_port) as port
from datamodel=Network_Traffic.All_Traffic
where All_Traffic.session_id!=0 All_Traffic.dest_zone=external
by All_Traffic.dest_port
| eval status="A" | fields + port status
| outputlookup test_loop_ports.csv
2) Do some work using the minimum value from the loopable table, then update that value's status:
| tstats summariesonly=t count
from datamodel=Network_Traffic.All_Traffic
where [|inputlookup test_loop_ports.csv where NOT status=done | stats min(port) as All_Traffic.dest_port]
by All_Traffic.dest_port
| outputlookup append=t test_port_count.csv | eval status="done", port='All_Traffic.dest_port'
| append [|inputlookup test_loop_ports.csv]
| stats max(status) as status by port
| outputlookup test_loop_ports.csv
How will this look? Let's say that the first query returns ports 0, 53, 80, and 443, so test_loop_ports.csv looks like:
port status
0 A
53 A
80 A
443 A
On the first run of the second query, I get a new table and a modified test_loop_ports.csv:
test_loop_ports:
port status
0 done
53 A
80 A
443 A
test_port_count.csv
port count
0 10532
On the second run of the second query, port 0 has a status of "done", so the new minimum port is 53. After this run, our tables look like:
test_loop_ports:
port status
0 done
53 done
80 A
443 A
test_port_count.csv
port count
0 10532
53 96103
and so on, until all statuses are done:
test_loop_ports:
port status
0 done
53 done
80 done
443 done
test_port_count.csv
port count
0 10532
53 96103
80 909
443 1201
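For readers who want to sanity-check the bookkeeping, here is a minimal Python sketch (purely illustrative, not part of any Splunk search) that simulates the loop mechanics: each run picks the minimum port whose status is not "done", records its count, and marks it done.

```python
# Hypothetical simulation of the loopable-lookup pattern described above.
# Each "run" mimics one scheduled execution of the second Splunk query.

def one_run(loop_table, results, counts):
    """Process the minimum not-done port, record its count, mark it done."""
    pending = [port for port, status in loop_table.items() if status != "done"]
    if not pending:
        return False  # nothing left to loop over
    port = min(pending)
    results[port] = counts[port]   # stands in for the tstats count
    loop_table[port] = "done"      # the outputlookup status update
    return True

# Example counts, matching the worked example above.
counts = {0: 10532, 53: 96103, 80: 909, 443: 1201}
loop_table = {port: "A" for port in counts}   # test_loop_ports.csv
results = {}                                  # test_port_count.csv

while one_run(loop_table, results, counts):
    pass

print(results)  # ports processed in ascending order: 0, 53, 80, 443
```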
The two problems that I have with this process are that it depends on maintaining a lookup file on disk, and that each pass requires its own sequential scheduled run of the search.
I like your style. You may not realize this, but multisearch does not use subsearches and so is not limited by their limits. So I would try something like this instead, so that it can run everything at the same time and not depend on a lookup file. I tested it for 2 "loops", so it should run just fine for many more than that. In this contortion, map is used only for its templating function, which allows us to set any number of f# fields whose values are then passed as variables to each lower search. The only catch is that you really need to know the number of passes in advance (or maybe you can take this approach and extend it generically).
| makeresults
| eval looper=mvrange(1,3)
| mvexpand looper
| streamstats count AS foo
| eval looper = "f" . looper
| xyseries _time looper foo
| map search="|multisearch
[ search index=_* $f1$ | eval run=\"1:$f1$\" ]
[ search index=_* $f2$ | eval run=\"2:$f2$\" ]"
Actually, now that I re-think this, I am sure that the multisearch being inside of the map will cause the entirety of the merged results to be limited by the overall limit on map. So this will not work. It will have to be done in 2 parts. The first part sets the search details like this:
| makeresults
| eval port=mvrange(1,3)
| mvexpand port
| fields - _time
| outputlookup eraseme.csv
Then the second part runs them all at the same time like this:
| multisearch
[ search index=_* [| inputlookup eraseme.csv | head 1 | tail 1 | return $port] | eval run="1" ]
[ search index=_* [| inputlookup eraseme.csv | head 2 | tail 1 | return $port] | eval run="2" ]
But then again, if this is what we are after, then why is a vanilla subsearch like this insufficient?
index=_* [| inputlookup eraseme.csv |table port]
Whoa. multisearch invalidates my entire premise. The templating of map seemed nice until I understood that it was a subsearch.
I just got to the point where I understood your original response and then it changed. I had figured that I could give part one a fixed number of loopers by doing something like
| mysearch resulting in portCount, port
| append [| makeresults
| eval dest_port=mvrange(1,21)
| mvexpand dest_port
| streamstats count as portCount by dest_port
| eval looper = "f" . dest_port]
| sort 20 -portCount
Then I could multisearch on the 20 values.
But your second approach with just a straight-up multisearch is easier to understand; I could set up my searches as:
| multisearch
| bigger search [| inputlookup test_loop_ports.csv | streamstats count as row | eventstats count as totalcount | head 1 | tail 1 | where row<=totalcount | eval port=coalesce(port,"NA") | fields + port]
| bigger search [| inputlookup test_loop_ports.csv | streamstats count as row | eventstats count as totalcount | head 2 | tail 1 | where row<=totalcount | eval port=coalesce(port,"NA") | fields + port]
(or even make a get_csvrow macro to be a little cleaner about it)
And then include enough multisearch clauses to exceed the number of ports that I could imagine showing up.
As to the final question, what is driving me is the need to do kmeans clustering by port. If I just feed kmeans all of the ports, then I do not get separate kmeans clustering for each port. This may apply to other outlier/clustering-type searches as well.
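To illustrate why per-port clustering matters, here is a small pure-Python sketch (hypothetical data and a toy 1-D k-means, not Splunk's kmeans command): clustering each port's values separately recovers each port's own baseline and outlier bands, which one global clustering would blur together.

```python
from statistics import mean

def kmeans_1d(values, k=2, iters=20):
    """Toy 1-D k-means, just enough to illustrate per-group clustering."""
    # Seed centers by sampling the sorted values.
    centers = sorted(values)[:: max(1, len(values) // k)][:k]
    for _ in range(iters):
        clusters = [[] for _ in centers]
        for v in values:
            nearest = min(range(len(centers)), key=lambda i: abs(v - centers[i]))
            clusters[nearest].append(v)
        centers = [mean(c) if c else centers[i] for i, c in enumerate(clusters)]
    return sorted(centers)

# Hypothetical byte counts per destination port.
by_port = {80: [200, 210, 5000], 443: [90, 100, 95, 4000]}

# Clustering per port (the goal of the looping pattern) keeps each port's
# normal band and outlier band separate.
per_port = {port: kmeans_1d(vals) for port, vals in by_port.items()}
print(per_port)
```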
So you have what you really need, right?
Of course, you are correct! Thank you for the new command to use, and for the head|tail pattern.
Turns out that multisearch doesn't work for me. multisearch only accepts streaming commands, and I think that my eventstats commands do not comply, so I get:
Error in 'multisearch' command:
Multisearch subsearches may only contain purely streaming operations
(subsearch 1 contains a non-streaming command.)
Fortunately, I can use the head|tail thing and just schedule 30 searches, accepting that 10 or so may have no results when I search for port="NA".
You should be able to put your non-streaming command at the very end, outside of your multisearches, since you are surely doing the same-ish thing in each clause. If you need to do a different-ish thing, then you can tag each multisearch dataset uniquely and differentiate on the outside at the end.
Unfortunately, the initial part of my search uses tstats on the Network_Traffic datamodel.
Splunk says that this is non-streaming.
The tstats command supports append=t in prestats mode, and that function is unlimited too, so just use | tstats prestats=t append=t and go crazy.
Beware blind use of the head/tail trick. Unless you know FOR SURE exactly how many rows there are, you will need better mojo, because, for example, with a 10-row lookup, |head 10|tail 1, which returns the last valid row, looks no different from |head 20|tail 1, which would be a duplicated run.
That is where the macro comes in handy:
|inputlookup test_loop_ports.csv | eventstats count as totalcount | head $rownumber$ | tail 1 | where $rownumber$<=totalcount | fields + port | append [|makeresults | eval port="NA"] | stats max(port) as port
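To see the pitfall and the fix side by side, here is a hypothetical Python rendering of the two row selectors: the naive head/tail silently repeats the last row when the row number overshoots, while the macro's where $rownumber$<=totalcount guard turns the overshoot into "NA".

```python
ports = [0, 7, 22, 25, 53, 80, 110, 143, 443, 8080]  # a 10-row lookup

def naive_row(rows, n):
    """|head n|tail 1 with no guard: overshooting repeats the last row."""
    return rows[:n][-1]

def guarded_row(rows, n):
    """The macro's logic: keep row n only if n <= totalcount, else "NA"."""
    picked = rows[:n][-1:]                      # head n | tail 1
    picked = picked if n <= len(rows) else []   # where $rownumber$<=totalcount
    return picked[0] if picked else "NA"        # the appended "NA" fallback

print(naive_row(ports, 10), naive_row(ports, 20))      # same row twice
print(guarded_row(ports, 10), guarded_row(ports, 20))  # last row, then NA
```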