Splunk Search

Appending Data using Subsearch/Custom Commands & Dedup

I currently have a search over firewall data that looks something like this:

index=my_index sourcetype=fw_data action=drop src_ip!=""
| fields src_ip src_port dst_ip dst_port proto action
| my_special_command dst_ip 
| search my_command_output_field=true
| dedup src_ip dst_ip dst_port
| table _time src_ip src_port dst_ip dst_port proto action my_command_output_field

The problem is that when I pass my dst_ip (destination IP) values to my custom command (my_special_command), I'd like to pass them dedup'd rather than as-is, so the custom command doesn't process the same value multiple times. However, I want that done in such a way that the original events are not dedup'd on dst_ip for the rest of the search. In other words, I want dst_ip dedup'd only for the purpose of invoking my custom command.

For a little background: my custom command takes a destination IP when it is invoked, makes an external API call, and returns an additional field (my_command_output_field) to Splunk that adds context to the data in my initial search. That piece works fine. However, when I pass the list to the custom command, I don't want it to contain duplicates, because they really hurt performance. I've tried rewriting the invocation of the custom command a few different ways (copying dst_ip to a new field via eval in one attempt, and using "join" in another) and haven't found a solution that works the way I want.

I know I could probably get a join to work if I re-ran the entire search up to that point in a subsearch, but that seems far less efficient, since I'd be running the same search twice. What I really want is to take dst_ip, dedup it just for the purpose of passing it to the custom command, and then apply the additional field it returns to the non-dedup'd dst_ip values, without having to re-run the whole thing from the start.

Is this possible in Splunk, or is there maybe a better way/different way of looking at this that I haven't considered?

You could build some caching into your special command. At each invocation, create a map ip->output and call the external API only if the map has no output value for that ip yet. Store each result in the map so that repeated IPs are answered from it and duplicate external calls are avoided.
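
As a rough illustration of that idea, here is a minimal Python sketch of the cache on its own, outside of any Splunk SDK. The names enrich_records, fake_lookup and api_calls are hypothetical and exist only for this example; fake_lookup stands in for the real external API call, and its "true"/"false" rule is made up. The point is only that every event is kept while each distinct dst_ip triggers a single lookup.

```python
from typing import Callable, Dict, Iterable, Iterator, List


def enrich_records(
    records: Iterable[dict],
    lookup: Callable[[str], str],
    field: str = "dst_ip",
    out_field: str = "my_command_output_field",
) -> Iterator[dict]:
    """Yield every record, calling `lookup` at most once per distinct IP."""
    cache: Dict[str, str] = {}  # ip -> output, lives for one invocation
    for record in records:
        ip = record.get(field)
        if ip is None:
            # Nothing to look up; pass the event through untouched.
            yield record
            continue
        if ip not in cache:
            # The only place the (expensive) external call happens.
            cache[ip] = lookup(ip)
        record[out_field] = cache[ip]
        yield record


# Hypothetical stand-in for the external API; it records each call it receives.
api_calls: List[str] = []


def fake_lookup(ip: str) -> str:
    api_calls.append(ip)
    # Made-up rule, purely so the example returns something.
    return "true" if ip.startswith("10.") else "false"


events = [
    {"src_ip": "1.1.1.1", "dst_ip": "10.0.0.5"},
    {"src_ip": "2.2.2.2", "dst_ip": "10.0.0.5"},
    {"src_ip": "3.3.3.3", "dst_ip": "192.168.1.9"},
    {"src_ip": "4.4.4.4", "dst_ip": "10.0.0.5"},
]

enriched = list(enrich_records(events, fake_lookup))
print(len(enriched), "events,", len(api_calls), "API calls")
```

In a real custom command the same loop would sit inside whatever method receives the events, with fake_lookup replaced by the actual API call. The rest of the search (the search filter, dedup and table) stays as it is, because no events are dropped.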

Note: if your special command is configured as "streaming", it may be invoked several times with chunks of the search results rather than once with the entire result set. Hence you may see several external API calls per ip rather than just one, but still far fewer than you originally would.
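
To make the chunking effect concrete, here is a small self-contained Python simulation. It is only a sketch: enrich_chunk, fake_lookup and the chunk size of 3 are assumptions for illustration, not anything Splunk defines, and the chunks are produced by simple list slicing rather than by Splunk itself. The cache is rebuilt for each chunk, so an IP that appears in two chunks is looked up twice.

```python
from typing import Callable, Dict, List


def enrich_chunk(chunk: List[dict], lookup: Callable[[str], str]) -> List[dict]:
    """Enrich one chunk of events using a cache that lives only for this call."""
    cache: Dict[str, str] = {}  # rebuilt on every invocation
    for record in chunk:
        ip = record["dst_ip"]
        if ip not in cache:
            cache[ip] = lookup(ip)
        record["my_command_output_field"] = cache[ip]
    return chunk


# Hypothetical stand-in for the external API; it records each call it receives.
calls: List[str] = []


def fake_lookup(ip: str) -> str:
    calls.append(ip)
    return "true"


ips = ["10.0.0.5", "10.0.0.5", "10.0.0.7", "10.0.0.5", "10.0.0.7", "10.0.0.5"]
events = [{"dst_ip": ip} for ip in ips]

# Simulate the command being invoked once per chunk (chunk size is arbitrary).
chunk_size = 3
chunks = [events[i:i + chunk_size] for i in range(0, len(events), chunk_size)]
for chunk in chunks:
    enrich_chunk(chunk, fake_lookup)

print(len(events), "events,", len(set(ips)), "distinct IPs,", len(calls), "API calls")
```

With these six events split into two chunks, each of the two distinct IPs is looked up once per chunk, giving four calls in total: more than the two a single global cache would need, but fewer than the six made without any caching.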