Do custom search commands have worse performance than external lookup scripts?

I wrote two versions of the same Python streaming command: one as a simple external lookup script, and one as a full custom search command (using V2 of the custom search command protocol). I tested the performance of both and found that the external lookup script was much faster, which is highly counter-intuitive.

Why might this be? Is there a reason custom search commands could actually be slower than equivalent external lookup scripts?

Here is a break-down of the two command versions.

1. External Lookup Script
Loads the geoip database into memory with the MEMORY_CACHE flag
Uses the csv module to read events from stdin
Performs a geoip lookup on each event's ip field, stores result in new field
Writes each line (event) back to stdout using csv module
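
For readers who want to picture the steps above, here is a minimal sketch of that read/lookup/write loop. It is not the poster's script: `fake_isp_lookup`, `run_lookup`, and the `ip`/`isp` field names are hypothetical stand-ins, and the real geoip call is replaced by a dummy function so the example runs without any database file.

```python
import csv
import io


def fake_isp_lookup(ip):
    """Hypothetical stand-in for the real geoip call; returns a made-up label."""
    return "isp-for-" + ip if ip else ""


def run_lookup(infile, outfile, lookup_fn, ip_field="ip", out_field="isp"):
    """Read CSV rows, fill out_field from ip_field, and write the rows back as CSV."""
    reader = csv.DictReader(infile)
    fieldnames = list(reader.fieldnames or [])
    if out_field not in fieldnames:
        fieldnames.append(out_field)
    writer = csv.DictWriter(outfile, fieldnames=fieldnames, lineterminator="\n")
    writer.writeheader()
    for row in reader:
        row[out_field] = lookup_fn(row.get(ip_field, ""))
        writer.writerow(row)


# In-memory streams stand in for stdin/stdout so the sketch is self-contained.
sample_in = io.StringIO("ip,isp\n8.8.8.8,\n")
sample_out = io.StringIO()
run_lookup(sample_in, sample_out, fake_isp_lookup)
```

In an actual script the two streams would be `sys.stdin` and `sys.stdout`, and the lookup function would wrap the geoip library call.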

2. Custom Search Command with V2 Protocol
Loads geoip database and defines custom streaming command like so:

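import sys

import pygeoip
from splunklib.searchcommands import dispatch, StreamingCommand, Configuration

# Create the GeoIP instance once per process, with the memory cache enabled.
# ISP_DB_PATH is the path to the ISP database file (definition not shown).
geoip_db = pygeoip.GeoIP(ISP_DB_PATH, pygeoip.const.MEMORY_CACHE)

@Configuration()
class ipasnCommand(StreamingCommand):

    def stream(self, events):
        # Transform each event in the chunk
        for event in events:
            # ...... [lookup logic goes here]
            yield event

dispatch(ipasnCommand, sys.argv, sys.stdin, sys.stdout, __name__)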

Note that both command versions are written in Python, use the same geoip lookup library with the same caching flag, and make the same lookup function calls.

Also note that the custom streaming command is dispatched only once and receives events in chunks, whereas Splunk seems to re-invoke the external lookup script every 255 events. Each re-invocation reloads the geoip database and wipes out the cache, so one would expect the external lookup version to perform much worse.

However, multiple trials confirm that when given 1 million events to process, the custom search command takes an average of 00:09:10, while the external lookup can do it in 00:07:06.
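
To put those timings in perspective, the arithmetic can be sketched as below. The only inputs are the figures quoted in this post (1 million events, the two wall-clock times, and the observed re-invocation every 255 events); the variable and function names are illustrative.

```python
import math

EVENTS = 1_000_000


def to_seconds(hms):
    """Convert an HH:MM:SS string into a number of seconds."""
    hours, minutes, seconds = (int(part) for part in hms.split(":"))
    return hours * 3600 + minutes * 60 + seconds


custom_s = to_seconds("00:09:10")   # custom search command
lookup_s = to_seconds("00:07:06")   # external lookup script

custom_rate = EVENTS / custom_s     # events per second, custom command
lookup_rate = EVENTS / lookup_s     # events per second, external lookup
slowdown = custom_s / lookup_s      # how much longer the custom command takes

# If the lookup script really is restarted every 255 events,
# this is how many times it would be launched for the whole run.
invocations = math.ceil(EVENTS / 255)

print(f"custom: {custom_rate:.0f} ev/s, lookup: {lookup_rate:.0f} ev/s")
print(f"slowdown: {slowdown:.2f}x, lookup invocations: {invocations}")
```

That works out to roughly 1,818 versus 2,347 events per second, so the custom command takes about 1.29 times as long, even though the lookup script would have been launched about 3,922 times.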

I was disappointed to observe such a large performance deficit from the custom search command, despite all the supposed advantages. Does anyone have some insight into what could be causing this? Is this performance gap to be expected?

1 Solution

SplunkTrust

Have you considered just using a .csv CIDR-based lookup? With a tiny bit of limits.conf memory tweaking, that .csv will fit in memory and will distribute to indexers.
A quick test on my laptop:

| makeresults count=1000000 | eval record_ip = random() | eval record_ip = record_ip%256 . ".". floor(record_ip/256)%256 . ".". floor(record_ip/256/256)%256 . ".". floor(record_ip/256/256/256)%256
| lookup ip2asn cidr as record_ip | stats count max(asn) min(asn)

22.16    command.lookup

22 seconds for 1M lookups on a single Splunk instance, running on a dual-core Windows laptop.
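
The eval in that test search turns one random integer into a dotted-quad string, lowest byte first. A rough Python reading of the same arithmetic is sketched here; it assumes the arithmetic binds tighter than the `.` concatenation, and the function name is made up for illustration.

```python
def int_to_dotted_low_byte_first(n):
    """Split a non-negative integer into four octets, lowest byte first."""
    octets = [(n // 256 ** i) % 256 for i in range(4)]
    return ".".join(str(octet) for octet in octets)


print(int_to_dotted_low_byte_first(16909060))   # 0x01020304
print(int_to_dotted_low_byte_first(2 ** 31 - 1))
```

Because the last octet comes from the highest byte, an input capped at 2^31 - 1 never produces a final octet above 127. That does not matter for a throughput test, but it means the generated addresses are not uniform over the whole IPv4 space.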
I used this database for testing: http://lite.ip2location.com/database/ip-asn. The cidr, asn, and as fields together are 28 MB. If you can do without the as field, the file would shrink considerably.

transforms.conf
[ip2asn]
filename = IP2LOCATION-LITE-ASN.csv
match_type = cidr(cidr)
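
To illustrate what a CIDR match means conceptually (this is not how Splunk implements it), here is a small sketch using Python's standard `ipaddress` module. The rows are invented, non-overlapping examples in the shape of the cidr and asn columns, and `lookup_asn` is a hypothetical helper.

```python
import ipaddress

# Made-up rows for illustration only; they are not taken from the real database.
ROWS = [
    ("10.0.0.0/8", "64512"),
    ("172.16.0.0/12", "64513"),
    ("192.0.2.0/24", "64514"),
]

# Parse each CIDR string once, up front.
TABLE = [(ipaddress.ip_network(cidr), asn) for cidr, asn in ROWS]


def lookup_asn(ip):
    """Return the asn of the first network containing ip, or None if none match."""
    address = ipaddress.ip_address(ip)
    for network, asn in TABLE:
        if address in network:
            return asn
    return None


print(lookup_asn("192.0.2.77"))
print(lookup_asn("8.8.8.8"))
```

A linear scan like this would be far too slow for a table of that size; it is only meant to show the containment test that `match_type = cidr(cidr)` asks the lookup to perform.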

Edit: On a fast Ubuntu box, the lookup takes 13s for 1M records and 50s for 5M records.
