Thanks Martin! I appreciate you taking the time to look into this with me.
It is good to see that you got similar performance results, but it is still disappointing that the custom search command is so slow. A couple of questions about your comments:
You say that there doesn't seem to be excessive re-launching of python every 255 events, but here is what led me to conclude that there is: the code I inserted to add a "cookie" field, a random number computed once at the top of each python script (sketched below). That code might be commented out in the ipasn_dat_lookup.py script right now, but you should be able to comment it back in (in that particular script the field may be called "debug" rather than "cookie"; sorry for the inconsistency). I ran the lookup scripts over 1,000,000 events, and stats showed about (1,000,000/255) unique values for the "cookie" (or "debug") field in the external lookup versions of the commands, but only one unique value for the custom search commands. So I concluded that the external lookup scripts are re-invoked every 255 events, while the custom search commands are dispatched once and stream events in chunks. Would you agree with that line of reasoning?
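In case it helps, here is a stripped-down sketch of that instrumentation. It assumes the standard CSV-over-stdin/stdout interface Splunk uses for external lookup scripts and that the cookie field is declared in the lookup's fields_list (so it arrives as an empty column); the field names are illustrative, not the exact ones in ipasn_dat_lookup.py.

```python
# Sketch of the "cookie" instrumentation in an external lookup script.
# Assumes the usual CSV-over-stdin/stdout external lookup interface and that
# "cookie" is declared in fields_list, so it arrives as an (empty) column.
import csv
import random
import sys

# Computed once at module load, so every event handled by a single python
# launch carries the same value.
COOKIE = random.random()

def main():
    reader = csv.DictReader(sys.stdin)
    writer = csv.DictWriter(sys.stdout, fieldnames=reader.fieldnames)
    writer.writeheader()
    for row in reader:
        # ... the real per-event lookup happens here ...
        row["cookie"] = COOKIE
        writer.writerow(row)

if __name__ == "__main__":
    main()
```

Appending something like "| stats dc(cookie)" to the search (field name illustrative) is where the roughly 1,000,000/255 figure came from for the lookup versions, versus a single distinct value for the custom command versions.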
If the old external lookup method really is faster, I would like to find a definitive answer as to why.
The best information I could find on the inner workings of traditional Splunk streaming commands is this post from 2012, but it doesn't explicitly say how much parallelization there is, nor whether any of that applies to external lookup commands. Overall, nothing I have found explains why my results differ so radically from the performance benchmarks in this PowerPoint on the V2 protocol.
Now that you have reviewed my code, note that the MaxMind lookups are performed in exactly the same way in both the custom command and external lookup versions (sketch below). So, regarding your comment about something "inside the custom command [being] terribly inefficient?": do you mean that the custom command protocol itself is the inefficient part? If so, is that something we should report to the Splunk devs? Either way, if external lookups are always going to be this much faster than custom commands, that seems like it should be documented somewhere.
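To make the "exact same lookup" point concrete, both code paths boil down to a per-event call shaped roughly like the sketch below. This uses the MaxMind geoip2 reader as a stand-in; the database path and field names are made up, and the ipasn .dat script uses a different binding, but the per-event work is the same whether it runs in the lookup script's CSV loop or the custom command's streaming loop.

```python
# Stand-in sketch of the shared per-event lookup (geoip2 used for
# illustration; the path and field names are not the real ones).
import geoip2.database
import geoip2.errors

# Opened once per process, in both the external lookup script and the
# custom command.
reader = geoip2.database.Reader("/path/to/GeoLite2-ASN.mmdb")

def annotate(event, ipfield="ip"):
    """Add ASN fields to one event dict. The call is the same regardless of
    which protocol delivered the event."""
    try:
        response = reader.asn(event[ipfield])
        event["asn"] = response.autonomous_system_number
        event["as_org"] = response.autonomous_system_organization
    except (KeyError, ValueError, geoip2.errors.AddressNotFoundError):
        event["asn"] = event["as_org"] = None
    return event
```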
This question is actually closely related to another one I have open. Since you already have the code, I was hoping you might be able to take a look at this as well:
https://answers.splunk.com/answers/496840/custom-streaming-command-wont-distribute-to-indexe.html
Try as I might, that custom command refuses to distribute to the indexers. An answer to that question feels more achievable, and I would very much like to find out what the problem is. I have implemented versions using both the V2 (chunked) protocol and the old Intersplunk approach (a stripped-down sketch of the V2 shape is below), and the streaming command doesn't show up in remoteSearch either way.
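For anyone else reading along, here is a stripped-down sketch of the V2 (chunked) shape I mean, using the Splunk Python SDK. The names are illustrative, and the distributed=True and commands.conf notes reflect my reading of the docs, not a configuration I have confirmed actually fixes the distribution problem.

```python
# Stripped-down V2 (chunked) streaming command via splunklib.searchcommands.
# Names are illustrative; the settings noted in comments are what I
# understand should matter for running on the indexers.
import sys

from splunklib.searchcommands import dispatch, StreamingCommand, Configuration


@Configuration(distributed=True)  # SCP v2 setting meant to allow streaming on indexers
class ExampleStreamCommand(StreamingCommand):
    def stream(self, records):
        for record in records:
            record["touched"] = 1  # placeholder for the real per-event lookup
            yield record


if __name__ == "__main__":
    dispatch(ExampleStreamCommand, sys.argv, sys.stdin, sys.stdout, __name__)
```

The commands.conf stanza for the chunked version would pair this with filename = example_stream_command.py and chunked = true, and the app itself has to reach the indexers via knowledge bundle replication before anything can show up in remoteSearch.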