Does inputcsv honor dispatch=true?

I am unable to get inputcsv to read from the dispatch directory (the search job-specific directory). Does anyone know if this is a bug, or if there are any workarounds?

I need to stash my search results temporarily during the execution of a search. The docs say that if you pass the dispatch=true parameter to the outputcsv and inputcsv commands, they will use the job-specific dispatch folder instead of a shared system folder. This sounds perfect, because I don't want to deal with concurrency, user access, or cleanup issues if Splunk will do that for me. (Frankly, this seems like the only remaining legitimate use for outputcsv/inputcsv; everything else should use outputlookup/inputlookup instead.)

However, while outputcsv appears to honor dispatch=true, the inputcsv command ignores it. Here are the messages created when I run my search. The file is written to the job's dispatch directory but then read from the shared csv directory:

 Results written to file '/opt/splunk/var/run/splunk/dispatch/1522855828.301/cluster_output.csv' on serverName='...'
 File '/opt/splunk/var/run/splunk/csv/cluster_output.csv' could not be opened for reading.

Example use case:

This shouldn't really need an example, but to avoid the inevitable pushback about why I would ever want to do this, I'll elaborate on the use case. (Yes, appendpipe typically removes the need for this type of stash/unstash operation, but in this case that's not an option.) I'm attempting to provide output similar to the "Patterns" tab on the Search view, but with some additional information about event sizes. The findkeywords SPL command is labeled as "internal" and places some unusual expectations on the search pipeline, but it does very cool things that I really want, so I'm willing to put up with its idiosyncrasies.

index=data sourcetype=mydata
| head 50000 
| eval bytes=len(_raw)
| cluster t=0.75 field=_raw showcount=true labelonly=true labelfield=groupID
| fields _time _raw bytes groupID cluster_count timestartpos timeendpos
| outputcsv dispatch=true cluster_output.csv
| findkeywords labelfield=groupID dedup=true
| inputcsv dispatch=true append=true cluster_output.csv
| stats sum(bytes) as total_bytes, values(*Keywords) as *Keywords, last(sampleEvent) by groupID

The above works if I abandon the "dispatch=true" and use this instead:

...
| outputcsv create_empty=true cluster_output.csv
| findkeywords labelfield=groupID dedup=true
| inputcsv append=true cluster_output.csv
....

However, that leaves my file readable by any Splunk user, there's no cleanup, and most importantly, it causes chaos if my search is run concurrently. (This search will eventually be run on a dashboard.)
