Does inputcsv honor dispatch=true?

Lowell
Super Champion

I am unable to get inputcsv to read from the dispatch directory (the search job-specific directory). Does anyone know if this is a bug, or if there are any workarounds?

I need to stash my search results temporarily during the execution of a search. The docs say that if you pass the dispatch=true parameter to the outputcsv and inputcsv commands, they will use the job-specific dispatch folder instead of a shared system folder. This sounds perfect, because I don't want to deal with concurrency, user access, or cleanup issues if Splunk will do that for me. (Frankly, this seems like the only remaining legitimate use for outputcsv/inputcsv; everything else should use outputlookup/inputlookup instead.)

However, while outputcsv appears to honor dispatch=true, the inputcsv command ignores it. Here are the messages created when I run my search:

 Results written to file '/opt/splunk/var/run/splunk/dispatch/1522855828.301/cluster_output.csv' on serverName='...'
 File '/opt/splunk/var/run/splunk/csv/cluster_output.csv' could not be opened for reading.
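A stripped-down search along the following lines might be enough to isolate the behavior from the rest of the pipeline. This is only a sketch: repro_test.csv is a hypothetical filename, the makeresults rows are placeholder data, and it assumes inputcsv with append=true can run mid-pipeline as it does in the full search further down.

```
| makeresults count=3
| eval note="written by outputcsv"
| outputcsv dispatch=true repro_test.csv
| inputcsv dispatch=true append=true repro_test.csv
```

If inputcsv ignores dispatch=true as described above, the write message should name the job's dispatch directory while the read error names the shared csv directory, just as in the two messages above.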

Example use case:

This shouldn't really need an example, but to avoid the inevitable pushback about why I would ever want to do this, I'll elaborate on the use case. (Yes, appendpipe typically removes the need for this type of stash/unstash operation, but in this case that's not an option.) I'm attempting to provide output similar to the "Patterns" tab on the Search view, but with some additional information about event sizes. The findkeywords SPL command is labeled as "internal" and places some unusual expectations on the search pipeline, but it does very cool things that I really want, so I'm willing to put up with its idiosyncrasies.

index=data sourcetype=mydata
| head 50000 
| eval bytes=len(_raw)
| cluster t=0.75 field=_raw showcount=true labelonly=true labelfield=groupID
| fields _time _raw bytes groupID cluster_count timestartpos timeendpos
| outputcsv dispatch=true cluster_output.csv
| findkeywords labelfield=groupID dedup=true
| inputcsv dispatch=true append=true cluster_output.csv
| stats sum(bytes) as total_bytes, values(*Keywords) as *Keywords, last(sampleEvent) by groupID

The above works if I abandon dispatch=true and use this instead:

...
| outputcsv create_empty=true cluster_output.csv
| findkeywords labelfield=groupID dedup=true
| inputcsv append=true cluster_output.csv
....

However, that leaves my file readable by any Splunk user, there's no cleanup, and most importantly, it causes chaos if my search is run concurrently. (This search will eventually be run on a dashboard.)
