Does inputcsv honor dispatch=true?

Lowell
Super Champion

I am unable to get inputcsv to read from the dispatch (search job-specific) directory. Does anyone know if this is a bug, or if there are any workarounds?

I need to stash my search results temporarily during the execution of a search. The docs say that if you use the dispatch=true parameter with the outputcsv and inputcsv commands, they will use the job-specific dispatch folder instead of a shared system folder. This sounds perfect, because I don't want to deal with concurrency, user access, or cleanup issues if Splunk will handle that for me. (Frankly, this seems like the only remaining legitimate use for outputcsv/inputcsv; everything else should use outputlookup/inputlookup instead.)

However, while it appears that outputcsv honors dispatch=true, the inputcsv command ignores it. Here are the messages created when I run my search:

 Results written to file '/opt/splunk/var/run/splunk/dispatch/1522855828.301/cluster_output.csv' on serverName='...'
 File '/opt/splunk/var/run/splunk/csv/cluster_output.csv' could not be opened for reading.
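For anyone who wants to reproduce this without my data, here's a minimal sketch using only generated events (makeresults stands in for a real data source; the field names are arbitrary):

| makeresults count=3
| eval bytes=random() % 1000
| outputcsv dispatch=true test_output.csv
| inputcsv dispatch=true append=true test_output.csv

If the behavior is the same as in my search, outputcsv reports writing the file under the job's dispatch directory, while inputcsv tries to read it from the shared var/run/splunk/csv folder and fails.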

Example use case:

This shouldn't really need an example, but to avoid the inevitable pushback about why I would ever want to do this, I'll elaborate on the use case. (Yes, appendpipe typically removes the need for this type of stash/unstash operation, but in this case that's not an option.) I'm attempting to provide output similar to the "Patterns" tab on the Search view, but with some additional information about event sizes. The findkeywords SPL command is labeled as "internal" and places some unusual expectations on the search pipeline, but it does very cool things that I really want, so I'm willing to put up with its idiosyncrasies.

index=data sourcetype=mydata
| head 50000 
| eval bytes=len(_raw)
| cluster t=0.75 field=_raw showcount=true labelonly=true labelfield=groupID
| fields _time _raw bytes groupID cluster_count timestartpos timeendpos
| outputcsv dispatch=true cluster_output.csv
| findkeywords labelfield=groupID dedup=true
| inputcsv dispatch=true append=true cluster_output.csv
| stats sum(bytes) as total_bytes, values(*Keywords) as *Keywords, last(sampleEvent) by groupID

The above works if I abandon the "dispatch=true" and use this instead:

...
| outputcsv create_empty=true cluster_output.csv
| findkeywords labelfield=groupID dedup=true
| inputcsv append=true cluster_output.csv
...

However, that leaves my file open to be read by any Splunk user, there's no cleanup, and most importantly, it causes chaos if my search is run concurrently. (This search will eventually be run from a dashboard.)
