So I have been presented with a weird thought experiment: using Hunk as a primary index for select data. I don't see any documentation that specifically answers this, but it's an interesting idea.
Is it possible to forward data directly to a virtual index (Hunk) without touching a Splunk index? If so, are there limitations or licensing concerns?
The use case would be as follows:
With UF -
1: Grab netflow data from a sensor & store it in Splunk (time-based log data)
2: Grab pcaps that appear in a directory & send them to the virtual index (binary files)
3: Grab event definitions & store them in the virtual index (static data to be referenced when the events are pulled up in Splunk)
So the thought is that items 2 & 3 would not be indexed in Splunk, either because Splunk can't index them (binary files) or because Splunk isn't the proper place for them (large tables of event code descriptions & definitions that are not time-based).
So, can it be done for items 2 & 3, and does it count against daily license usage?
2) The forwarder isn't designed for generic data movement. Even looking ahead, I don't see this as a use case we're likely to pursue.
3) Isn't this a better case for lookups?
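For item 3, a CSV lookup is the standard Splunk-side alternative. As a sketch only (the lookup name, file, and fields here are hypothetical, not from this thread), a file-based lookup is defined in transforms.conf and then referenced from a search:

```ini
# transforms.conf -- hypothetical lookup mapping event codes to definitions
[event_definitions]
filename = event_definitions.csv
# CSV columns (assumed): event_code, description

# Referenced at search time with something like:
#   ... | lookup event_definitions event_code OUTPUT description
```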
In general, we are potentially interested in doing data delivery to Hadoop independent of a Splunk indexer, but we have no final plans or commitments. If you're interested, I'm certainly willing to talk to you offline about potential productization options.
There is a setup video on this page showing how to set up Hunk with HDP 1.1.3 (all the way at the bottom):
The configuration to set up Hunk with HDP 2.1 is slightly different (due to YARN). Here are my configurations, based on the HDP 2.1 Sandbox:
vix.command.arg.3 = $SPLUNK_HOME/bin/jars/SplunkMR-s6.0-hy2.0.jar
vix.description = Hunk 62 Hortonworks 2.1 Provider
vix.env.HADOOP_HOME = /usr
vix.env.JAVA_HOME = /usr
vix.family = hadoop
vix.fs.default.name = hdfs://sandbox:8020
vix.mapreduce.framework.name = yarn
vix.splunk.home.hdfs = /user/root/hunk62mr
vix.yarn.resourcemanager.address = sandbox:8050
vix.yarn.resourcemanager.scheduler.address = sandbox:8030
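For context, Hunk provider settings like the vix.* lines above live in indexes.conf under a [provider:...] stanza, and a virtual index then points at that provider. A minimal sketch of how the settings above might be arranged (the provider name, index name, and HDFS data path are hypothetical, not from this post):

```ini
# indexes.conf -- sketch only; stanza names and the input path are assumptions
[provider:hdp21-sandbox]
vix.family = hadoop
vix.env.JAVA_HOME = /usr
vix.env.HADOOP_HOME = /usr
vix.fs.default.name = hdfs://sandbox:8020
vix.mapreduce.framework.name = yarn
vix.yarn.resourcemanager.address = sandbox:8050
vix.yarn.resourcemanager.scheduler.address = sandbox:8030
vix.splunk.home.hdfs = /user/root/hunk62mr
vix.command.arg.3 = $SPLUNK_HOME/bin/jars/SplunkMR-s6.0-hy2.0.jar

[my_virtual_index]
vix.provider = hdp21-sandbox
# HDFS path the virtual index searches (hypothetical)
vix.input.1.path = /data/myindex
```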
Thanks for the input, this is what I needed to understand.
I didn't think that UF was going to be supported as a generic forwarder, but I wasn't sure exactly. It would be interesting to see Splunk build towards that capability.
The case behind the lookup tables is that we want the data to be usable outside of Splunk, and we can foresee the datasets getting pretty large. Since Splunk only offers lookups or time-keyed indexes, it's hard to build & scale multi-gigabyte static reference datasets, hence the Hadoop inquiry.
I would love to talk about the options available as we are currently exploring what options exist.
So you want to use Hunk to write a search and then output the data back to Hadoop?
Love to chat more. For logistical purposes, it's probably easiest if you work with your sales rep. Tell them you spoke to me (Clint Sharp, Director of Product Management) and that we've mutually agreed to have a discussion about product roadmap and these features.