I have a lot of instrumentation data from remote sites that I am collecting in Splunk. This data needs to go to the main index in the Enterprise, but I cannot stream data from remote sites through the firewall due to security concerns. I am proposing collecting with a Heavy Forwarder, but I need to create files from the collected data so it can be sent through the firewall, scanned, and then put into the main indexer farm. Should I collect and index at the HF and then forward the files, or collect raw data in files, send those through the firewall, and then into the main indexer farm?
Thanks for any help. Not having much luck so far and I don't want to maintain multiple Splunk systems/dashboards, etc. if I don't have to.
The HF in front of the firewall could index the data, say into index=intermediate. Then you could have scheduled searches, for example every five minutes, which search the events from index=intermediate and write the search results, without any further processing, locally on the HF to CSV files. You can use the outputcsv command for that. The name of the CSV file can be generated dynamically during the search, so you get an individual CSV file name for each run, e.g. containing the timestamp of the search, similar to what you would have with rotating log files.
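A scheduled search along these lines could do the export. This is only a sketch: the index name and file prefix are assumptions, and the subsearch trick for generating the dynamic file name should be verified on your Splunk version.

```
index=intermediate earliest=-5m@m latest=@m
| outputcsv [ stats count
            | eval filename="export_" . strftime(now(), "%Y%m%d_%H%M%S")
            | return $filename ]
```

By default outputcsv writes its files under $SPLUNK_HOME/var/run/splunk/csv on the search head (here, the HF), which would be the staging directory you pick the files up from.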
Finally you will have a steadily growing number of CSV files locally on your HF, which could be passed through the firewall to the indexer, for example using scp or rsync or whatever makes sense (note that I am not the Linux expert 🙂 ). Or, if you don't want those files on the indexer, you could move them from the HF to any other Linux host behind the firewall and use a Universal Forwarder from there. On the indexer, you process those CSV files as a usual file input.
I currently have no access to my Splunk; in case you need help with the dynamic CSV file names, please get back to me, I am happy to help.
If you want to minimize the Splunk install footprint in your environment, use a log collector to create the raw files, then have a process that shovels them through the firewall (via push or pull, depending on your topology and firewall rules). After that, just have Splunk monitor the directory on the indexer where these data files land. Splunk will then automagically suck it all in for you.
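Having Splunk watch that drop directory is just a monitor input. A minimal sketch, assuming a drop path of /data/splunk_drop and a sourcetype name of your choosing (both hypothetical here):

```
# inputs.conf on the indexer (or on a UF in front of it)
[monitor:///data/splunk_drop]
index = main
sourcetype = remote_site_csv
disabled = false
```

If the files are CSVs with a header row, you would also want INDEXED_EXTRACTIONS = csv in props.conf for that sourcetype so the fields come out structured rather than as raw lines.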