The HF in front of the firewall could index the data, let's say into index=intermediate. Then you could have a scheduled search, running for example every five minutes, which searches the events from index=intermediate and writes the result, without any further processing, to CSV files locally on the HF. You can use the outputcsv command for that. The name of the CSV file can be generated dynamically during the search, so you get an individual CSV file name for each run, e.g. containing the timestamp of the search, similar to rotating log files.
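In case it helps, here is a sketch of what the scheduled search could look like (the index name, the fw_export_ prefix and the five-minute window are just examples). A subsearch can be used to build the dynamic file name, because a subsearch that returns a field called "search" substitutes its value as a literal string, which outputcsv then takes as the file name:

    index=intermediate earliest=-5m@m latest=@m
    | outputcsv [ stats count | eval search="fw_export_" . strftime(now(), "%Y%m%d_%H%M%S") ]

Each run would then produce something like fw_export_20240101_120500.csv, written to $SPLUNK_HOME/var/run/splunk/csv/ on the HF.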
Finally you will have a steadily growing number of CSV files locally on your HF, which can be passed through the firewall to the indexer, for example using scp or rsync or whatever makes sense (note that I am not the Linux expert 🙂 ). Or, if you don't want those files on the indexer itself, you could move them from the HF to any other Linux host behind the firewall and run a Universal Forwarder there. On the indexer, you then process those CSV files as a usual file input.
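Again only a sketch, with placeholder paths and host name; something like this could run from cron on the HF to push the files:

    # push new CSV exports through the firewall; paths and host are examples
    rsync -av --remove-source-files \
        /opt/splunk/var/run/splunk/csv/ \
        user@indexer.example.com:/data/fw-csv/

And on the receiving side (the indexer, or the UF on the intermediate Linux box) a monitor input picks them up; index and sourcetype are again just examples:

    # inputs.conf
    [monitor:///data/fw-csv/*.csv]
    index = main
    sourcetype = fw:csv

    # props.conf, so the CSV header row is parsed into fields
    [fw:csv]
    INDEXED_EXTRACTIONS = csv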
I currently have no access to my Splunk; in case you need help with the dynamic CSV file names, please get back to me, I am happy to help.