Is it possible to have a heavy forwarder send unparsed (not raw) cooked data?
I have a server which needs to forward data, and a universal forwarder sending compressed, unparsed data would be fine.
However, I would like to use that same server to do some data collection as well.
This data collection requires a full Splunk install and a 3rd party app (estreamer to be specific).
However, as I understand it, using a full Splunk install as a heavy forwarder will, by default, send parsed data.
This is a much heavier network load, which I would like to avoid.
The only option in outputs.conf related to this is sendCookedData = true | false.
If I set this to false, then it will send raw (uncooked) data to the receivers.
If I set this to true, then it appears the heavy forwarder will send all data cooked and parsed.
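For reference, this is a sketch of the stanza in question (the output group name and indexer host/port here are placeholders):

```ini
# outputs.conf on the forwarder -- server value is an example
[tcpout:my_indexers]
server = indexer.example.com:9997
# true (default): send cooked data; false: send raw, uncooked data
sendCookedData = true
```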
I'm looking for an option to send cooked, unparsed data.
Thanks for any help!
Hi folks. The easiest way to minimize network bandwidth impact from a HF is to:
1. Use the HF to monitor/interface with only the data sources that you need the HF for. I assume you want the HF for the UI needs, but some apps might use the parsing. This is what @somesoni2 is recommending.
2. Send the cooked, parsed output to the indexers via SSL, leveraging the SSL compression. This dramatically lowers the network impact, in exchange for admin overhead of setup and configuration.
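A minimal sketch of option 2, assuming certificates are already in place (host, paths, and password are placeholders, and exact SSL attribute names can vary between Splunk versions, so check the outputs.conf spec for your release):

```ini
# outputs.conf on the HF -- all values below are examples
[tcpout:ssl_indexers]
server = indexer.example.com:9997
sslCertPath = $SPLUNK_HOME/etc/auth/client.pem
sslRootCAPath = $SPLUNK_HOME/etc/auth/cacert.pem
sslPassword = changeme
# compress the stream over the SSL channel
useClientSSLCompression = true
```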
One option would be to have both a Splunk Enterprise instance and a Universal Forwarder on your machine generating data. Use the heavy forwarder only for estreamer-specific monitoring and the UF for the rest. (You can't turn off parsing on a HF, but you may configure the data to be re-parsed at the indexers.)
Thanks Koshyk, but that didn't answer the question. I want the Heavy Forwarder to send unparsed cooked data to the indexers. You addressed sending raw and parsed.
You can have ANY number of outputs to any locations. So you could send output to "Destination1 - Third Party" uncooked, but to "Destination2 - xyz" cooked, etc. Each tcpout stanza can vary depending on what you want.
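A sketch of that multi-destination setup, assuming two example hosts (both group names and addresses are placeholders):

```ini
# outputs.conf -- two output groups with different formats
[tcpout]
defaultGroup = splunk_indexers

# cooked output to Splunk indexers
[tcpout:splunk_indexers]
server = indexer.example.com:9997
sendCookedData = true

# raw output to a third-party destination
[tcpout:third_party]
server = thirdparty.example.com:5140
sendCookedData = false
```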
(By the way, estreamer is a pain, and I won't go near it as it is unsupported!)
Also, the heavy forwarder offers output in syslog format, which is a great way to make Splunk work as a logging engine for centralized logging solutions. At one of our customers, we collect data using Splunk UFs from various machines, and at the heavy forwarder we dump it to a syslog server for multiple other third parties. We don't want to integrate; rather, we dump into a central location in uncooked format, and it is up to the company/third party how they consume it.
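For anyone wanting to try the syslog-output route, a minimal sketch (destination host, port, and group name are examples; routing specific data to the group would normally be done via props.conf/transforms.conf):

```ini
# outputs.conf on the heavy forwarder -- values are examples
[syslog]
defaultGroup = central_syslog

[syslog:central_syslog]
server = syslog.example.com:514
type = udp
```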
I'm assuming you want to send from a UF to a full Splunk installation.
If you look into the indexing pipelines, the UF does NOT do the real parsing (see the "Detail Diagram - UF/LWF to Indexer"). So the output from your UF will be "cooked but unparsed data".
No, I want to send from a Heavy Forwarder (because I need the full Splunk installation for other purposes, like estreamer and dbconnect), but for normal file monitoring functions (for instance) I want to forward cooked, unparsed data in order to limit network bandwidth. However, it seems the Heavy Forwarder can only be configured to send either raw data or cooked, parsed data, which is much larger than unparsed data.
I realize this is a very old post, but as I was browsing I didn't see an answer to your question. The best answer is probably to use a Lightweight Forwarder. Yes, I know it's supposedly deprecated, but it does pretty much what you want: it sends cooked but unparsed/unindexed data to the indexer. It also gives you the full-install functionality such as Python, HEC inputs, etc. By default, the Lightweight Forwarder disables the web interface, but that can be turned back on through a web.conf setting. I use this configuration in my DMZ.
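If you go this route, the light-forwarder mode is enabled by enabling the SplunkLightForwarder app on the instance, and the web UI can be re-enabled with a sketch like this (assuming the standard local config path):

```ini
# $SPLUNK_HOME/etc/system/local/web.conf -- turn the web UI back on
[settings]
startwebserver = 1
```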
Hi lbur, I'm having the same problem: trying to send cooked, unparsed data from a Heavy Forwarder so we don't have to redistribute or re-plan data collection/ingestion, and I'm trying to avoid any additional configuration on the indexers' queues.
Have you gotten anywhere with this?