As I understand Splunk's behavior, a compressed file brought into Splunk is uncompressed as it is ingested.
Is there any difference between placing the compressed file on a Universal Forwarder and importing it directly on an Indexer (or Heavy Forwarder)?
Specifically, do the import processing speed and the disk or resource usage differ? If so, please let me know.
As the Splunk component monitoring the compressed file has to uncompress it in order to ingest it, there is a performance penalty for such an operation. How significant it is will of course depend on the amount of data.
But the most important thing is that, quoting the docs, "If you add new data to an existing archive file, the forwarder reprocesses the entire file rather than just the new data. This can result in event duplication. "
That's relatively understandable with archive formats like tar or zip, but even in the case of a simply gzipped file, which could theoretically be appended to in compressed form (gzip is a stream compression format), Splunk will decompress the whole file and read it anew if it notices any changes.
So ingesting compressed files only works reasonably for files that are created as compressed and never touched again (like files compressed after rotation by logrotate).
Of course, sometimes you don't have much choice if the source solution produces logs in a very unusual way, but if you can, avoid ingesting compressed files.
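To make the rotated-and-compressed case concrete, a minimal inputs.conf monitor stanza could look like the sketch below. The path, index, and sourcetype are placeholders I made up for illustration, not anything from this thread:

[monitor:///var/log/myapp/*.log.gz]
# hypothetical path; point it at wherever logrotate drops the compressed files
index = main
sourcetype = myapp_logs
# the monitoring component decompresses each archive in full before indexing it

For one-off loads of archives you don't need to keep on disk, a [batch://...] input with move_policy = sinkhole is the usual alternative, since the file is indexed once and then deleted instead of being monitored for changes.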
I would always do data onboarding on a UF and never on an Indexer. That would be the best option.
Sorry, but it's not necessarily true. At least not in every possible situation.
For example, in order to reasonably ingest several different syslog sources, where you wouldn't use plain tcp/udp inputs on the UF because you lose too much metadata with them, you'd want an intermediate layer such as SC4S or an rsyslog solution. From there you'd send the events to HEC, and you can't set up a HEC input on a UF. In a small-scale environment you'd simply create a HEC input on your indexer rather than set up a separate HF just to host a single HEC input.
Not to mention other inputs that require a "heavy" component, like some modular inputs.
Sometimes a UF is not enough or is not an option for other reasons, but spinning up a separate HF would be a bit of overkill.
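As a rough sketch of that small-scale setup, enabling HEC directly on the indexer comes down to an inputs.conf stanza like the one below. The token value, token name, index, and sourcetype are made up for illustration; in practice you'd create the token through Splunk Web or the REST API rather than hard-coding one:

[http]
disabled = 0
# 8088 is the default HEC port
port = 8088

[http://sc4s_hec]
# hypothetical token; generate your own
token = 00000000-0000-0000-0000-000000000000
index = netsyslog
sourcetype = sc4s:fallback

SC4S (or rsyslog with an HTTP output) would then be pointed at https://<indexer>:8088/services/collector with that token.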
Hi
Disclaimer: I haven't ever ingested compressed files in production, so this answer is based on what I have read.
As Splunk needs to unzip those files and this is done on one thread, it's probably better to do it on the UF side. Otherwise (especially if you have a lot of those files) you will generate a lot of additional work for the indexers.
r. Ismo
Understood. What about the speed of import processing and the disk or resource usage rate? Will they differ?
If yes, please let me know the details.