Splunk Enterprise

When a compressed file is placed on a Universal Forwarder versus an Indexer, is there any difference during import?

human96
Communicator

As far as I understand Splunk's behavior, when you bring a compressed file into Splunk, it uncompresses the file before indexing it.

When the compressed file is placed on a Universal Forwarder versus an Indexer (or Heavy Forwarder), is there any difference when importing the compressed file directly?

Specifically, do the speed of import processing and the disk or resource usage differ? If so, please let me know.

 


VatsalJagani
SplunkTrust

I would always do data onboarding on a UF and never on an indexer. That would be the best option.


PickleRick
SplunkTrust

Sorry, but it's not necessarily true. At least not in every possible situation.

For example, in order to reasonably ingest several different syslog sources, where you wouldn't use plain tcp/udp inputs on the UF because you lose too much metadata with them, you'd want an intermediate layer such as SC4S or an rsyslog solution. From there you'd send the events to HEC. And you can't set up a HEC input on a UF. In a small-scale environment you'd simply create a HEC input on your indexer rather than set up a separate HF just to host a single HEC input.
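For reference, a HEC input on an indexer (or HF) is just a couple of stanzas in inputs.conf. A minimal sketch could look like this (the stanza name, port, token, index and sourcetype are placeholders to adjust for your environment):

# enable the HTTP Event Collector endpoint
[http]
disabled = 0
port = 8088

# one stanza per token; the token value is generated, not hard-coded like this
[http://syslog_hec]
token = <your-generated-token-guid>
index = syslog
sourcetype = syslog
disabled = 0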

Not to mention other inputs that require a "heavy" component, like some modular inputs.

Sometimes a UF is not enough or is not an option for other reasons, but spinning up a separate HF would be a bit of overkill.

isoutamo
SplunkTrust

Hi

Disclaimer: I have never ingested compressed files in production, so this answer is based on what I have read.

As Splunk needs to unzip those files, and this process is done in a single thread, it's probably better to do it on the UF side. Otherwise (especially if you have a lot of those files) you will generate a lot of additional work for the indexers.

r. Ismo


human96
Communicator

Understood. What about the speed of import processing and the disk or resource usage, will they differ?
If yes, please let me know the details.


PickleRick
SplunkTrust

As the Splunk component monitoring the compressed file has to uncompress it in order to ingest it, there is a performance penalty for such an operation. How significant it is will of course depend on the amount of data.

But the most important thing is that, quoting the docs, "If you add new data to an existing archive file, the forwarder reprocesses the entire file rather than just the new data. This can result in event duplication."

It's relatively understandable with archive formats like tar or zip, but even in the case of a simply gzipped file, which theoretically can be appended to in compressed form (gzip is a stream compression algorithm), Splunk will decompress the whole file and read it anew if it notices any changes.

So ingesting compressed files works reasonably only for files that are created as compressed and never touched again (like files compressed after rotation by logrotate).
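If you do end up monitoring such files, the input itself is an ordinary monitor stanza in inputs.conf on the forwarder. A minimal sketch with made-up path, index and sourcetype:

# rotated, gzipped logs that are never modified after creation
[monitor:///var/log/myapp/*.log.*.gz]
index = myapp
sourcetype = myapp:logs
disabled = 0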

Of course, sometimes you don't have much choice if the source solution produces logs in a very unusual way, but if you can, avoid ingesting compressed files.


isoutamo
SplunkTrust
I didn't quite get what you mean by this, but the main principle is to read the data directly where it is created. For that reason a UF is usually better (IMHO). If you want to move the files to the indexers instead, I propose that you uncompress them before dropping them into the indexers' sinkhole (batch) directory.
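As a minimal sketch of such a batch input in inputs.conf (the directory, index and sourcetype are just placeholders, and move_policy = sinkhole means Splunk deletes the files after indexing them):

# one-shot ingestion directory; drop already-uncompressed files here
[batch:///opt/splunk_ingest/dropbox]
move_policy = sinkhole
index = myapp
sourcetype = myapp:logs
disabled = 0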