Splunk Enterprise

When a compressed file is placed on a Universal Forwarder versus an Indexer, is there any difference during import?

human96
Communicator

As far as I understand Splunk's behavior, when you bring a compressed file into Splunk, it uncompresses the file before indexing it.

When the compressed file is placed on a Universal Forwarder versus an Indexer (or Heavy Forwarder), is there any difference when importing the compressed file directly?

Specifically, do the speed of import processing and the disk or resource usage differ? If so, please let me know.

 


VatsalJagani
SplunkTrust

I would always do data onboarding on a UF and never on an indexer. That would be the best option.


PickleRick
SplunkTrust

Sorry, but it's not necessarily true. At least not in every possible situation.

For example, in order to reasonably ingest several different syslog sources, where you wouldn't use plain tcp/udp inputs on the UF because you lose too much metadata with them, you'd want an intermediate layer such as SC4S or an rsyslog solution. From there you'd send the events to HEC. And you can't set up a HEC input on a UF. In a small-scale environment you'd simply create a HEC input on your indexer rather than set up a separate HF just to host a single HEC input.
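For reference, a HEC input on an indexer (or HF) is just a couple of stanzas in inputs.conf. A minimal sketch could look like this (the stanza name, port, token, index and sourcetype are placeholders to adjust for your environment):

# enable the HTTP Event Collector endpoint
[http]
disabled = 0
port = 8088

# one stanza per token; the token value is generated, not hard-coded like this
[http://syslog_hec]
token = <your-generated-token-guid>
index = syslog
sourcetype = syslog
disabled = 0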

Not to mention other inputs that require a "heavy" component, like some modular inputs.

Sometimes a UF is not enough or is not an option for other reasons, but spinning up a separate HF would be a bit of overkill.

isoutamo
SplunkTrust

Hi

Disclaimer: I have never ingested compressed files in production, so this answer is based on what I have read.

As Splunk needs to unzip those files, and this process is done in a single thread, it's probably better to do it on the UF side. Otherwise (especially if you have a lot of those files) you will generate a lot of additional work for the indexers.

r. Ismo


human96
Communicator

Understood. What about the speed of import processing and the disk or resource usage, will they differ?
If yes, please let me know the details.


PickleRick
SplunkTrust

As the Splunk component monitoring the compressed file has to uncompress it in order to ingest it, there is a performance penalty for such an operation. How significant it is will of course depend on the amount of data.

But the most important thing is that, quoting the docs, "If you add new data to an existing archive file, the forwarder reprocesses the entire file rather than just the new data. This can result in event duplication."

It's relatively understandable with archive formats like tar or zip, but even in the case of a simply gzipped file, which theoretically can be appended to in compressed form (gzip is a stream compression algorithm), Splunk will decompress the whole file and read it anew if it notices any changes.

So ingesting compressed files works reasonably only for files that are created as compressed and never touched again (like files compressed after rotation by logrotate).
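If you do end up monitoring such files, the input itself is an ordinary monitor stanza in inputs.conf on the forwarder. A minimal sketch with made-up path, index and sourcetype:

# rotated, gzipped logs that are never modified after creation
[monitor:///var/log/myapp/*.log.*.gz]
index = myapp
sourcetype = myapp:logs
disabled = 0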

Of course, sometimes you don't have much choice if the source solution produces logs in a very unusual way, but if you can, avoid ingesting compressed files.


isoutamo
SplunkTrust
I didn't quite get what you mean by this, but the main principle is to read the data directly where it is created. For that reason a UF is usually better (IMHO). If you want to move the files to the indexers instead, I propose that you uncompress them before dropping them into the indexers' sinkhole (batch) directory.
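As a minimal sketch of such a batch input in inputs.conf (the directory, index and sourcetype are just placeholders, and move_policy = sinkhole means Splunk deletes the files after indexing them):

# one-shot ingestion directory; drop already-uncompressed files here
[batch:///opt/splunk_ingest/dropbox]
move_policy = sinkhole
index = myapp
sourcetype = myapp:logs
disabled = 0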