Can we keep the frozen directory on one site out of two for the archiving period needed?

danielbb
Motivator

We run a multisite cluster with two sites.

Assume we have:

site_replication_factor = origin:1,total:2
site_search_factor = origin:1,total:2

We need to archive the data for three years.

For backup purposes, is it possible to keep the frozen directory on one site for three years while the frozen directory on the indexers of the other site has a shorter retention period?

I assume this setup would guarantee a full copy of all the data for three years.

gcusello
SplunkTrust

Hi @danielbb,
sorry, but I don't understand your question: are you speaking of the frozen dir or the cold dir?

The cold dir is always aligned between the two sites, and you cannot set a different retention for each site.

The frozen dir, instead, holds buckets that have exceeded the retention period, if you chose to archive them offline.
In other words: you can configure Splunk either to delete buckets after the retention period, or to take them offline by copying them to a different location (outside replication) using a script.
If you archive them offline, you can later reuse a bucket by copying it into the thawed dir (thawedPath), and it isn't replicated between sites.

I hope that is clear. You can find more information at https://docs.splunk.com/Documentation/Splunk/8.0.0/Indexer/Automatearchiving

Ciao.
Giuseppe

danielbb
Motivator

Great Giuseppe. Let me step back. For the cyber indexes we use the following stanza -

[<index_name>]
coldPath = $SPLUNK_DB/<index_name>/colddb
homePath = $SPLUNK_DB/<index_name>/db
thawedPath = $SPLUNK_DB/<index_name>/thaweddb
frozenTimePeriodInSecs = 5184000
maxDataSize = auto_high_volume
enableTsidxReduction = true
timePeriodInSecBeforeTsidxReduction = 2592000
coldToFrozenDir=<location>/splunk/frozen/<index_name>
repFactor=auto

Meaning, after 60 days (5184000 seconds), data is rolled to frozen, right? So all the buckets will reside in the coldToFrozenDir location indefinitely, outside the jurisdiction of Splunk, right?
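
As a quick sanity check on the numbers in the stanza above, the second-based settings can be converted to days. This is only an arithmetic sketch; the helper names are made up for illustration, and a "year" is assumed to be 365 days.

```python
SECONDS_PER_DAY = 24 * 60 * 60  # 86400


def seconds_to_days(seconds: int) -> float:
    """Convert a Splunk *InSecs setting to days."""
    return seconds / SECONDS_PER_DAY


def days_to_seconds(days: int) -> int:
    """Convert a retention period in days to seconds."""
    return days * SECONDS_PER_DAY


# Values taken from the stanza above.
print(seconds_to_days(5184000))   # frozenTimePeriodInSecs -> 60.0
print(seconds_to_days(2592000))   # timePeriodInSecBeforeTsidxReduction -> 30.0

# Three years of archiving, assuming 365-day years.
print(days_to_seconds(3 * 365))   # -> 94608000
```

So buckets freeze after 60 days, and the three-year archiving period has to be enforced by whatever manages the frozen location, since Splunk no longer tracks the buckets once they are frozen.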

So, for our requirement of archiving for three years, can we keep these buckets in the frozen directory for three years on only one site?

Meaning, on one site we'll archive and on the other site we'll delete when the buckets reach frozen.

And you are absolutely right - I changed the question's title from retention to archiving - thank you.

gcusello
SplunkTrust

Hi @danielbb,
this means that you have to create a coldToFrozen script that, once the retention period expires, copies the buckets to the offline storage folder:

coldToFrozenScript = <path to script interpreter> <path to script>
* Specifies a script to run when data is to leave the splunk index system.
  * Essentially, this implements any archival tasks before the data is
    deleted out of its default location.
* Add "$DIR" (including quotes) to this setting on Windows (see below
  for details).
* Script Requirements:
  * The script must accept one argument:
    * An absolute path to the bucket directory to archive.
  * Your script should work reliably.
    * If your script returns success (0), Splunk completes deleting
      the directory from the managed index location.
    * If your script returns failure (non-zero), Splunk leaves the bucket
      in the index, and tries calling your script again several minutes later.
    * If your script continues to return failure, this will eventually cause
      the index to grow to maximum configured size, or fill the disk.
  * Your script should complete in a reasonable amount of time.
    * If the script stalls indefinitely, it will occupy slots.
    * This script should not run for long as it would occupy
      resources which will affect indexing.
* If the string $DIR is present in this setting, it will be expanded to the
  absolute path to the directory.
* If $DIR is not present, the directory will be added to the end of the
  invocation line of the script.
  * This is important for Windows.
    * For historical reasons, the entire string is broken up by
      shell-pattern expansion rules.
    * Since Windows paths frequently include spaces, and the Windows shell
      breaks on space, the quotes are needed for the script to understand
      the directory.
* If your script can be run directly on your platform, you can specify just
  the script.
  * Examples of this are:
    * .bat and .cmd files on Windows
    * scripts set executable on UNIX with a #! shebang line pointing to a
      valid interpreter.
* You can also specify an explicit path to an interpreter and the script.
    * Example:  /path/to/my/installation/of/python.exe path/to/my/script.py
* Splunk software ships with an example archiving script in
  $SPLUNK_HOME/bin called coldToFrozenExample.py that you SHOULD NOT USE.
  * DO NOT USE the example for production use, because:
    * 1 - It will be overwritten on upgrade.
    * 2 - You should be implementing whatever requirements you need in a
          script of your creation. If you have no such requirements, use
          'coldToFrozenDir'
* Example configuration:
  * If you create a script in bin/ called our_archival_script.py, you could use:
    UNIX:
        coldToFrozenScript = "$SPLUNK_HOME/bin/python" \
          "$SPLUNK_HOME/bin/our_archival_script.py"
    Windows:
        coldToFrozenScript = "$SPLUNK_HOME/bin/python" \
          "$SPLUNK_HOME/bin/our_archival_script.py" "$DIR"
* The example script handles data created by different versions of Splunk
  differently. Specifically, data from before version 4.2 and after version 4.2
  are handled differently. See "Freezing and Thawing" below:
* The script must be in $SPLUNK_HOME/bin or a subdirectory thereof.
* No default.
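
The requirements above can be sketched as a small script. This is a minimal illustration, not the shipped example and not a tested production script. It assumes that, as the question supposes, each site holds a copy of every bucket, and that the bucket path has the layout <index_name>/colddb/<bucket>. The names ARCHIVE_ROOT, ARCHIVING_SITE and the ARCHIVE_LOCAL_SITE environment variable are hypothetical; how each indexer learns its own site is left open.

```python
import os
import shutil
import sys

# Hypothetical settings: adjust for your environment.
ARCHIVE_ROOT = "/path/to/archive"   # assumed offline storage folder
ARCHIVING_SITE = "site1"            # assumed site that keeps frozen buckets


def archive_bucket(bucket_dir, archive_root, local_site, archiving_site):
    """Archive one bucket; return 0 on success, non-zero on failure."""
    if not os.path.isdir(bucket_dir):
        return 1
    if local_site != archiving_site:
        # Non-archiving site: report success so Splunk deletes the bucket.
        return 0
    bucket_dir = os.path.normpath(bucket_dir)
    bucket_name = os.path.basename(bucket_dir)
    # Assumed layout: .../<index_name>/colddb/<bucket_name>
    index_name = os.path.basename(os.path.dirname(os.path.dirname(bucket_dir)))
    dest = os.path.join(archive_root, index_name, bucket_name)
    if os.path.isdir(dest):
        return 0  # already archived by an earlier run
    partial = dest + ".partial"
    try:
        # Copy to a temporary name first so a failed copy is never
        # mistaken for a complete archive on the next attempt.
        shutil.rmtree(partial, ignore_errors=True)
        shutil.copytree(bucket_dir, partial)
        os.rename(partial, dest)
    except OSError:
        return 1  # Splunk keeps the bucket and retries later
    return 0


# Splunk passes the bucket path as the argument; without one
# (for example when the file is imported) nothing runs.
if __name__ == "__main__" and len(sys.argv) > 1:
    site = os.environ.get("ARCHIVE_LOCAL_SITE")  # hypothetical variable
    if not site:
        sys.exit(1)  # unknown site: fail so that no data is deleted
    sys.exit(archive_bucket(sys.argv[1], ARCHIVE_ROOT, site, ARCHIVING_SITE))
```

Returning failure when the site is unknown is a deliberate choice in this sketch: a misconfigured indexer then keeps its buckets instead of silently deleting the only copy. Removing archived buckets after three years would still need a separate cleanup job outside Splunk.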

More information at https://docs.splunk.com/Documentation/Splunk/8.0.0/Indexer/Automatearchiving

Ciao.
Giuseppe
