
What causes these duplicate buckets (not marked duplicates)?

brianpreston
Path Finder

I'm trying to "freeze" these buckets, as described in the Splunk documentation topic "Archive indexed data", and I'm seeing some confusing behavior from Splunk.

By "freezing" buckets I mean:

  • the buckets are taken out of Splunk and are no longer indexed
  • the bucket data is stored on a remote file system
  • the events in the buckets are not lost!
  • the events in the buckets are not stored more than once

I've set my frozen directory to /misc/cloud2/splunk/freezer/accumulating/, and buckets are being moved there. That is great!

However, several buckets with identical timestamp ranges are being frozen there. Why?

The problem is that the buckets shown below, all in a single index, cover the same time span.

It seems very likely the data is being duplicated.

But the newer buckets are always bigger.

Questions

  • How can they be newer if they cover the same time span?
  • Why is the same data being frozen more than once?
  • Where is the new data coming from, if this is the frozen directory?

More details

Note that the increasing sequence numbers (the last number in each bucket name) go with increasing journal.gz sizes.

Here's a sample set of 3 buckets with the same timestamp range:

splunk@nursery-splunkindex-1001:~$ ls -d /misc/cloud2/splunk/freezer/accumulating/service_public/db_1456934399*
/misc/cloud2/splunk/freezer/accumulating/service_public/db_1456934399_1456930800_1238
/misc/cloud2/splunk/freezer/accumulating/service_public/db_1456934399_1456930800_1982
/misc/cloud2/splunk/freezer/accumulating/service_public/db_1456934399_1456930800_803

Here are the journals, getting progressively larger:

splunk@nursery-splunkindex-1001:~$ ls -l /misc/cloud2/splunk/freezer/accumulating/service_public/db_1456934399*/rawdata/journal.gz
-rw------- 1 splunk splunk 1287 Apr  3 18:53 /misc/cloud2/splunk/freezer/accumulating/service_public/db_1456934399_1456930800_1238/rawdata/journal.gz
-rw------- 1 splunk splunk 1566 Apr  3 18:59 /misc/cloud2/splunk/freezer/accumulating/service_public/db_1456934399_1456930800_1982/rawdata/journal.gz
-rw------- 1 splunk splunk  651 Apr  3 18:51 /misc/cloud2/splunk/freezer/accumulating/service_public/db_1456934399_1456930800_803/rawdata/journal.gz
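One way to find every such collision in an index's frozen directory, rather than checking one prefix at a time, is to parse the bucket names. This is a minimal sketch that assumes the frozen buckets keep the `db_<latestTime>_<earliestTime>_<bucketId>` naming seen in the listing above; `group_by_time_range`, `journal_size` and `report` are hypothetical helpers, not part of Splunk. It sorts bucket IDs numerically: `ls` sorts them as text, which is why `_803` appears last even though it is the oldest and smallest journal (18:51, 651 bytes).

```python
import os
import re
from collections import defaultdict

# Matches the db_<latestTime>_<earliestTime>_<bucketId> names seen in the listing.
# Assumption: any other name form (e.g. a bucket name with a trailing GUID) is skipped.
BUCKET_RE = re.compile(r"^db_(\d+)_(\d+)_(\d+)$")


def group_by_time_range(names):
    """Return {(latest, earliest): [bucket IDs]} for time ranges shared by 2+ buckets.

    IDs are sorted numerically, so 803 comes before 1238 (plain `ls` sorts them as text).
    """
    groups = defaultdict(list)
    for name in names:
        m = BUCKET_RE.match(name)
        if m is None:
            continue  # not a bucket directory
        latest, earliest, bucket_id = (int(g) for g in m.groups())
        groups[(latest, earliest)].append(bucket_id)
    return {rng: sorted(ids) for rng, ids in groups.items() if len(ids) > 1}


def journal_size(index_dir, latest, earliest, bucket_id):
    """Size in bytes of rawdata/journal.gz for one bucket, or None if it is missing."""
    path = os.path.join(index_dir, f"db_{latest}_{earliest}_{bucket_id}",
                        "rawdata", "journal.gz")
    return os.path.getsize(path) if os.path.isfile(path) else None


def report(index_dir):
    """Print every time range shared by more than one bucket in one index's frozen directory."""
    collisions = group_by_time_range(os.listdir(index_dir))
    for (latest, earliest), ids in sorted(collisions.items()):
        print(f"db_{latest}_{earliest}_*:")
        for bucket_id in ids:
            size = journal_size(index_dir, latest, earliest, bucket_id)
            print(f"  bucket {bucket_id}: journal.gz = {size} bytes")
```

For example, `report("/misc/cloud2/splunk/freezer/accumulating/service_public")` would list the three `1456934399_1456930800` buckets in the order 803, 1238, 1982, next to their journal sizes.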

brianpreston
Path Finder

It was the Splunk App for AWS: http://docs.splunk.com/Documentation/AddOns/latest/AWS/Description

  • The app generates log events as the output of its analysis.
  • Those log events are given a timestamp corresponding to the time of the original event in AWS.
  • The same log event may be re-ingested later, and it then lands in a new bucket whose time range collides with an existing bucket.
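The bucket names in the question fit this explanation. Decoding the two epoch values in a bucket name (a minimal sketch; `bucket_span_utc` and `epoch_to_utc` are hypothetical helpers) shows that each of the three buckets covers the single hour 15:00:00 to 15:59:59 UTC on 2016-03-02. That is about a month before the Apr 3 dates on the frozen journals (assuming the listing is from 2016), consistent with events carrying their original AWS timestamps rather than the time they were indexed.

```python
from datetime import datetime, timezone


def epoch_to_utc(seconds):
    """Convert a Unix epoch value taken from a bucket name into an aware UTC datetime."""
    return datetime.fromtimestamp(int(seconds), tz=timezone.utc)


def bucket_span_utc(name):
    """Return (earliest, latest) UTC times for a db_<latest>_<earliest>_<id> bucket name."""
    _, latest, earliest, _ = name.split("_")
    return epoch_to_utc(earliest), epoch_to_utc(latest)


earliest, latest = bucket_span_utc("db_1456934399_1456930800_803")
print(earliest.isoformat(), "to", latest.isoformat())
# 2016-03-02T15:00:00+00:00 to 2016-03-02T15:59:59+00:00
```

The 1238 and 1982 buckets decode to the identical span, since only the trailing bucket ID differs.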

How to fix it:

  • First and most important, do not put log events generated by the AWS app (or by any other app) in the same index as your application data events.
  • Second, do not try to freeze or archive events generated by an app or plugin.

ddrillic
Ultra Champion

Unfortunately, it isn't clear what you are trying to do and how. Can you please elaborate?


brianpreston
Path Finder

Sure. I'm trying to "freeze" these buckets, as described in the Splunk documentation topic "Archive indexed data".

In short, this means:

  • the buckets are taken out of Splunk and are no longer indexed
  • the bucket data is stored on a remote file system
  • the events in the buckets are not lost!
  • the events in the buckets are not stored more than once

The problem is that the buckets shown, all in a single index, cover the same time span.

It seems very likely the data is being duplicated.

But the newer buckets are always bigger.

  • How can they be newer if they cover the same time span?
  • Why is the same data being frozen more than once?
  • Where is the new data coming from, if this is the frozen directory?