Can someone possibly suggest a better way to handle snapshot data or tell me how to get more information on buckets?

Contributor

We currently have some data that arrives in "snapshot" form. In other words, we pull a snapshot of the data every day from a RESTful interface and upload it to Splunk.

To eliminate search issues for customers, we do a soft delete ("| delete", which does not really remove the data) of the index before ingesting the new data. This has been problematic: we often deploy our new app, sometimes sources that were soft deleted come back, and many delete directories build up.

First of all, any suggestions on how we could handle this would be appreciated. (We've tried lookups and KV store lookups, but they failed because they didn't offer the faceted search our customers wanted.)

One remedy we've suggested is to remove all of the buckets in the index before ingesting the new data instead of "| delete".

Is there a good way to list all of the buckets for an index? (| dbinspect, or index=test | eval bkt=_bkt | table bkt splunk_server)?

Then, remove them? (reference below)
https://answers.splunk.com/answers/133845/delete-corrupt-bucket-or-down-index-in-cluster.html

Re: Can someone possibly suggest a better way to handle snapshot data or tell me how to get more information on buckets?

Splunk Employee

Hi jaredlaney

Would it be possible to just tell your users to set the time picker to "Today"? That way they would only see the data imported in the last batch. This is much easier.

You could even set "Today" as the default time range when your users log in to Splunk.

j

Re: Can someone possibly suggest a better way to handle snapshot data or tell me how to get more information on buckets?

Contributor

Hi jbjerke. We suggested this previously but our users didn't love it. Maybe we could try it again.

Re: Can someone possibly suggest a better way to handle snapshot data or tell me how to get more information on buckets?

Contributor

Our snapshots are also indexed by event time, so setting the time picker to Today would cut off much of the snapshot, because the snapshots become intertwined over time.

Example
index = test
source1 = (1/28, 5), (1/29, 6)
source2 = (1/30, 7), (1/29, 6), (1/29, 5.5), (1/28, 5), (1/28, 5.4)

Re: Can someone possibly suggest a better way to handle snapshot data or tell me how to get more information on buckets?

Splunk Employee

When you run the delete command, it marks the data / buckets as unsearchable. So if you are seeing these results come back in searches, that sounds like a problem. They shouldn't be returned in search results; if they are, you might want to talk to support and see if there is a bug.

In terms of deleting the index, you can use dbinspect to find the buckets...

| dbinspect index=main | convert ctime(endEpoch) ctime(startEpoch) | table bucketId path startEpoch endEpoch

That will give you the location on disk of each bucket and the time range of the data it holds. You could manually delete the buckets...

If you are in a clustered environment, you need to be careful how you delete indexes or buckets. Best practice would be to put the CM into maintenance mode. From there you need to clean the index on each indexer; you have to stop Splunk first...

splunk clean eventdata -index <index_name>

After that you can restart the indexers and take the CM out of maintenance mode.

Re: Can someone possibly suggest a better way to handle snapshot data or tell me how to get more information on buckets?

Contributor

@esix - There is a bug in Splunk: SPL-100516. We haven't been able to get it fixed, so we're looking for alternative ways to do this. We're all Splunk Certified Architects, so we're aware of how to delete in a clustered environment. We just have a high SLA, and putting the cluster into maintenance mode isn't a great option for us. We've also stopped all the indexers and run the "splunk clean eventdata" command.

We're kind of looking for a solution where we can freeze buckets without taking down the cluster.

Re: Can someone possibly suggest a better way to handle snapshot data or tell me how to get more information on buckets?

Splunk Employee

How about applying a restricted search term to the users' role?
Settings->Access Controls->Roles->YOURUSERROLE

If you add something like this they will only see what has been indexed in the last day regardless of the event time:

_index_earliest=-1d@d

This will happen in the background so your users would never know.

j

Re: Can someone possibly suggest a better way to handle snapshot data or tell me how to get more information on buckets?

Contributor

@jberke - Again, the snapshot has data from the past few months, so putting in _index_earliest=-1d@d wouldn't work.

Is there an _index_latest(source) command I could run by role?

Re: Can someone possibly suggest a better way to handle snapshot data or tell me how to get more information on buckets?

Contributor

See example:

Example
index = test
source1 = (1/28, 5), (1/29, 6)
source2 = (1/30, 7), (1/29, 6), (1/29, 5.5), (1/28, 5), (1/28, 5.4)

Re: Can someone possibly suggest a better way to handle snapshot data or tell me how to get more information on buckets?

Splunk Employee

Hi jared

I think you are misunderstanding: _index_earliest is not the same as earliest. By typing _index_earliest=@d you would show all data that was indexed today, regardless of event time, even if that data is many years old. The time picker would say "All time", but it would only show what has been indexed since 00:00 this morning.
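
To make the distinction concrete, here is a minimal Python sketch (not Splunk code) that models the example data posted earlier in the thread. The year, the index times, and the helper names are assumptions for illustration only: source1 is assumed to be yesterday's snapshot and source2 today's. It compares filtering on event time (what the time picker does) with filtering on index time (what _index_earliest does).

```python
from datetime import datetime


def day(d, hour=0):
    # Arbitrary year; only the day and hour matter for this illustration.
    return datetime(2016, 1, d, hour)


# "Now" is assumed to be noon on 1/30, so "today" starts at 1/30 00:00.
NOW = day(30, 12)
MIDNIGHT = NOW.replace(hour=0, minute=0, second=0, microsecond=0)

# (source, event_time, index_time, value), modelled on the thread's example.
# Index times are assumed: source1 was loaded on 1/29, source2 on 1/30.
EVENTS = [
    {"source": "source1", "event_time": day(28), "index_time": day(29, 6), "value": 5},
    {"source": "source1", "event_time": day(29), "index_time": day(29, 6), "value": 6},
    {"source": "source2", "event_time": day(30), "index_time": day(30, 6), "value": 7},
    {"source": "source2", "event_time": day(29), "index_time": day(30, 6), "value": 6},
    {"source": "source2", "event_time": day(29), "index_time": day(30, 6), "value": 5.5},
    {"source": "source2", "event_time": day(28), "index_time": day(30, 6), "value": 5},
    {"source": "source2", "event_time": day(28), "index_time": day(30, 6), "value": 5.4},
]


def filter_by_event_time(events, since):
    # Analogous to the time picker: keeps events whose own timestamp is recent.
    return [e for e in events if e["event_time"] >= since]


def filter_by_index_time(events, since):
    # Analogous to an index-time restriction: keeps events that were loaded recently.
    return [e for e in events if e["index_time"] >= since]


today_by_event = filter_by_event_time(EVENTS, MIDNIGHT)
today_by_index = filter_by_index_time(EVENTS, MIDNIGHT)

print([e["value"] for e in today_by_event])
print([e["value"] for e in today_by_index])
```

Under these assumptions, the event-time filter keeps only the single 1/30 event, while the index-time filter keeps the whole of today's snapshot (all of source2, including its 1/28 and 1/29 events) and drops yesterday's source1.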

j
