
Is there a way to re-index data from an index to a new index, on an indexer cluster, without using collect?

hettervik
Builder

We have an index with a ton of data. A new use for the data has emerged, so now we want a longer retention time on some of the data in the index. We don't want to simply increase the retention time on the index, because the storage cost is too high. We want to create a new index with a longer retention, pick out the events we need, and copy them to the new index. This is on an indexer cluster.

In theory, we could use collect, like this:

index=oldindex field=the_events_we_need
| collect index=newindex

However, because the index is so big, we're having problems running this search. Even when we run it bit by bit, we still end up missing events in the new index, possibly due to performance or memory limits, or bucket issues.

Is there a better and more reliable way of doing this?


PickleRick
SplunkTrust

Just for the sake of completeness and future reference: Splunk 10 has a new "split index" feature, but it only works for a very specific set of use cases. More info: https://help.splunk.com/en/splunk-enterprise/administer/manage-indexers-and-indexer-clusters/10.0/ma...


hettervik
Builder

Thanks for your help and suggestions. We ended up using the collect method, as first presented. We could perhaps have migrated or copied buckets on disk, but since we needed specific events from the index, that wouldn't work. Automating the process with scripts also seemed like too much work: there doesn't seem to be an easy way to do this in Splunk, and the scripts would have to keep track of whether each search failed or succeeded, so it would be complicated to implement.

In the end, someone had to "manually" run the collect search, bit by bit, backwards in time, over the whole big index: run the collect search on a time slot, then, if successful, run it on the previous time slot, and so on.
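
For anyone who does want to script it, here is a rough sketch of the loop we ran by hand, written in Python against the Splunk REST API. This is purely illustrative and untested; the host, credentials, slot size, retention window, and search string are placeholders, not what we actually ran:

import time
import requests

BASE = "https://splunk.example.com:8089"   # management port (placeholder host)
AUTH = ("admin", "changeme")               # placeholder credentials
SLOT = 3600                                # one-hour slots (placeholder)

SEARCH = (
    "search index=oldindex field=the_events_we_need "
    "earliest={earliest} latest={latest} "
    "| collect index=newindex"
)

def run_slot(earliest, latest):
    """Run one collect slot synchronously; return True only if it finished."""
    r = requests.post(
        f"{BASE}/services/search/jobs",
        auth=AUTH,
        verify=False,  # lab only; verify TLS properly in production
        data={
            "search": SEARCH.format(earliest=earliest, latest=latest),
            "exec_mode": "blocking",   # wait for the job to complete
            "output_mode": "json",
        },
    )
    r.raise_for_status()
    sid = r.json()["sid"]
    # Double-check the finished job's state before moving on.
    s = requests.get(
        f"{BASE}/services/search/jobs/{sid}",
        auth=AUTH, verify=False, params={"output_mode": "json"},
    )
    return s.json()["entry"][0]["content"]["dispatchState"] == "DONE"

latest = int(time.time())
oldest = latest - 90 * 24 * 3600   # how far back to go (placeholder)
while latest > oldest:
    earliest = latest - SLOT
    if not run_slot(earliest, latest):
        raise RuntimeError(f"slot {earliest}-{latest} failed, stopping")
    latest = earliest   # only move backwards after a successful slot

The important part is the last line: the window only moves backwards after the previous slot has succeeded, which is exactly the bookkeeping the scripts would have needed.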


isoutamo
SplunkTrust
Please accept that answer as the solution so that other users can find it later.

PickleRick
SplunkTrust

If you wanted to move around whole buckets, you could do that with no problem. But apparently you want to pull some fields out of the original data and "extract" that part into another index. That you cannot do without having Splunk search the data and extract those fields.

The REST-based approach, where you automate searching and ingestion (either by means of | collect, or by re-ingesting via HEC; remember to use the correct sourcetype so the data is treated as stash data), seems the most convenient way: you can search in small chunks so they don't overwhelm your environment.
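
For illustration, the HEC re-ingest half could look roughly like this in Python (a sketch only; the host, token, and index are placeholders, and the "stash" sourcetype is what keeps the re-ingested data from being counted against your license):

import json
import requests

HEC_URL = "https://splunk.example.com:8088/services/collector/event"   # placeholder host
HEC_TOKEN = "00000000-0000-0000-0000-000000000000"                    # placeholder token

def reingest(events):
    """Send a batch of exported events (dicts with _time and _raw) to HEC."""
    # HEC accepts several JSON event objects concatenated in one request body.
    payload = "".join(
        json.dumps({
            "time": e["_time"],        # keep the original event timestamp
            "index": "newindex",
            "sourcetype": "stash",     # so the data is treated as stash data
            "event": e["_raw"],
        })
        for e in events
    )
    r = requests.post(
        HEC_URL,
        headers={"Authorization": f"Splunk {HEC_TOKEN}"},
        data=payload,
        verify=False,  # lab only; verify TLS properly in production
    )
    r.raise_for_status()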

One more thing. I'm not 100% sure how Splunk will behave around the edges of search time ranges (like "earliest=X latest=X+100" followed by "earliest=X+100 latest=X+200": what happens to events received exactly at X+100?). Either do some testing or just add a failsafe like "earliest=X+99 | where _time>X+100" to avoid duplicates.
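
For example, one way to make the slot edges deterministic is to search each slot with a one-second overlap at the start and trim it back to a half-open interval with "| where". A Python sketch that builds such a search string (the slot size and index names are placeholders):

def slot_search(x, slot=100):
    """Build SPL for the half-open slot [x, x+slot); an event falling
    exactly on a slot edge is picked up by exactly one slot."""
    return (
        f"search index=oldindex field=the_events_we_need "
        f"earliest={x - 1} latest={x + slot} "          # start one second early
        f"| where _time>={x} AND _time<{x + slot} "     # trim back to [x, x+slot)
        f"| collect index=newindex"
    )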

livehybrid
SplunkTrust

Hi @hettervik 

How much data are we talking about here? Is this GB or TB?

Ultimately the best approach to take depends on the amount of data you need to extract/re-index.

The collect approach might still be viable, but it should be scripted to run in smaller increments continuously until you've extracted what you need. Alternatively, you could take a similar approach and incrementally export blocks of the data using the Splunk REST API endpoints (see https://help.splunk.com/en/splunk-enterprise/search/search-manual/9.3/export-search-results/export-d... for more info); you can then re-ingest the exports using a UF/HF, as in the sketch below.
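
As a rough sketch of that incremental export idea (untested; the host, credentials, and output path are placeholders, and it uses the /services/search/jobs/export endpoint to stream results):

import requests

BASE = "https://splunk.example.com:8089"   # management port (placeholder host)
AUTH = ("admin", "changeme")               # placeholder credentials

def export_block(earliest, latest, path):
    """Stream raw events for the block [earliest, latest) to a local file
    that a UF/HF can then pick up and re-index."""
    with requests.post(
        f"{BASE}/services/search/jobs/export",
        auth=AUTH,
        verify=False,  # lab only; verify TLS properly in production
        data={
            "search": (
                "search index=oldindex field=the_events_we_need "
                f"earliest={earliest} latest={latest} | fields _raw"
            ),
            "output_mode": "raw",   # one raw event per line
        },
        stream=True,
    ) as r:
        r.raise_for_status()
        with open(path, "wb") as f:
            for chunk in r.iter_content(chunk_size=1 << 16):
                f.write(chunk)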


tej57
Builder

Hello @hettervik,

From the scenario, it seems that collect is the only way to achieve your use case. Try filtering out the events you don't need and optimizing the SPL search around the collect command so that you don't miss any required events.

However, if you want to migrate the buckets, I've found an older community post that might help you - https://community.splunk.com/t5/Installation/Is-it-possible-to-migrate-indexed-buckets-to-a-differen.... But I would be quite cautious with this approach; I haven't tried it myself, and copying whole buckets might bring unwanted data into the new index. Test it with one of the smaller buckets first and check whether you get the desired result.

IMO, collect is the best way to move forward. You can use the following SPL query to keep the original parsing configuration:

index = old_index
| <<filter out the events required>>
| fields host source sourcetype _time _raw
| collect index=new_index output_format=hec

 

Thanks,
Tejas.

 


isoutamo
SplunkTrust

There was a similar discussion on the Slack side some time ago. Maybe it can lead you in the right direction: https://splunkcommunity.slack.com/archives/CD9CL5WJ3/p1727111432487429
