Splunk Enterprise Security

Is there a way to re-index data from an index to a new index, on an indexer cluster, without using collect?

hettervik
Builder

We have an index with a ton of data. A new use for the data has emerged, so now we want a longer retention time on some of the data in the index. We don't want to simply increase the retention time on the index, because the storage cost is too high. We want to create a new index with a longer retention, pick out the events we need, and copy them to the new index. This is on an indexer cluster.

In theory, we could use collect, like this:

index=oldindex field=the_events_we_need
| collect index=newindex

However, because the index is so big, we're having trouble running this search. Even when we run it in small chunks, we still end up missing events in the new index, possibly due to performance limits, memory limits, or bucket issues.

Is there a better and more reliable way of doing this?


livehybrid
Ultra Champion

Hi @hettervik 

How much data are we talking about here - GBs or TBs?

Ultimately the best approach to take depends on the amount of data you need to extract/re-index.

The collect approach might still be viable, but it should be scripted to run in smaller increments until you've extracted everything you need - see the sketch below. Alternatively, you could take a similar incremental approach and export blocks of the data using the Splunk REST API endpoints (see https://help.splunk.com/en/splunk-enterprise/search/search-manual/9.3/export-search-results/export-d... for more info), then re-ingest the exports using a UF/HF.
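
To illustrate, here's a minimal sketch of what that scripted loop could look like, assuming Python with the requests library; the host, credentials, epoch range, and search string are all placeholder assumptions for your environment, not a definitive implementation.

# Sketch: run the collect search in bounded time windows via the Splunk
# REST API, so each job stays small enough to finish reliably.
import requests

BASE = "https://your-search-head:8089"  # placeholder: search head management port
AUTH = ("admin", "changeme")            # placeholder: use a token in practice

SEARCH = (
    "search index=oldindex field=the_events_we_need "
    "| collect index=newindex"
)

WINDOW = 86400            # one day per job, in seconds
START = 1704067200        # placeholder: epoch start of the data to copy
END = 1706745600          # placeholder: epoch end of the data to copy

t = START
while t < END:
    # exec_mode=blocking returns only once the job finishes, so the
    # windows run one at a time instead of piling up on the indexers.
    resp = requests.post(
        f"{BASE}/services/search/jobs",
        auth=AUTH,
        data={
            "search": SEARCH,
            "earliest_time": t,
            "latest_time": t + WINDOW,
            "exec_mode": "blocking",
        },
        verify=False,  # placeholder: enable TLS verification in production
    )
    resp.raise_for_status()
    t += WINDOW

The same loop could instead POST each window to the export endpoint described in the docs above and write the results to files for re-ingestion via a UF/HF.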

🌟 Did this answer help you? If so, please consider:

  • Adding karma to show it was useful
  • Marking it as the solution if it resolved your issue
  • Commenting if you need any clarification

Your feedback encourages the volunteers in this community to continue contributing

 


tej57
Builder

Hello @hettervik,

From the scenario, it seems that collect is the only way to achieve your use case. Try filtering out the events you don't need and optimizing the SPL search so that the collect command doesn't miss any required events.

However, if you want to migrate the buckets, I've found one of the older community posts that might help you - https://community.splunk.com/t5/Installation/Is-it-possible-to-migrate-indexed-buckets-to-a-differen.... I would be quite cautious with this approach, though - I haven't tried it myself, and copying whole buckets might bring unwanted data into the new index. You could test it with one of the smaller buckets first and check whether you get the desired result.

IMO, collect is the best way to move forward. You can use the following SPL query to preserve the original event metadata (host, source, sourcetype) when the events are re-indexed:

index=old_index
| <<filter out the events required>>
| fields host source sourcetype _time _raw
| collect index=new_index output_format=hec
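
One way to check that nothing was missed is to compare event counts between the two indexes over the same time window after each run. A minimal sketch, again assuming Python with the requests library and placeholder host, credentials, and epoch range:

# Sketch: compare event counts between the old and new index over the
# same time window to confirm nothing was missed by collect.
import requests

BASE = "https://your-search-head:8089"  # placeholder
AUTH = ("admin", "changeme")            # placeholder: use a token in practice

def count_events(spl, earliest, latest):
    # exec_mode=oneshot returns the results directly instead of a job ID
    resp = requests.post(
        f"{BASE}/services/search/jobs",
        auth=AUTH,
        data={
            "search": spl,
            "earliest_time": earliest,
            "latest_time": latest,
            "exec_mode": "oneshot",
            "output_mode": "json",
        },
        verify=False,  # placeholder: enable TLS verification in production
    )
    resp.raise_for_status()
    return int(resp.json()["results"][0]["count"])

# Apply the same filter you used before collect when counting the source index.
old = count_events("search index=old_index | stats count", 1704067200, 1704153600)
new = count_events("search index=new_index | stats count", 1704067200, 1704153600)
print("match" if old == new else f"mismatch: {old} vs {new}")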

 

Thanks,
Tejas.

 

---
If the above solution helps, an upvote is appreciated!

 


isoutamo
SplunkTrust

There was a similar discussion on the Slack side a while ago. Maybe it can lead you in the right direction? https://splunkcommunity.slack.com/archives/CD9CL5WJ3/p1727111432487429
