Is there a way to re-index data from an index to a new index, on an indexer cluster, without using collect?

hettervik
Builder

We have an index with a ton of data. A new use for the data has emerged, so now we want a longer retention time on some of the data in the index. We don't want to simply increase the retention time on the index, because the storage cost is too high. We want to create a new index with a longer retention, pick out the events we need, and copy them to the new index. This is on an indexer cluster.

In theory, we could use collect, like this:

index=oldindex field=the_events_we_need
| collect index=newindex

However, because the index is so big, we're having problems running this search. Even when we run it in smaller chunks, we still end up with missing events in the new index, possibly due to performance or memory limits, or bucket issues.

Is there a better and more reliable way of doing this?


PickleRick
SplunkTrust

If you wanted to move around whole buckets, you could do that with no problem. But apparently you want to take some fields from the original data and "extract" that part into another index, and that you cannot do without having Splunk search the data and extract those fields.

The REST-based approach, where you automate the searching and ingestion (either by means of | collect or by re-ingesting via HEC - remember to use the correct sourcetype so it's treated as stash data), seems the most convenient way - that way you can search in small chunks so they don't overwhelm your environment.

One more thing. I'm not 100% sure how Splunk will behave around the edges of search time ranges (like "earliest=X latest=X+100" followed by "earliest=X+100 latest=X+200" - what happens to events timestamped exactly at X+100?). Either do some testing or just add a failsafe, like searching a bit wider with "earliest=X+99" and filtering precisely with "| where _time>=X+100", to avoid duplicates or gaps.
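
To make that concrete, here is a minimal sketch of that kind of automation, assuming Python with the requests library; the hostname, credentials, index names, chunk size, and time range are all placeholders, not values from this thread:

# Hypothetical sketch: drive chunked "search ... | collect" jobs through
# the Splunk REST API, one half-open time window at a time.
import requests

BASE = "https://searchhead.example.com:8089"  # assumed management port
AUTH = ("svc_reindex", "changeme")            # assumed service account
CHUNK = 3600                                  # one-hour chunks, in seconds
START, END = 1700000000, 1700086400           # epoch range to migrate

t = START
while t < END:
    lo, hi = t, min(t + CHUNK, END)
    # earliest is inclusive and latest is exclusive, so half-open
    # [lo, hi) windows should tile the range without overlap - but test
    # this on your version, as noted above.
    spl = (f"search index=oldindex field=the_events_we_need "
           f"earliest={lo} latest={hi} | collect index=newindex")
    resp = requests.post(
        f"{BASE}/services/search/jobs",
        auth=AUTH,
        data={"search": spl, "exec_mode": "blocking"},  # wait per chunk
        verify=False,  # only if the port uses self-signed certificates
    )
    resp.raise_for_status()
    t = hi

Running the chunks serially with exec_mode=blocking keeps the load on the indexers predictable, and you could log each completed window so a failed run can resume where it left off.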


livehybrid
SplunkTrust

Hi @hettervik 

How much data are we talking about here? Is it GB or TB?

Ultimately the best approach to take depends on the amount of data you need to extract/re-index.

The collect approach might still be viable, but it should be scripted to run in smaller increments, continuing until you've extracted everything you need. Alternatively, you could take a similar approach and incrementally export blocks of the data using the Splunk REST API endpoints (see https://help.splunk.com/en/splunk-enterprise/search/search-manual/9.3/export-search-results/export-d... for more info) - you can then re-ingest the exported data using a UF/HF.
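
For illustration, here is a minimal sketch of exporting one such block with the documented /services/search/jobs/export endpoint, assuming Python with the requests library; the host, credentials, SPL, time window, and file name are placeholders:

# Hypothetical sketch: stream one block of events from the export
# endpoint into a local file that a UF/HF monitor input can re-ingest.
import requests

BASE = "https://searchhead.example.com:8089"  # assumed management port
AUTH = ("svc_reindex", "changeme")            # assumed service account

with requests.post(
    f"{BASE}/services/search/jobs/export",
    auth=AUTH,
    data={
        "search": ("search index=oldindex field=the_events_we_need "
                   "earliest=1700000000 latest=1700003600 "
                   "| fields _time host source sourcetype _raw"),
        "output_mode": "json",  # csv and xml are also supported
    },
    stream=True,   # the export endpoint streams results as they arrive
    verify=False,  # only for self-signed certificates
) as resp:
    resp.raise_for_status()
    with open("export_block_1700000000.json", "wb") as out:
        for chunk in resp.iter_content(chunk_size=65536):
            out.write(chunk)

Repeat this per time window, then point a file monitor at the exported files (with an appropriate sourcetype) to re-ingest them into the new index.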

🌟 Did this answer help you? If so, please consider:

  • Adding karma to show it was useful
  • Marking it as the solution if it resolved your issue
  • Commenting if you need any clarification

Your feedback encourages the volunteers in this community to continue contributing.

tej57
Builder

Hello @hettervik,

From the scenario, it seems that collect is the only way to achieve your use case. You'll have to filter out the events you don't need, optimize the SPL search, and use the collect command carefully so that you don't miss any of the required events.

However, if you want to migrate the buckets themselves, I've found one of the older community posts that might help you - https://community.splunk.com/t5/Installation/Is-it-possible-to-migrate-indexed-buckets-to-a-differen.... I would be quite cautious with this approach, though - I haven't tried it myself, and copying buckets might bring unwanted data into the new index. Test it with one of the smaller buckets first and check whether you get the desired result.

IMO, collect is the best way to move forward. You can use the following SPL query to keep the original parsing configuration:

index=old_index
| <<filter out the events required>>
| fields host source sourcetype _time _raw
| collect index=new_index output_format=hec

 

Thanks,
Tejas.

 

---
If the above solution helps, an upvote is appreciated!

isoutamo
SplunkTrust
SplunkTrust

There was a similar discussion on the Slack side some time ago. Maybe it can lead you in the right direction? https://splunkcommunity.slack.com/archives/CD9CL5WJ3/p1727111432487429
