Deployment Architecture

How to split out cold data from hot/warm data and migrate it to a new location in my indexer cluster?

zthomas
Explorer

I am attempting to convert my indexer cluster to use volumes for storage instead of setting homePath and coldPath directly. We currently have roughly 3.5 TB of data on each indexer in /splunk/data/hot, which includes all of the data in Splunk: hot, warm, cold, and frozen. I am trying to break out the cold and frozen data to use a new path (/splunk/data/cold), and I also want to manage both of these paths using volumes. I have set up indexes.conf with the volume definitions and modified one index to use the new volumes for its homePath and coldPath.

The modified index has created a folder in /splunk/data/cold for its cold data, but it has not moved any of the existing cold data to this folder. How can I force Splunk to migrate the existing cold data from /splunk/data/hot to /splunk/data/cold and manage this data using the newly created cold volume?

0 Karma

lguinn2
Legend

I disagree with the comments! You should not be manually moving buckets - IF I understand correctly.

  1. You have changed the bucket locations in indexes.conf
    For this to work properly, you must (a) stop Splunk (b) manually copy the folders from the old locations (/vol1/myindex/db for example) to the new locations (/newvol/myindex/db). You are moving entire directories. Then (c) start Splunk.

  2. You want stuff to be in the colddb folder and there is nothing there.
    Splunk automatically rolls buckets from the warm location (.../db) to the cold location (.../colddb) when one of the following conditions is met (these are set in indexes.conf):
    maxWarmDBCount = 300 # 300 is the default. When the number of warm buckets exceeds this count, the oldest bucket is rolled to cold.
    homePath.maxDataSizeMB = 0 # 0 is the default. When this is set > 0 and the hot+warm buckets consume this much disk space, the oldest bucket is rolled to cold.

So if you are using the defaults, then no rolling will occur until you have 300 warm buckets. If Splunk is filling one bucket per day in this index, warm will hold 10 months of data... You probably want to change one or both of these settings in indexes.conf for your index.
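As a rough illustration of the two triggers described above, here is a toy Python model. It is not Splunk code: the function name, the tuple layout, and the choice to ignore hot bucket sizes are all assumptions made for this sketch. It only shows which warm buckets a simplified "oldest first" policy would roll to cold under a count limit and a size limit.

```python
def buckets_to_roll(warm, max_warm_db_count=300, home_max_mb=0):
    """Return the warm buckets a simplified roll policy would move to cold.

    warm: list of (name, newest_event_epoch, size_mb) tuples (hypothetical layout).
    home_max_mb=0 means "no size limit", mirroring the default described above.
    Hot bucket sizes are ignored to keep the model small.
    """
    # Oldest buckets first, since the oldest bucket is the one that rolls.
    remaining = sorted(warm, key=lambda b: b[1])
    rolled = []
    while remaining:
        over_count = len(remaining) > max_warm_db_count
        over_size = home_max_mb > 0 and sum(b[2] for b in remaining) > home_max_mb
        if not (over_count or over_size):
            break
        rolled.append(remaining.pop(0))
    return rolled


# Made-up data: one 750 MB bucket per day for 310 days.
warm = [(f"bucket_{d}", d * 86400, 750) for d in range(310)]
print(len(buckets_to_roll(warm)))                      # count limit only
print(len(buckets_to_roll(warm, home_max_mb=150000)))  # size limit also applies
```

With the defaults, only the buckets beyond the 300th roll; adding a size limit makes rolling start much earlier, which is the effect of lowering either setting in indexes.conf.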

0 Karma

zthomas
Explorer

Unfortunately, neither of these scenarios applies to my situation. I currently have both hot and cold buckets mixed together in the same location, and I need to separate the existing cold buckets from the hot ones and move them to a new location.

0 Karma

dstonecypher_sp
Splunk Employee

Scenario 2 in this answer is the situation you describe in the original post.

Cold buckets aren't "cold" by default. You have to tell Splunk when to roll them. Once you do, it will be automatically handled for you.

0 Karma

lguinn2
Legend

How do you know that they are cold buckets? What makes the bucket "cold"?

0 Karma

zthomas
Explorer

I'm referencing the Splunk documentation here http://docs.splunk.com/Documentation/Splunk/6.2.0/Indexer/HowSplunkstoresindexes when I talk about hot/warm/cold buckets. I don't know how to tell whether they are cold buckets or not; that's part of my problem 🙂

0 Karma

gjanders
SplunkTrust

The difference between a warm and a cold bucket is simply the location on disk. What you are likely seeing are the hot and warm buckets, which you can tell apart by the naming convention (directories prefixed with hot_ are hot buckets, as per the bucket naming conventions).

If your volume has not reached its limit yet, and homePath.maxDataSizeMB has not reached its limit for the specified index (it defaults to 0), then Splunk has no reason to roll buckets to cold unless, as per lguinn2's statement, there are more than 300 warm buckets in the index.

I would suggest you look up the dbinspect search command (for example, | dbinspect index=os).

You can use it to determine which buckets are in which state on a per-index basis.
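To make the "name plus location" rule above concrete, here is a minimal Python sketch that guesses a bucket's state from its directory name and parent directory. The function name, the example paths, and the example bucket names are assumptions for illustration only. The rb_ prefix is included because replicated bucket copies in a cluster use it instead of db_.

```python
from pathlib import PurePosixPath

# Example paths only; substitute the homePath and coldPath of your index.
HOME = "/splunk/data/hot/os/db"
COLD = "/splunk/data/hot/os/colddb"


def bucket_state(bucket_path, home_path, cold_path):
    """Guess a bucket's state from its directory name and where it lives."""
    p = PurePosixPath(bucket_path)
    if p.name.startswith("hot_"):
        return "hot"
    if p.name.startswith(("db_", "rb_")):
        # Warm and cold buckets share a naming scheme; only the location differs.
        if p.parent == PurePosixPath(home_path):
            return "warm"
        if p.parent == PurePosixPath(cold_path):
            return "cold"
    return "unknown"


print(bucket_state(HOME + "/hot_v1_12", HOME, COLD))
print(bucket_state(HOME + "/db_1420070400_1419984000_7", HOME, COLD))
print(bucket_state(COLD + "/db_1400000000_1399900000_3", HOME, COLD))
```

This is only a filesystem-level guess; dbinspect remains the authoritative way to see what Splunk itself considers each bucket's state.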

zthomas
Explorer

Thank you, your comment helped me gather more information on my situation.

0 Karma

lguinn2
Legend

Yes, if there is no data in the colddb directory, then you have no cold buckets. Splunk has not encountered any criteria to make the buckets roll from warm to cold. Use one of the settings that I explained, and Splunk will start rolling buckets to cold when they meet the criteria.

0 Karma

zthomas
Explorer

I've looked a little closer at the current configuration (this is stuff I've inherited at my new position, so I'm still familiarizing myself with it) and I have more details.

I'll use actual file paths here to avoid confusing anybody.

I'm working on an index called os, which has homePath=/splunk/data/hot/os/db, and coldPath=/splunk/data/hot/os/colddb. Both of these directories have over 300 buckets in them. When I changed the indexes.conf to include the volumes:

[volume:hot_warm]
path = /splunk/data/hot
maxVolumeDataSizeMB = 3500000

[volume:cold]
path = /splunk/data/cold
maxVolumeDataSizeMB = 6500000

and to use homePath and coldPath as:

[os]
repFactor=auto
homePath = volume:hot_warm/os
coldPath = volume:cold/os
thawedPath = /splunk/data/thawed/os_thawed

Once these changes were made and the indexer was restarted, /splunk/data/cold/os was created as a directory, but nothing was ever moved or added to it. I've now realized that /splunk/data/hot/os/colddb is what holds all of my cold data, but how do I get Splunk to move all of that data to /splunk/data/cold/os?

0 Karma

somesoni2
Revered Legend

That would need to be moved manually.
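A manual move could be sketched in Python as below. This is a hedged illustration, not a supported Splunk procedure: the function name and the dry_run flag are made up, and it assumes Splunk is stopped on the indexer while it runs and that the script runs as the user that owns the Splunk data. The paths in the usage comment are the ones posted in this thread.

```python
import shutil
from pathlib import Path


def move_cold_buckets(old_cold, new_cold, dry_run=True):
    """Move every bucket directory from old_cold into new_cold.

    Assumes Splunk is stopped on this indexer. Returns the list of
    (source, destination) pairs that were moved, or would be in a dry run.
    """
    old_cold, new_cold = Path(old_cold), Path(new_cold)
    moves = []
    for src in sorted(old_cold.iterdir()):
        if not src.is_dir():
            continue  # skip stray files; only bucket directories are moved
        dst = new_cold / src.name
        if dst.exists():
            raise FileExistsError(f"refusing to overwrite {dst}")
        moves.append((src, dst))
    if not dry_run:
        new_cold.mkdir(parents=True, exist_ok=True)
        for src, dst in moves:
            shutil.move(str(src), str(dst))
    return moves


# Example (dry run, nothing is moved):
# move_cold_buckets("/splunk/data/hot/os/colddb", "/splunk/data/cold/os")
```

Review the dry-run output first, then call it again with dry_run=False and start Splunk. Going by the posted config, the new homePath (volume:hot_warm/os) also no longer points at the old /splunk/data/hot/os/db, so the warm buckets may need the same treatment; that is a reading of the config, not something confirmed in this thread.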

0 Karma

zthomas
Explorer

Ok, that's what I was afraid of. How would I go about identifying the data that needs to be moved (the cold buckets)?

0 Karma

somesoni2
Revered Legend

Could you post your old and new indexes.conf settings?

0 Karma