Deployment Architecture

What is the best way to retrieve data from a remote site Splunk index?

cbse120109
Explorer

I will have two Splunk Servers, one called Central and the other Remote. Remote will have a 1 week retention and be used to index our hosted nodes and applications. I would like Central to connect to Remote daily to retrieve and store data for analysis and archiving. Central will only have ssh access to Remote.

Would it work to have a scripted input on Central that connects to Remote, forces a hot-to-warm bucket roll, and then copies this data back to Central?

1 Solution

Lowell
Super Champion

Yeah, that seems like it could work. I would suggest using something like rsync. You should create one index on your central server for every index you have on the remote server. You have to do this, or your bucket IDs will collide. You also have to make sure that your central server never writes to these remotely synced indexes. As long as that's the case, you should be able to have rsync sync all but the "hot" buckets from the remote server to the central server.

Of course using a forwarder instead does have many advantages, but in theory I don't see why this general approach wouldn't work for you.

As an example, you'd probably want something like this:

rsync -a --delete --rsh=ssh --exclude "/db/hot*[0-9]" splunk@Remote:/opt/splunk/var/lib/splunk/defaultdb/ /opt/splunk/var/lib/splunk/remote_defaultdb

That's just an idea to get you started; you should obviously verify all the options yourself.
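One quick way to sanity-check the exclude pattern is to run it against some sample bucket directory names. This is only a sketch: the helper name is_hot_bucket and the sample names are made up, it assumes hot buckets are named hot_v1_<id> and warm buckets db_<newest>_<oldest>_<id>, and it uses a shell glob, which behaves like the rsync pattern for names this simple but is not the same matcher.

```shell
#!/bin/sh
# Hypothetical helper: succeeds when a directory name matches the last
# component of the rsync exclude pattern (hot*[0-9]).
is_hot_bucket() {
    case "$1" in
        hot*[0-9]) return 0 ;;
        *)         return 1 ;;
    esac
}

# Sample directory names; the IDs and timestamps are invented for illustration.
for dir in hot_v1_3 db_1262304000_1262217600_2 hot_v1_4 db_1262390400_1262304000_5; do
    if is_hot_bucket "$dir"; then
        echo "skip  $dir"
    else
        echo "sync  $dir"
    fi
done
```

Running rsync itself with -n (dry run) before the first real sync is still worth doing, especially with --delete in the options.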

Be sure to write back and let us know what you come up with.

cbse120109
Explorer

Thanks, I finally found the answer button.

Lowell
Super Champion

BTW, you may want to think about setting "isReadOnly=true" in indexes.conf on the replicated indexes. (This should prevent any events accidentally ending up in your replicated index.) I just came across this option the other day and thought about your question. Glad you found a working solution. 😉
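For what it's worth, a stanza for one replicated index on Central might look roughly like the output of this sketch. The function name print_replica_stanza and the index name remote_defaultdb are just assumptions for illustration, and the path layout assumes the usual db/colddb/thaweddb directories under $SPLUNK_DB; check it against the indexes.conf spec for your version.

```shell
#!/bin/sh
# Hypothetical helper: prints an indexes.conf stanza for a read-only,
# rsync-populated index. Review the output before adding it to indexes.conf.
print_replica_stanza() {
    name="$1"
    printf '[%s]\n' "$name"
    # $SPLUNK_DB is printed literally; Splunk expands it, not the shell.
    printf 'homePath   = $SPLUNK_DB/%s/db\n' "$name"
    printf 'coldPath   = $SPLUNK_DB/%s/colddb\n' "$name"
    printf 'thawedPath = $SPLUNK_DB/%s/thaweddb\n' "$name"
    printf 'isReadOnly = true\n'
}

print_replica_stanza remote_defaultdb
```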

Lowell
Super Champion

To mark a question as answered, please click the check mark next to the answer that was the most helpful. Otherwise this site will consider this question unanswered.

cbse120109
Explorer

This question is answered

cbse120109
Explorer

To resolve this issue I did the following. I configured Splunk retention on Remote in indexes.conf with maxWarmDBCount = 1 and frozenTimePeriodInSecs = 604800 (7 days at 86,400 seconds each). This allows only one warm bucket to exist at a time and gives 7-day retention in cold.
I have a script that forces a hot-to-warm roll and then rsyncs the warm bucket back to Central.
Thanks to Lowell; I modified his example to work for what we need:
rsync -va --rsh=ssh --exclude "/db/hot*[0-9]" \
    root@$REMOTE:$SPLUNK/var/lib/splunk/defaultdb/ \
    $SPLUNK/var/lib/splunk/$CUSTOMDB
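For anyone wanting to build something similar, here is a rough sketch of what such a script could look like; it is not the actual script used here. The host name, the default values, the DRY_RUN switch and the run helper are all assumptions for illustration. The roll command is the roll-hot-buckets CLI call as I understand it from the Splunk docs; it may differ by version and normally needs credentials (for example -auth), which are left out.

```shell
#!/bin/sh
# Illustrative defaults only; override them via the environment.
REMOTE="${REMOTE:-remote.example.com}"
SPLUNK="${SPLUNK:-/opt/splunk}"
CUSTOMDB="${CUSTOMDB:-remote_defaultdb}"
INDEX="${INDEX:-main}"      # defaultdb holds the "main" index
DRY_RUN="${DRY_RUN:-1}"     # 1 = print commands instead of running them

# Hypothetical helper: print the command in dry-run mode, otherwise run it.
run() {
    if [ "$DRY_RUN" = "1" ]; then
        echo "$*"
    else
        "$@"
    fi
}

# Ask Remote to roll its hot buckets to warm (authentication not shown).
roll_hot_buckets() {
    run ssh "root@$REMOTE" \
        "$SPLUNK/bin/splunk _internal call /data/indexes/$INDEX/roll-hot-buckets"
}

# Pull everything except hot buckets from Remote into the local copy.
pull_buckets() {
    run rsync -va --rsh=ssh --exclude "/db/hot*[0-9]" \
        "root@$REMOTE:$SPLUNK/var/lib/splunk/defaultdb/" \
        "$SPLUNK/var/lib/splunk/$CUSTOMDB"
}
```

With DRY_RUN=1, calling roll_hot_buckets and then pull_buckets only prints the two commands, which makes it easy to review them before setting DRY_RUN=0 and scheduling the script.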

cbse120109
Explorer

Thanks for the example. I will create a script and test out rsync and report back soon.

hulahoop
Splunk Employee

You could have Remote store and forward data to Central, so Remote would act as both an indexer and a forwarder. However, you cannot do the forwarding on a batch basis; forwarding happens in near real time. You can, however, throttle the stream from the forwarder if you are worried about network bandwidth between Central and Remote.

Store and Forward is covered after Step 6 here: http://www.splunk.com/base/Documentation/latest/Admin/Enableforwardingandreceiving#Set_up_regular_fo...
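As a rough sketch of the store-and-forward setup with throttling, the helper below prints the two config fragments that would go on Remote. The function names, the group name central, the receiver address central.example.com:9997 and the 256 KB/s cap are all assumptions for illustration; verify the settings against the outputs.conf and limits.conf specs for your version.

```shell
#!/bin/sh
# Hypothetical helper: prints an outputs.conf sketch for Remote that indexes
# locally and also forwards to Central.
print_outputs_conf() {
    central="$1"   # host:port of the receiver on Central
    cat <<EOF
[tcpout]
defaultGroup = central
indexAndForward = true

[tcpout:central]
server = $central
EOF
}

# Hypothetical helper: prints a limits.conf sketch that caps forwarder
# throughput, in KB per second.
print_limits_conf() {
    cat <<EOF
[thruput]
maxKBps = $1
EOF
}

print_outputs_conf "central.example.com:9997"
print_limits_conf 256
```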

cbse120109
Explorer

Thanks, the issue is that Central and Remote will be in different sites and the only connection between them needs to be established from Central via ssh over the internet.
