
Frozen archives into Amazon S3

Communicator

Has anyone got a sample coldToFrozenScript that will copy frozen index archives to S3 before erasing them?


SplunkTrust

Explorer

Hi @Splunk_rocks,

Can you please tell me what your Splunk setup looks like? Which OS is Splunk installed on? How did you configure the indexer to run the script? Which Python packages did you add, and how?

But first, try using an actual file path for log_file_path.


Path Finder

Thanks for checking, @Sbutto.
Mine is running on native Linux.
I have put your script under /opt/splunk/etc/apps/ and configured indexes.conf to run the script at '/opt/splunk/etc/apps/Encrypt-upload-archived-Splunk-buckets-master/coldToFrozenPlusS3Uplaod.py'.
Can you please help me with this? I'm stuck.
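
For reference, the setting belongs in the index's stanza in indexes.conf; a sketch assuming an index named archiveindex (the index name is a placeholder, the script path is the one from above):

```ini
[archiveindex]
coldToFrozenScript = "$SPLUNK_HOME/bin/python" "/opt/splunk/etc/apps/Encrypt-upload-archived-Splunk-buckets-master/coldToFrozenPlusS3Uplaod.py"
```

Splunk appends the frozen bucket's directory path as the script's argument, so the script itself shouldn't be listed with extra arguments here.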


Path Finder

As of now it's just standalone Splunk running on a single-instance indexer.


Path Finder

@Sbutto here are my inputs in the script:

# CHANGE THIS TO YOUR ACTUAL ARCHIVE DIRECTORY!!!
ARCHIVE_DIR = "/splunk/index/splunk/archiveindex"
ARCHIVE_DIR = os.path.join(os.getenv('SPLUNK_HOME'), 'frozenarchive')

script_path = '/opt/splunk/etc/apps/Encrypt-upload-archived-Splunk-buckets-master/coldToFrozenPlusS3Uplaod.py'
log_file_path = '/opt/splunk/etc/apps/Encrypt-upload-archived-Splunk-buckets-master/'

gnuhomedir = '' # where the gpg directory is. For example /home/s3/.gnupg/
gnuhomedir = '/home/splunkqa/'

reciepient_email = '' # the email the gpg uses to encrypt the files
reciepient_email = 'xyz@domain.com'

# Enabling the logging system
logger = applyLogging.get_module_logger(app_name='SplunkArchive', file_path=log_file_path)

# Finding out the epoch value at four months ago so we can compare the bucket timestamp against it.

FYI - my Splunk was set up as a standalone indexer host; Splunk is installed under /opt/splunk and my index is configured under /splunk/index.
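
One thing worth checking with the ARCHIVE_DIR line: os.getenv('SPLUNK_HOME') returns None when that variable isn't set in the environment the script runs under, and os.path.join then raises a TypeError. A small defensive sketch (the /opt/splunk fallback is an assumption based on your install path, not part of the original script):

```python
import os

def resolve_archive_dir(fallback_home="/opt/splunk"):
    # os.getenv returns None (or "") when SPLUNK_HOME is not set in the
    # environment the script is launched from; fall back to a known path.
    splunk_home = os.getenv("SPLUNK_HOME") or fallback_home
    return os.path.join(splunk_home, "frozenarchive")
```

Running the script by hand in a shell where SPLUNK_HOME isn't exported is a common way to hit this.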


Path Finder

here is the error output:

02-09-2019 13:51:36.711 -0500 ERROR BucketMover - coldToFrozenScript File "/opt/splunk/etc/apps/Encrypt-upload-archived-Splunk-buckets-master/coldToFrozenPlusS3Uplaod.py", line 150
02-09-2019 13:51:36.711 -0500 ERROR BucketMover - coldToFrozenScript sys.stderr.write("mkdir warning: Directory '" + ARCHIVEDIR + "' already exists\n")
02-09-2019 13:51:36.711 -0500 ERROR BucketMover - coldToFrozenScript ^
02-09-2019 13:51:36.711 -0500 ERROR BucketMover - coldToFrozenScript SyntaxError: invalid syntax
02-09-2019 13:51:36.715 -0500 ERROR BucketMover - coldToFrozenScript cmd='"/bin/python" "/opt/splunk/etc/apps/Encrypt-upload-archived-Splunk-buckets-master/coldToFrozenPlusS3Uplaod.py" /splunk/index/splunk/noindexdb/db/db_1543106304_1543105443_10' exited with non-zero status='exited with code 1'
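
For what it's worth, the sys.stderr.write(...) line the traceback points at is valid Python on its own, so a SyntaxError there usually means a nearby line was left unbalanced (an unclosed paren or quote) while pasting values in, or the file got corrupted. The archive-directory logic that line belongs to is typically something like this (a sketch, not the exact script):

```python
import os
import sys

def ensure_archive_dir(archive_dir):
    # Create the archive directory if needed; warn on stderr (but don't
    # fail) when it already exists, matching the script's message.
    if not os.path.isdir(archive_dir):
        os.makedirs(archive_dir)
    else:
        sys.stderr.write("mkdir warning: Directory '" + archive_dir + "' already exists\n")
    return archive_dir
```

Running the script manually with the same interpreter Splunk uses ("/bin/python" in your log, i.e. the system Python, not Splunk's) will show the full traceback immediately.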


Explorer

I have developed a script, coldToFrozenPlusS3Uplaod.py, that encrypts frozen buckets and uploads them to S3.

It can be found here: https://github.com/marboxvel/Encrypt-upload-archived-Splunk-buckets


Path Finder

Hey @Sbutto, I'm using your coldToFrozenPlusS3Uplaod.py to upload to S3 but I'm getting issues. Can anyone help me?
Here are the attributes I have added:

import sys, os, gzip, shutil, subprocess, random, gnupg
import boto
import datetime
import time
import tarfile

# applyLogging is a python script named applyLogging.py that exists at the same level as this script.
# If the file applyLogging.py doesn't exist where this file is located, the import statement will fail.
sys.path.append(script_path)
import applyLogging

# CHANGE THIS TO YOUR ACTUAL ARCHIVE DIRECTORY!!!
ARCHIVE_DIR = "/splunk/index/splunk/archiveindex"
ARCHIVE_DIR = os.path.join(os.getenv('SPLUNK_HOME'), 'frozenarchive')

script_path = '/opt/splunk/etc/apps/Encrypt-upload-archived-Splunk-buckets-master/coldToFrozenPlusS3Uplaod.py'
log_file_path = '/opt/splunk/var/log/splunk/'

gnuhomedir = '' # where the gpg directory is. For example /home/s3/.gnupg/
gnuhomedir = '/home/splunkq/.gnupg'

reciepient_email = '' # the email the gpg uses to encrypt the files
reciepient_email = 'xxyxy@gmail.com'

# Enabling the logging system
logger = applyLogging.get_module_logger(app_name='SplunkArchive', file_path=log_file_path)

# Finding out the epoch value at four months ago so we can compare the bucket timestamp against it.
# First we need to find today's epoch
today = round(time.mktime(datetime.datetime.today().timetuple()))

# Subtract 120 days
onemonthearlier = today - 120*86400

logger.info('Started on ' + str(datetime.datetime.today()))

# Getting the hostname so we can prefix the uploaded file name with it to distinguish buckets from different indexes.
hostname = os.uname()[1]

# S3 creds
AWS_ACCESS_KEY_ID = "xxxx"
AWS_ACCESS_KEY_SECRET = "xxxx"
AWS_BUCKET_NAME = "s3://zfu-splunk-pa/"

# Creating the gpg object
gpg = gnupg.GPG(gnupghome=gnuhomedir)
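
To sanity-check the four-month cutoff outside Splunk: frozen bucket directories are normally named db_<latestEpoch>_<earliestEpoch>_<id>, and the script compares those epochs against today minus 120 days. A standalone sketch of that comparison (the bucket name below is a hypothetical example, not one of yours):

```python
import datetime
import time

def frozen_cutoff(days=120, now=None):
    # Epoch value "days" days ago, matching the script's today - 120*86400.
    if now is None:
        now = round(time.mktime(datetime.datetime.today().timetuple()))
    return now - days * 86400

def bucket_epochs(bucket_dir_name):
    # Splunk bucket directories look like db_<latestEpoch>_<earliestEpoch>_<id>;
    # return (latest, earliest) as integers.
    parts = bucket_dir_name.split("_")
    return int(parts[1]), int(parts[2])

latest, earliest = bucket_epochs("db_1543106304_1543105443_10")
```

A bucket whose latest epoch is below the cutoff would be the one the script treats as old enough to archive.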


Contributor

Hey @marksnelling - a bit late, but here is our sample script.

https://bitbucket.org/asecurityteam/atlassian-add-on-cold-to-frozen-s3/overview

There are a few assumptions, like IAM user keys or roles deployed to your nodes, but I've tested it successfully across a large index cluster.


Splunk Employee

Nice script. On a side note, you might look at awscli instead of s3cmd. It's an officially supported binary, and it's multithreaded (better performance!).


Contributor

Sure, anything is possible. But I would be more interested to see Splunk and AWS come together to make something more legit than what we've hacked together in 30 minutes.


SplunkTrust

Contributor

I downvoted this post because this approach is no longer officially supported and has too many dependencies attached (Java, etc.).


Splunk Employee

For more info on Shuttl setup, see: https://github.com/splunk/splunk-shuttl/wiki/Quickstart-Guide


Splunk Employee

Shuttl is deprecated and no longer in development, and I believe it won't work with Splunk > 6.2 due to Python library incompatibilities.

At this point, you'd be better off rolling to S3 with s3cmd or s3cli as a script. In the future, perhaps there will be more functionality to include this as a roll-to-cold/frozen feature.
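
A rough sketch of what "rolling to S3 as a script" can look like: Splunk invokes the coldToFrozenScript with the bucket's directory path as its argument, and the script can shell out to awscli before Splunk deletes the bucket. The S3 bucket name here is a placeholder:

```python
import os

def build_upload_command(bucket_path, s3_prefix):
    # Recursive awscli copy of the frozen bucket directory into S3,
    # keyed by the bucket directory's name. s3_prefix (e.g.
    # "s3://my-archive-bucket/frozen/") is a placeholder to adjust.
    return ["aws", "s3", "cp", "--recursive", bucket_path,
            s3_prefix.rstrip("/") + "/" + os.path.basename(bucket_path)]

# In a real coldToFrozenScript, Splunk passes the bucket path as argv[1]:
#   subprocess.check_call(build_upload_command(sys.argv[1], "s3://my-archive-bucket/frozen/"))
```

Exiting non-zero tells Splunk the archive step failed, so it will retry rather than delete the bucket.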


Splunk Employee

Hadoop does not need to be installed on the Splunk Indexer. If the data is in S3, then you can use the standard ways of deploying Hadoop to operate on the data there. See a discussion here: http://stackoverflow.com/questions/4092852/i-cant-get-hadoop-to-start-using-amazon-ec2-s3

Also keep in mind that if you want to use the data in Hadoop, you will want to archive in CSV format. If you want the data to come back to Splunk, you can bring the CSV data back (however, it may incur compute load on import), or for more efficient index restoration, store in Splunk Bucket format.


Communicator

This looks promising; I'm not sure how it's deployed, though. Do I install Hadoop on my Splunk indexer and map it to S3, or does it need to be installed in EC2 and access S3 that way?
I'm assuming Hadoop is required for S3, BTW.
