Hi @Splunk_rocks,
Can you please tell me what your Splunk setup looks like? What OS is Splunk installed on? How did you configure the indexer to run the script? What Python packages did you add, and how?
But first, try using a file path for log_file_path.
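For example, pointing it at an existing Splunk log directory:

log_file_path = '/opt/splunk/var/log/splunk/'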
Thanks for checking, @Sbutto.
Mine is running on native Linux.
I have put your script under /opt/splunk/etc/apps/
I have configured indexes.conf to run the script with Python, pointing at /opt/splunk/etc/apps/Encrypt-upload-archived-Splunk-buckets-master/coldToFrozenPlusS3Uplaod.py.
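The stanza in indexes.conf looks roughly like this (the index name below is just a placeholder for my actual index):

[my_index]
coldToFrozenScript = "/bin/python" "/opt/splunk/etc/apps/Encrypt-upload-archived-Splunk-buckets-master/coldToFrozenPlusS3Uplaod.py"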
Can you please help me with this? I'm stuck.
As of now it's just standalone Splunk running on a single-instance indexer.
@Sbutto, here are my inputs in the script:
ARCHIVE_DIR = "/splunk/index/splunk/archiveindex"
script_path = '/opt/splunk/etc/apps/Encrypt-upload-archived-Splunk-buckets-master/coldToFrozenPlusS3Uplaod.py'
log_file_path = '/opt/splunk/etc/apps/Encrypt-upload-archived-Splunk-buckets-master/'
gnu_home_dir = '/home/splunkqa/'
reciepient_email = 'xyz@@@domain.com'
logger = applyLogging.get_module_logger(app_name='SplunkArchive',file_path=log_file_path)
FYI - my Splunk setup is a standalone indexer host; Splunk is installed under /opt/splunk
and my indexes are configured under /splunk/index.
Here is the error:
02-09-2019 13:51:36.711 -0500 ERROR BucketMover - coldToFrozenScript File "/opt/splunk/etc/apps/Encrypt-upload-archived-Splunk-buckets-master/coldToFrozenPlusS3Uplaod.py", line 150
02-09-2019 13:51:36.711 -0500 ERROR BucketMover - coldToFrozenScript sys.stderr.write("mkdir warning: Directory '" + ARCHIVE_DIR + "' already exists\n")
02-09-2019 13:51:36.711 -0500 ERROR BucketMover - coldToFrozenScript ^
02-09-2019 13:51:36.711 -0500 ERROR BucketMover - coldToFrozenScript SyntaxError: invalid syntax
02-09-2019 13:51:36.715 -0500 ERROR BucketMover - coldToFrozenScript cmd='"/bin/python" "/opt/splunk/etc/apps/Encrypt-upload-archived-Splunk-buckets-master/coldToFrozenPlusS3Uplaod.py" /splunk/index/splunk/noindexdb/db/db_1543106304_1543105443_10' exited with non-zero status='exited with code 1
I have developed this script, coldToFrozenPlusS3Uplaod.py, which encrypts and uploads frozen buckets to S3.
It can be found here: https://github.com/marboxvel/Encrypt-upload-archived-Splunk-buckets
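At a high level the flow is: tar up the frozen bucket, encrypt the tarball with GPG, and push the result to S3 with boto. A stripped-down sketch of that idea (the function and variable names below are illustrative, not the exact code in the repo):

import os, tarfile, gnupg, boto
from boto.s3.key import Key

def archive_bucket(bucket_dir, gnupg_home, recipient, aws_key, aws_secret, s3_bucket):
    bucket_dir = bucket_dir.rstrip('/')

    # Tar up the frozen bucket directory
    tar_path = bucket_dir + '.tar.gz'
    with tarfile.open(tar_path, 'w:gz') as tar:
        tar.add(bucket_dir, arcname=os.path.basename(bucket_dir))

    # Encrypt the tarball for the configured GPG recipient
    gpg = gnupg.GPG(gnupghome=gnupg_home)
    enc_path = tar_path + '.gpg'
    with open(tar_path, 'rb') as f:
        gpg.encrypt_file(f, recipients=[recipient], output=enc_path, always_trust=True)

    # Upload the encrypted archive to S3 (boto 2 style, matching the import above)
    conn = boto.connect_s3(aws_key, aws_secret)
    key = Key(conn.get_bucket(s3_bucket))
    key.key = os.path.basename(enc_path)
    key.set_contents_from_filename(enc_path)

Splunk calls a coldToFrozenScript with the frozen bucket directory as its first argument, so in practice bucket_dir comes from sys.argv[1].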
Hey @Sbutto, I'm using your coldToFrozenPlusS3Uplaod.py
to upload to S3 but I'm running into issues. Can anyone help me?
Here are the attributes I have added:
import sys, os, gzip, shutil, subprocess, random, gnupg
import boto
import datetime
import time
import tarfile

# Paths and settings (script_path must be defined before it is appended to sys.path)
ARCHIVE_DIR = "/splunk/index/splunk/archiveindex"
script_path = '/opt/splunk/etc/apps/Encrypt-upload-archived-Splunk-buckets-master/coldToFrozenPlusS3Uplaod.py'
log_file_path = '/opt/splunk/var/log/splunk/'
gnu_home_dir = '/home/splunkq/.gnupg'
reciepient_email = 'xxyxy@gmail.com'

sys.path.append(script_path)
import applyLogging

logger = applyLogging.get_module_logger(app_name='SplunkArchive', file_path=log_file_path)
today = round(time.mktime(datetime.datetime.today().timetuple()))
one_month_earlier = today - 120*86400  # 120 days before today, in epoch seconds
logger.info('Started on ' + str(datetime.datetime.today()))
hostname = os.uname()[1]
AWS_ACCESS_KEY_ID = "xxxx"
AWS_ACCESS_KEY_SECRET = "xxxx"
AWS_BUCKET_NAME = "s3://zfu-splunk-pa/"
gpg = gnupg.GPG(gnupghome=gnu_home_dir)
Hey @marksnelling - a bit late, but here is our sample script:
https://bitbucket.org/asecurityteam/atlassian-add-on-cold-to-frozen-s3/overview
There are a few assumptions, like IAM user keys or roles deployed to your nodes, but I've tested it successfully across a large index cluster.
Nice script. On a side note, you might look at awscli instead of s3cmd. It's an officially supported binary, and it's multithreaded (better performance!).
Sure, anything is possible, but I would be more interested to see Splunk and AWS come together to make something more legit than what we've hacked together in 30 minutes.
You might have a look at Shuttl -- http://blogs.splunk.com/2012/07/02/shuttl-for-big-data-archiving/
I downvoted this post because this approach is no longer officially supported and has too many dependencies attached (Java, etc.).
For more info on Shuttl setup, see: https://github.com/splunk/splunk-shuttl/wiki/Quickstart-Guide
Shuttl is deprecated and no longer in development, and I believe it won't work with Splunk > 6.2 due to Python library incompatibilities.
At this point, you'd be better off rolling to S3 with s3cmd or s3cli as a script. In the future, perhaps there will be more functionality to include this as a roll-to-cold/frozen feature.
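A minimal sketch of that approach, assuming s3cmd is already installed and configured (~/.s3cfg) on the indexer, and with a placeholder bucket name:

import os, subprocess, sys

# Splunk invokes a coldToFrozenScript with the frozen bucket directory as the first argument
bucket_dir = sys.argv[1]
s3_target = 's3://example-frozen-archive/' + os.uname()[1] + '/'

# Push the whole bucket directory to S3; a failed upload raises, so the script
# exits non-zero and Splunk leaves the bucket in place to retry later
subprocess.check_call(['s3cmd', 'put', '--recursive', bucket_dir, s3_target])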
Hadoop does not need to be installed on the Splunk Indexer. If the data is in S3, then you can use the standard ways of deploying Hadoop to operate on the data there. See a discussion here: http://stackoverflow.com/questions/4092852/i-cant-get-hadoop-to-start-using-amazon-ec2-s3
Also keep in mind that if you want to use the data in Hadoop, you will want to archive it in CSV format. If you want the data to come back to Splunk, you can bring the CSV data back (though it may incur compute load on import), or, for more efficient index restoration, store it in Splunk bucket format.
This looks promising, but I'm not sure how it is deployed. Do I install Hadoop on my Splunk indexer and map it to S3, or does it need to be installed in EC2 and access S3 that way?
I'm assuming Hadoop is required for S3, BTW.