How to troubleshoot when coldtofrozen stops working?

mikefg
Communicator

My coldtofrozen has stopped working. It might be related to python3, but I'm not 100% sure. I've tweaked the shebang line in coldtofrozen.py (#! /opt/splunk/bin python) and checked the other settings, and everything seems to be okay.

Are there any commands or tools I can run to help troubleshoot?
Where would the errors be logged?

Thanks

isoutamo
SplunkTrust

Hi

It's hard to say what is wrong, as we cannot see your script.

Did you upgrade Splunk and switch Python to version 3, or how did it break?

How are you calling it?

Basically, you can test it with this command as the splunk user:

splunk cmd <your script>

This runs it with the same environment Splunk sets up when it invokes the script normally.

If this doesn't give you any error, add debug output to your script and try again.

r. Ismo
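
One way to "add debug" here, as a minimal sketch: log the interpreter version and the exact arguments the script receives, so a run from `splunk cmd` can be compared with a run triggered by Splunk itself. The helper names (`make_debug_logger`, `log_invocation`) and the log file location are assumptions for illustration, not part of the stock script.

```python
import logging
import os
import sys
import tempfile


def make_debug_logger(log_path):
    """Return a logger that writes DEBUG lines to log_path and to stderr."""
    logger = logging.getLogger("coldToFrozen.debug")
    logger.setLevel(logging.DEBUG)
    logger.handlers.clear()  # avoid duplicate lines if called more than once
    fmt = logging.Formatter("%(asctime)s %(levelname)s pid=%(process)d %(message)s")
    for handler in (logging.FileHandler(log_path), logging.StreamHandler(sys.stderr)):
        handler.setFormatter(fmt)
        logger.addHandler(handler)
    return logger


def log_invocation(logger, argv):
    """Record which Python is running and exactly what arguments were passed."""
    logger.debug("python=%s", sys.version.split()[0])
    logger.debug("argv=%r", argv)
    logger.debug("cwd=%s", os.getcwd())


if __name__ == "__main__":
    # Hypothetical log location; pick a path the splunk user can write to.
    log = make_debug_logger(os.path.join(tempfile.gettempdir(), "coldToFrozen_debug.log"))
    log_invocation(log, sys.argv)
```

Calling `log_invocation` at the very top of the `__main__` block, before the argument check, shows whether the bucket path is actually arriving as `sys.argv[1]`.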

mikefg
Communicator

Thanks for replying.

I had been running splunk 8.2 with it working fine and then upgraded to 9.0.3 on CentOS 7. I'm pretty sure 9.0.3 had been working fine as well, but I may be mistaken on the timing.

I did have to do a server name change on the indexers, but the frozen server stayed the same. My indexers write to a frozen server over NFS and nfs exports uses ip address, not name.

My coldtofrozen.py script hasn't changed, but when I run it this is what I get.

[bin]# ./splunk cmd "/opt/splunk/bin/python" "/opt/splunk/etc/peer-apps/archive_app/coldToFrozen.py"
usage: python coldToFrozenExample.py <bucket_dir_to_archive>
[bin]#

Full script, edited to remove comments and internal server names.

#!/opt/splunk/bin python

import sys, os, gzip, shutil, subprocess, random, socket
hname = socket.gethostname()

ARCHIVE_DIR = os.path.join('/mnt/nfs/frozensrv', hname)

def handleNewBucket(base, files):
   print('Archiving bucket: ' + base)
   for f in files:
      full = os.path.join(base, f)
      if os.path.isfile(full):
         os.remove(full)

if __name__ == "__main__":
   if len(sys.argv) != 2:
      sys.exit('usage: python coldToFrozenExample.py <bucket_dir_to_archive>')

   if not os.path.isdir(ARCHIVE_DIR):
         try:
            os.mkdir(ARCHIVE_DIR)
         except OSError:
            # Ignore already exists errors, another concurrent invocation may have already created this dir
            sys.stderr.write("mkdir warning: Directory '" + ARCHIVE_DIR + "' already exists\n")

   bucket = sys.argv[1]
   if not os.path.isdir(bucket):
      sys.exit('Given bucket is not a valid directory: ' + bucket)

   rawdatadir = os.path.join(bucket, 'rawdata')
   if not os.path.isdir(rawdatadir):
         sys.exit('No rawdata directory, given bucket is likely invalid: ' + bucket)

   files = os.listdir(bucket)
   journal = os.path.join(rawdatadir, 'journal.gz')
   if os.path.isfile(journal):
         handleNewBucket(bucket, files)
   else:
         handleOldBucket(bucket, files)

   if bucket.endswith('/'):
         bucket = bucket[:-1]

   indexname = os.path.basename(os.path.dirname(os.path.dirname(bucket)))
   destdir = os.path.join(ARCHIVE_DIR, indexname, os.path.basename(bucket))

   while os.path.isdir(destdir):
         print('Warning: This bucket already exists in the archive directory')
         print('Adding a random extension to this directory...')
         destdir += '.' + str(random.randrange(10))

   shutil.copytree(bucket, destdir)

isoutamo
SplunkTrust

When you run this from the command line, you must give it all the parameters it needs. Check your script for what those are, but if I recall correctly, at least the bucket path is needed.
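
Because the posted script deletes files from the bucket it is given before copying it, a throwaway bucket is safer for a manual test than a real one. The sketch below builds a fake bucket directory and passes its path as the single argument the script checks for. `make_fake_bucket`, `run_script`, and the directory names are hypothetical, and the real script will still copy the fake bucket into its ARCHIVE_DIR.

```python
import gzip
import os
import subprocess
import sys
import tempfile


def make_fake_bucket(parent, index="testindex", name="db_0_0_0"):
    """Create <parent>/<index>/db/<name> with rawdata/journal.gz and a dummy .tsidx file."""
    bucket = os.path.join(parent, index, "db", name)
    os.makedirs(os.path.join(bucket, "rawdata"))
    with gzip.open(os.path.join(bucket, "rawdata", "journal.gz"), "wb") as fh:
        fh.write(b"placeholder event data\n")
    with open(os.path.join(bucket, "dummy.tsidx"), "wb") as fh:
        fh.write(b"")
    return bucket


def run_script(script_path, bucket):
    """Run the archive script with the bucket path as its only argument."""
    return subprocess.run(
        [sys.executable, script_path, bucket],
        capture_output=True,
        text=True,
    )


if __name__ == "__main__" and len(sys.argv) == 2:
    # Usage (assumed): python this_file.py /path/to/coldToFrozen.py
    fake = make_fake_bucket(tempfile.mkdtemp())
    result = run_script(sys.argv[1], fake)
    print("exit code:", result.returncode)
    print(result.stdout + result.stderr)
```

Running this through `splunk cmd` with Splunk's own Python keeps the interpreter the same as in production.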

mikefg
Communicator

I'm passing a bucket path and getting this error now. Replaced some path values to hide internal names and bucket name.

bash-4.2$ /opt/splunk/bin/splunk cmd "/opt/splunk/bin/python" "/opt/splunk/etc/peer-apps/archive_app/coldToFrozen.py" "/opt/splunk/var/lib/splunk/indexname/db/db_string"

Traceback (most recent call last):
   File "/opt/splunk/etc/peer-apps/archive_app/coldToFrozen.py", line 51, in <module>
      handleOldBucket(bucket, files)
NameError: name 'handleOldBucket' is not defined
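
The traceback says two things: the `else` branch was taken (the script found no `rawdata/journal.gz` file under the path it was given), and `handleOldBucket` is missing from the edited copy. The posted script also imports `gzip` without using it, which fits a removed function. Below is a minimal sketch of such a function, assuming it should gzip the `.tsidx` and `.data` files in place. It is an illustration rather than a copy of the shipped example, so compare it with the coldToFrozenExample.py in your own installation.

```python
import gzip
import os
import shutil


def handleOldBucket(base, files):
    """Gzip each .tsidx and .data file in the bucket, then remove the original."""
    print('Archiving old-style bucket: ' + base)
    for f in files:
        full = os.path.join(base, f)
        if os.path.isfile(full) and f.endswith(('.tsidx', '.data')):
            with open(full, 'rb') as fin, gzip.open(full + '.gz', 'wb') as fout:
                shutil.copyfileobj(fin, fout)
            os.remove(full)
```

The function has to be defined above the `if __name__ == "__main__":` block, next to `handleNewBucket`, so the name exists when the `else` branch runs.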

isoutamo
SplunkTrust
As we don’t know the content of your script, we cannot really help you further. I suggest you find someone who knows enough Python to look at the script with you, work out what’s wrong, and fix it.
mikefg
Communicator

I posted my script in this thread and it's the sample script, just edited for my environment. It's been working fine for a long time until this issue arose.

mikefg
Communicator

I got it working again. I had to recopy the coldtofrozenexample.py and edit for my environment. I was missing some sections from the script.

mikefg
Communicator

It looks like this line

    if len(sys.argv) != 2:

is where the script runs into trouble. I've also tried

   if len(sys.argv) < 2:

but same results. Not sure what values I need to pass or where to pass them to get the script to proceed. Maybe something in indexes.conf on the coldToFrozenScript line ?

   coldToFrozenScript = "/opt/splunk/bin/python" "/opt/splunk/etc/peer-apps/archive_app/coldToFrozen.py"
