Solved: How to troubleshoot when coldtofrozen stops working?

mikefg · 08-24-2023

My coldToFrozen script has stopped working. It might be related to Python 3, but I'm not 100% sure. I've done some tweaking to coldtofrozen.py (#! /opt/splunk/bin python) and checked other settings, and all seem to be okay.

Are there any commands or tools I can run to help troubleshoot?
Where would the errors be logged?

Thanks

mikefg · 08-28-2023

I got it working again. I had to recopy coldToFrozenExample.py and edit it for my environment; I was missing some sections from the script.
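Since the fix turned out to be restoring missing sections, a quick way to spot that kind of gap is to compare your edited copy with the shipped example, which normally lives at $SPLUNK_HOME/bin/coldToFrozenExample.py. The sketch below uses Python's ast module: `missing_functions` lists functions the example defines that your copy lacks, and `undefined_calls` lists plain function calls that have no matching top-level definition. Both helpers are hypothetical and only catch missing top-level functions, not other missing code.

```python
import ast
import builtins


def defined_functions(source):
    """Names of top-level functions defined in a Python source string."""
    tree = ast.parse(source)
    return {node.name for node in tree.body if isinstance(node, ast.FunctionDef)}


def undefined_calls(source):
    """Plain-name calls like foo(...) that are neither defined at top level nor builtins."""
    tree = ast.parse(source)
    called = {node.func.id for node in ast.walk(tree)
              if isinstance(node, ast.Call) and isinstance(node.func, ast.Name)}
    return sorted(called - defined_functions(source) - set(dir(builtins)))


def missing_functions(example_source, edited_source):
    """Functions present in the shipped example but absent from the edited copy."""
    return sorted(defined_functions(example_source) - defined_functions(edited_source))
```

Read both files with open(...).read() and print the results. On the script posted in this thread (with its indentation intact), undefined_calls reports ['handleOldBucket'], the same name as in the later traceback.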


isoutamo · 08-24-2023

Hi

It's hard to say what is wrong since we can't see your script.

Did you update Splunk and switch Python to version 3, or how did it break?

How are you calling it?

You can test it by running this command as the splunk user:

splunk cmd <your script>

This runs the script inside Splunk's environment, the same way Splunk runs it when it normally works.

If this doesn't give you any error, add debug output to your script and try again.
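A minimal sketch of adding debug output, assuming the splunk user can write to a hypothetical log file such as /tmp/coldToFrozen_debug.log: wrap the script's main logic so the arguments it received and any exit message or traceback end up in a file you can read afterwards. `setup_debug_logging` and `run_with_debug` are illustrative names, not part of Splunk's example script.

```python
import logging
import traceback

# Hypothetical path: any file the splunk user can write to will do.
DEBUG_LOG = '/tmp/coldToFrozen_debug.log'


def setup_debug_logging(path=DEBUG_LOG):
    """Return a logger that appends timestamped messages to `path`."""
    logger = logging.getLogger('coldToFrozen_debug')
    logger.setLevel(logging.DEBUG)
    handler = logging.FileHandler(path)
    handler.setFormatter(logging.Formatter('%(asctime)s %(levelname)s %(message)s'))
    logger.addHandler(handler)
    return logger


def run_with_debug(main, argv, logger):
    """Log the arguments, run main(argv), and log any exit message or traceback before re-raising."""
    logger.debug('called with argv=%r', argv)
    try:
        main(argv)
    except SystemExit as exc:
        # sys.exit('usage: ...') lands here; record the message in the debug log too
        logger.error('script exited: %s', exc)
        raise
    except Exception:
        logger.error('unhandled error:\n%s', traceback.format_exc())
        raise
    logger.debug('finished without errors')
```

To use it, you would move the body of the script's `if __name__ == "__main__":` block into a hypothetical `main(argv)` function that reads `argv[1]` instead of `sys.argv[1]`, then call `run_with_debug(main, sys.argv, setup_debug_logging())`.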

r. Ismo

mikefg · 08-25-2023

Thanks for replying.

I had been running Splunk 8.2 with it working fine, and then upgraded to 9.0.3 on CentOS 7. I'm pretty sure it was also working fine on 9.0.3, but I may be mistaken about the timing.

I did have to change the server names on the indexers, but the frozen server stayed the same. My indexers write to the frozen server over NFS, and the NFS exports use IP addresses, not names.
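One thing worth checking after the rename: the script posted in this thread builds ARCHIVE_DIR from socket.gethostname(), so renamed indexers archive into a new per-host folder under /mnt/nfs/frozensrv, and its os.mkdir call only creates that last level. A sketch to confirm which directory the script would use on each indexer and whether the share is writable; `check_archive_target` is a hypothetical helper, it should be run as the splunk user because os.access checks the current user, and os.path.isdir cannot tell an unmounted mount point from a mounted one.

```python
import os
import socket


def check_archive_target(base='/mnt/nfs/frozensrv', hostname=None):
    """Return (archive_dir, problems) for the per-host directory the script would use."""
    target = os.path.join(base, hostname or socket.gethostname())
    problems = []
    if not os.path.isdir(base):
        # os.mkdir(ARCHIVE_DIR) in the script creates only the last path component
        problems.append('base directory missing: ' + base)
    elif not os.access(base, os.W_OK | os.X_OK):
        problems.append('cannot create directories in: ' + base)
    if os.path.isdir(target) and not os.access(target, os.W_OK | os.X_OK):
        problems.append('cannot write to: ' + target)
    return target, problems


if __name__ == '__main__':
    archive_dir, issues = check_archive_target()
    print('ARCHIVE_DIR would be', archive_dir)
    for issue in issues:
        print('problem:', issue)
```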

My coldtofrozen.py script hasn't changed, but when I run it this is what I get.

[bin]# ./splunk cmd "/opt/splunk/bin/python" "/opt/splunk/etc/peer-apps/archive_app/coldToFrozen.py"
usage: python coldToFrozenExample.py <bucket_dir_to_archive>
[bin]#

Full script, edited to remove comments and internal server names.

#!/opt/splunk/bin python

import sys, os, gzip, shutil, subprocess, random, socket
hname = socket.gethostname()

ARCHIVE_DIR = os.path.join('/mnt/nfs/frozensrv', hname)

def handleNewBucket(base, files):
    print('Archiving bucket: ' + base)
    for f in files:
        full = os.path.join(base, f)
        if os.path.isfile(full):
            os.remove(full)

if __name__ == "__main__":
    if len(sys.argv) != 2:
        sys.exit('usage: python coldToFrozenExample.py <bucket_dir_to_archive>')

    if not os.path.isdir(ARCHIVE_DIR):
        try:
            os.mkdir(ARCHIVE_DIR)
        except OSError:
            # Ignore already exists errors, another concurrent invocation may have already created this dir
            sys.stderr.write("mkdir warning: Directory '" + ARCHIVE_DIR + "' already exists\n")

    bucket = sys.argv[1]
    if not os.path.isdir(bucket):
        sys.exit('Given bucket is not a valid directory: ' + bucket)

    rawdatadir = os.path.join(bucket, 'rawdata')
    if not os.path.isdir(rawdatadir):
        sys.exit('No rawdata directory, given bucket is likely invalid: ' + bucket)

    files = os.listdir(bucket)
    journal = os.path.join(rawdatadir, 'journal.gz')
    if os.path.isfile(journal):
        handleNewBucket(bucket, files)
    else:
        handleOldBucket(bucket, files)

    if bucket.endswith('/'):
        bucket = bucket[:-1]

    indexname = os.path.basename(os.path.dirname(os.path.dirname(bucket)))
    destdir = os.path.join(ARCHIVE_DIR, indexname, os.path.basename(bucket))

    while os.path.isdir(destdir):
        print('Warning: This bucket already exists in the archive directory')
        print('Adding a random extension to this directory...')
        destdir += '.' + str(random.randrange(10))

    shutil.copytree(bucket, destdir)

isoutamo · 08-26-2023

When you run this from the command line, you must give it all the parameters it needs. Check your script for what those are, but if I recall correctly, at least the full path of the bucket is needed.
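Be careful about which bucket you pass by hand: handleNewBucket deletes every regular file at the top level of the bucket before copying it, so running the script against a live bucket leaves that bucket without its index files. A minimal sketch for creating a throwaway bucket to test against instead; `make_test_bucket`, the index name, and the bucket name are hypothetical, and the layout only mimics what the script checks for (a rawdata directory, rawdata/journal.gz, and an index name two directories above the bucket).

```python
import gzip
import os
import tempfile


def make_test_bucket(index_name='testindex', bucket_name='db_test_bucket'):
    """Create a throwaway bucket laid out as <tmp>/<index>/colddb/<bucket>/rawdata/journal.gz."""
    root = tempfile.mkdtemp(prefix='c2f_test_')
    bucket = os.path.join(root, index_name, 'colddb', bucket_name)
    rawdata = os.path.join(bucket, 'rawdata')
    os.makedirs(rawdata)
    # Dummy journal so the script takes the handleNewBucket branch
    with gzip.open(os.path.join(rawdata, 'journal.gz'), 'wb') as fh:
        fh.write(b'dummy journal\n')
    # Placeholder file that handleNewBucket is expected to delete
    open(os.path.join(bucket, 'placeholder.tsidx'), 'wb').close()
    return bucket


if __name__ == '__main__':
    print(make_test_bucket())
```

Run the archive script against the printed path with splunk cmd, check that a copy appears under ARCHIVE_DIR/testindex/, and remove that copy from the frozen share afterwards.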

mikefg · 08-28-2023

I'm passing a bucket path and getting this error now. I replaced some path values to hide internal names and the bucket name.

bash-4.2$ /opt/splunk/bin/splunk cmd "/opt/splunk/bin/python" "/opt/splunk/etc/peer-apps/archive_app/coldToFrozen.py" "/opt/splunk/var/lib/splunk/indexname/db/db_string"

Traceback (most recent call last):
  File "/opt/splunk/etc/peer-apps/archive_app/coldToFrozen.py", line 51, in <module>
    handleOldBucket(bucket, files)
NameError: name 'handleOldBucket' is not defined
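The traceback means the script took its else branch (the bucket had no rawdata/journal.gz) and called handleOldBucket, which is not defined anywhere in the script posted above. The shipped coldToFrozenExample.py defines it next to handleNewBucket. Below is a sketch modeled on that function: it gzips the .tsidx and .data files of a pre-4.2 bucket in place and deletes the originals. It uses shutil.copyfileobj for the copy, which may differ from the shipped version, so compare it with the example that came with your Splunk release rather than relying on this one.

```python
import gzip
import os
import shutil


def handleOldBucket(base, files):
    """Archive a pre-4.2 bucket: gzip each .tsidx/.data file in place and delete the original."""
    print('Archiving old-style bucket: ' + base)
    for f in files:
        full = os.path.join(base, f)
        if os.path.isfile(full) and (f.endswith('.tsidx') or f.endswith('.data')):
            with open(full, 'rb') as fin, gzip.open(full + '.gz', 'wb') as fout:
                shutil.copyfileobj(fin, fout)
            os.remove(full)
```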

isoutamo · 08-28-2023

As we don't know the content of your script, we can't really help you further. I suggest finding someone who knows enough Python to go through the script with you and fix what's wrong.

mikefg · 08-28-2023

I posted my script in this thread and it's the sample script, just edited for my environment. It's been working fine for a long time until this issue arose.

mikefg · 08-25-2023

It looks like this line

if len(sys.argv) != 2:

is where the script runs into trouble. I've also tried

if len(sys.argv) < 2:

but I get the same result. I'm not sure what values I need to pass, or where to pass them, to get the script to proceed. Maybe something in indexes.conf on the coldToFrozenScript line?

coldToFrozenScript = "/opt/splunk/bin/python" "/opt/splunk/etc/peer-apps/archive_app/coldToFrozen.py"
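The indexes.conf line appears to match the documented form (interpreter, then script), so nothing extra should be needed there. When Splunk freezes a bucket it runs that command with the bucket's directory as one more argument, so inside the script sys.argv is [script path, bucket path] and the len(sys.argv) != 2 check passes. Run by hand without a bucket path, sys.argv has a single entry, which produces the usage message shown earlier. A sketch of that reasoning; `script_argv` and `parse_bucket_arg` are hypothetical helpers, the bucket path in the usage is made up, and shlex.split only approximates how Splunk tokenizes the setting.

```python
import shlex

USAGE = 'usage: python coldToFrozenExample.py <bucket_dir_to_archive>'


def script_argv(cold_to_frozen_script, bucket_dir):
    """Approximate the sys.argv the archive script receives.

    Assumption: the bucket directory is appended as the last argument of the
    configured command, and the interpreter (first token) is not part of sys.argv.
    """
    command = shlex.split(cold_to_frozen_script) + [bucket_dir]
    return command[1:]


def parse_bucket_arg(argv):
    """The same check the script does: exactly one argument, the bucket directory."""
    if len(argv) != 2:
        raise SystemExit(USAGE)
    return argv[1]
```

For example, script_argv('"/opt/splunk/bin/python" "/opt/splunk/etc/peer-apps/archive_app/coldToFrozen.py"', bucket) gives a two-element list that parse_bucket_arg accepts, while the one-element argv from a manual run without a bucket path raises the usage exit.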
