Getting Data In

How do I unzip a file when pulling it from a REST API?

Contributor

The REST API input that I set up in Splunk calls a REST endpoint, and the response it receives is a zip file. The zip file contains 2 CSV files, but I only need to index 1 of them (in this case, the file named ENDPOINT_CDR_DETAIL_ALL_CSV). I only see 3 options for the response type: text, xml, and json. Does Splunk have an option, maybe a response handler, to unzip the file and index only 1 of the 2 files?

The name and form of the file: [screenshot not preserved]

Content inside the zip file: [screenshot not preserved]


Ultra Champion

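In rest_ta/bin/responsehandlers.py, add a custom response handler. Pseudo example:

import io
import re
import zipfile

class ZipFileResponseHandler:

    def __init__(self, **args):
        # Regex naming the CSV file(s) to extract, taken from the handler arguments
        self.csv_file_to_index = args['csv_file_to_index']

    def __call__(self, response_object, raw_response_output, response_type, req_args, endpoint):
        # response_object.content holds the raw bytes of the zip archive
        archive = zipfile.ZipFile(io.BytesIO(response_object.content))
        for info in archive.infolist():
            # re.match anchors the regex at the start of the filename
            if re.match(self.csv_file_to_index, info.filename):
                # read() returns bytes; on Python 3 you may need .decode('utf-8') here
                filecontent = archive.read(info)
                print_xml_stream(filecontent)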
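
To try the extraction logic outside Splunk, here is a minimal standalone sketch. It assumes nothing about the REST add-on itself: `extract_matching` and `build_sample_zip` are hypothetical helper names, and the sample archive contents are made up for illustration.

```python
import io
import re
import zipfile


def extract_matching(zip_bytes, pattern):
    """Return {filename: bytes} for archive members whose name matches pattern."""
    matches = {}
    with zipfile.ZipFile(io.BytesIO(zip_bytes)) as archive:
        for info in archive.infolist():
            # Same test as the handler: regex anchored at the start of the name
            if re.match(pattern, info.filename):
                matches[info.filename] = archive.read(info)
    return matches


def build_sample_zip():
    """Build an in-memory zip with two CSV members (made-up names and data)."""
    buf = io.BytesIO()
    with zipfile.ZipFile(buf, "w") as archive:
        archive.writestr("ENDPOINT_CDR_DETAIL_ALL_CSV.csv", "id,duration\n1,30\n")
        archive.writestr("OTHER_REPORT.csv", "id,value\n9,9\n")
    return buf.getvalue()


if __name__ == "__main__":
    found = extract_matching(build_sample_zip(), r".*CDR_DETAIL.*\.csv$")
    for name, content in found.items():
        print(name)
        print(content.decode("utf-8"))
```

Only the member whose name matches the regex is read; the other CSV in the archive is never extracted.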

In your config stanza, apply this handler: [screenshot of the stanza not preserved]

The csv_file_to_index parameter value in this example is a Python regex, such as:

  1. ENDPOINT_CDR_DETAIL_ALL_CSV\.csv for an exact filename to extract from the zip
  2. .*CDR_DETAIL.*\.csv$ for a pattern for the filename(s) to extract from the zip
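
As a quick check of how these two regexes behave with `re.match`, here is a small sketch. The filenames in `names` are invented for illustration, and `matching` is a hypothetical helper. One thing it shows: `re.match` anchors only at the start of the string, so the first regex without a trailing `$` also accepts longer names that merely begin with the filename.

```python
import re

# Made-up filenames, for illustration only
names = [
    "ENDPOINT_CDR_DETAIL_ALL_CSV.csv",
    "ENDPOINT_CDR_DETAIL_ALL_CSV.csv.bak",
    "SITE_CDR_DETAIL_SUMMARY.csv",
    "OTHER_REPORT.csv",
]

exact = r"ENDPOINT_CDR_DETAIL_ALL_CSV\.csv"      # regex 1 from the answer
anchored = r"ENDPOINT_CDR_DETAIL_ALL_CSV\.csv$"  # regex 1 with an end anchor added
pattern = r".*CDR_DETAIL.*\.csv$"                # regex 2 from the answer


def matching(regex, candidates):
    """Return the candidates that re.match accepts for the given regex."""
    return [name for name in candidates if re.match(regex, name)]


exact_hits = matching(exact, names)        # also picks up the .csv.bak name
anchored_hits = matching(anchored, names)  # only the exact filename
pattern_hits = matching(pattern, names)    # every name containing CDR_DETAIL and ending in .csv

if __name__ == "__main__":
    print(exact_hits)
    print(anchored_hits)
    print(pattern_hits)
```

If the archive could ever contain names that extend the one you want, adding `$` to the first regex keeps the match strict.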


Contributor

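This is my version of the code:

import io
import zipfile

class ZipFileResponseHandler:

    def __init__(self, **args):
        pass

    def __call__(self, response_object, raw_response_output, response_type, req_args, endpoint):
        # io.BytesIO works on Python 2 and 3 (StringIO.StringIO exists only on Python 2)
        archive = zipfile.ZipFile(io.BytesIO(response_object.content))
        for name in archive.namelist():
            if "ENDPOINT" in name:
                # Decode so the text can be split into lines (UTF-8 is an assumption)
                data = archive.read(name).decode('utf-8')
                # Skip the CSV header row and emit each non-empty row as its own event
                for element in data.splitlines()[1:]:
                    if element:
                        print_xml_stream(element)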

Ultra Champion

I suggest using the REST API Modular Input and plugging in a custom response handler to perform the unzipping for you, along with any other preprocessing you require.

Here is an example in another answer.

Contributor

Could you give me more information on how to make the handler pass only the specific file to the indexer?


SplunkTrust

Hi,

Can you please let me know how you call the REST API: with a script, or some other way?


Contributor

I was able to download the REST API app from Splunk, but for now I'm not using any script. Do you think I could do this by writing a script that runs every minute and calls the API URL? That is, if the script lets me unzip the file and pick the file I want. Thanks!


SplunkTrust

Yes, you can create a scripted input that downloads and extracts the files for you.

Create inputs.conf in your app and put the configuration below in it.

[script:///opt/splunk/etc/apps/yourapp/bin/scriptedfile.py]
disabled = 0
interval = 60

This runs the script every 60 seconds; change the interval to suit your requirement.

Create bin/scriptedfile.py and write the code for the REST API call (file download) and the extraction of the files.
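
A minimal sketch of what such a script might look like, under some assumptions: `API_URL` and `WANTED` are placeholders you would fill in, `rows_from_zip` is a hypothetical helper name, the endpoint is assumed to need no authentication, and the CSV is assumed to be UTF-8 with a single header row. It relies on the fact that Splunk indexes whatever a scripted input writes to stdout.

```python
import io
import sys
import urllib.request
import zipfile

# Placeholders (assumptions): set these to the real endpoint and filename fragment.
API_URL = ""
WANTED = "ENDPOINT"


def rows_from_zip(zip_bytes, wanted):
    """Return the data rows (header skipped) of members whose name contains wanted."""
    rows = []
    with zipfile.ZipFile(io.BytesIO(zip_bytes)) as archive:
        for name in archive.namelist():
            if wanted in name:
                text = archive.read(name).decode("utf-8")  # encoding is an assumption
                # Drop the header row and any blank lines
                rows.extend(line for line in text.splitlines()[1:] if line)
    return rows


def main():
    # Download the zip, then write one CSV row per line to stdout for Splunk to index.
    with urllib.request.urlopen(API_URL, timeout=30) as response:
        zip_bytes = response.read()
    for row in rows_from_zip(zip_bytes, WANTED):
        sys.stdout.write(row + "\n")


if __name__ == "__main__":
    if API_URL:
        main()
    else:
        sys.stderr.write("Set API_URL before running this script\n")
```

Keeping the zip handling in its own function means it can be tested with an in-memory archive, without calling the API.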

Scripted Input docs:
https://docs.splunk.com/Documentation/SplunkCloud/6.6.3/AdvancedDev/ScriptedInputsIntro


Contributor

Or, if the REST API input can't do this, is there an alternative way?
