Can a Splunk search call a Python script to perform data manipulation?

andrewtrobec
Motivator

Hello,

Before I waste too much time trying to get this to work, I'd like to know whether a Splunk search can call a Python script to perform data manipulation. I have a working Python script that is able to calculate the difference in business hours between two timestamps. What I'd like to do is configure Splunk so that I can pass event fields to the script as parameters and store the output as a new field. I suppose the search would be something like:

index=xyz | eval DIFF = callPythonScript(DATE1, DATE2) | table DATE1, DATE2, DIFF

Is this at all possible? Are there any tutorials that explain how to accomplish this?

Thank you!

Andrew

jkat54
SplunkTrust

I deleted the back-and-forth from our attempts to get this streaming search command working. For others coming across this thread: I offered to help write a search command, and after a fair amount of debugging it made sense to post only the final result here.

The trick is to use result["fieldName"] to read the value of an existing field from the search pipeline. The code below works with this search:

|makeresults count=1| eval time1=strftime(_time-4845858,"%F %T") | eval time2=strftime(_time+86400, "%F %T") | totalbusinesshours time1 time2 4 17

### SCRIPT NAME: totalbusinesshours.py
### AUTHOR: your_name_here
### Copyright 2016 your_name_here
###
### Licensed under the Apache License, Version 2.0 (the "License");
### you may not use this file except in compliance with the License.
### You may obtain a copy of the License at
###
###    http://www.apache.org/licenses/LICENSE-2.0
###
### Unless required by applicable law or agreed to in writing, software
### distributed under the License is distributed on an "AS IS" BASIS,
### WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
### See the License for the specific language governing permissions and
### limitations under the License.
###

import splunk.Intersplunk
import splunk.mining.dcutils as dcu
import traceback
import sys
from datetime import datetime, timedelta

# Setup logging/logger
logger = dcu.getLogger()

def clamp(t, start, end):
  "Return `t` clamped to the range [`start`, `end`]."
  try:
    return max(start, min(end, t))
  except Exception as e:
    stack = traceback.format_exc()
    splunk.Intersplunk.generateErrorResults(str(e))
    logger.error(str(e) + ". Traceback: " + str(stack))

def day_part(t):
  "Return timedelta between midnight and `t`."
  try:
    return t - t.replace(hour = 0, minute = 0, second = 0, microsecond = 0)
  except Exception as e:
    stack = traceback.format_exc()
    splunk.Intersplunk.generateErrorResults(str(e))
    logger.error(str(e) + ". Traceback: " + str(stack))

def office_time_between(a, b, start = timedelta(hours = 8), stop = timedelta(hours = 18)):
  """
  Return the total office time between `a` and `b` in hours (float).
  `a` and `b` are strings formatted as 'YYYY-MM-DD HH:MM:SS'.
  Office time consists of weekdays from `start` to `stop`.
  """
  try:
    a = datetime.strptime(a, '%Y-%m-%d %H:%M:%S')
    b = datetime.strptime(b, '%Y-%m-%d %H:%M:%S')

    zero = timedelta(0)
    assert(zero <= start <= stop <= timedelta(1))
    office_day = stop - start
    days = (b - a).days + 1
    weeks = days // 7
    extra = (max(0, 5 - a.weekday()) + min(5, 1 + b.weekday())) % 5
    weekdays = weeks * 5 + extra
    total = office_day * weekdays
    # subtract the part of the first and last office day outside [a, b]
    if a.weekday() < 5:
      total -= clamp(day_part(a) - start, zero, office_day)
    if b.weekday() < 5:
      total -= clamp(stop - day_part(b), zero, office_day)
    return total.total_seconds()/60/60
  except Exception as e:
    stack = traceback.format_exc()
    splunk.Intersplunk.generateErrorResults(str(e))
    logger.error(str(e) + ". Traceback: " + str(stack))

def execute():
  try:
    # get the keywords and options passed to this command
    keywords, options = splunk.Intersplunk.getKeywordsAndOptions()
    logger.info(keywords)
    # get the previous search results
    results, dummyresults, settings = splunk.Intersplunk.getOrganizedResults()
    if 2 <= len(keywords) <= 4:
      # optional 3rd and 4th keywords override the default start/stop hours
      hours = [timedelta(hours = int(k)) for k in keywords[2:]]
      # compute the new field for every result, not only the first one
      for result in results:
        result["totalbusinesshours"] = office_time_between(result[keywords[0]], result[keywords[1]], *hours)
    else:
      for result in results:
        result["error"] = "syntax: totalbusinesshours <date_1> <date_2> <business_start_hour> <business_stop_hour>"
        result["example"] = "example: totalbusinesshours dateField1 dateField2 9 17"
    # hand the modified results back to Splunk once, after the loop
    splunk.Intersplunk.outputResults(results)

  except Exception as e:
    stack = traceback.format_exc()
    splunk.Intersplunk.generateErrorResults(str(e))
    logger.error(str(e) + ". Traceback: " + str(stack))

if __name__ == '__main__':
    execute()

A_Khabrov
New Member

I think this code has an issue:

startstamp,stopstamp,daystart_h,dayend_h,deltastamp
1508509203,1508924839,8,18,5236

My local time zone is GMT+3.
The two timestamps, in local time, are:
Start 2017-10-20 17:20:03 GMT+3 (epoch 1508509203)
Stop 2017-10-25 12:47:19 GMT+3 (epoch 1508924839)

Our workday starts at 8 and ends at 18.
The code returns a work time of 5236 seconds...

Can you help us?
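
One way to investigate a report like this is to run the same arithmetic outside Splunk. The sketch below is a standalone copy of the weekday calculation from the script above, with the Splunk imports and error handling dropped, applied to the two local timestamps from this post. It assumes the timestamps reach the function as GMT+3 wall-clock strings; the names `business_hours` and `FMT` are illustrative and not part of the original script.

```python
from datetime import datetime, timedelta

FMT = "%Y-%m-%d %H:%M:%S"  # same string format the script expects

def clamp(t, start, end):
    """Return t clamped to the range [start, end]."""
    return max(start, min(end, t))

def day_part(t):
    """Return the timedelta between midnight and t."""
    return t - t.replace(hour=0, minute=0, second=0, microsecond=0)

def business_hours(a, b, start=timedelta(hours=8), stop=timedelta(hours=18)):
    """Return office hours between two timestamp strings, as a float."""
    a = datetime.strptime(a, FMT)
    b = datetime.strptime(b, FMT)
    zero = timedelta(0)
    office_day = stop - start
    days = (b - a).days + 1
    weeks = days // 7
    extra = (max(0, 5 - a.weekday()) + min(5, 1 + b.weekday())) % 5
    total = office_day * (weeks * 5 + extra)
    # trim the part of the first and last weekday that lies outside [a, b]
    if a.weekday() < 5:
        total -= clamp(day_part(a) - start, zero, office_day)
    if b.weekday() < 5:
        total -= clamp(stop - day_part(b), zero, office_day)
    return total.total_seconds() / 3600

hours = business_hours("2017-10-20 17:20:03", "2017-10-25 12:47:19")
print(round(hours * 3600))  # business time in seconds: 91636
```

Counting by hand gives the same figure: 39 min 57 s on Friday, two full 10-hour days on Monday and Tuesday, and 4 h 47 min 19 s on Wednesday, about 25.45 hours in total. A result of 5236 seconds would therefore point to something other than this arithmetic, possibly how the timestamps are converted before they reach the function.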

andrewtrobec
Motivator

That is incredible! I see that results is some sort of array that you iterate through. I guess it's a bit confusing since it seems to be a kind of input. I'll play around with the logs to see the behavior of the code. Thank you so much for taking the time to help me out!
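
To illustrate the point about `results`: in the script above, `getOrganizedResults()` hands back the incoming events as a list of dictionaries, one per event and keyed by field name, and the same list is passed to `outputResults()` once new fields have been added. So it is both the input and the output. The sketch below imitates that flow without Splunk; the sample events and the `add_field` helper are made up for illustration.

```python
# Stand-in for what getOrganizedResults() returns: one dict per event.
results = [
    {"time1": "2016-05-02 09:00:00", "time2": "2016-05-02 11:30:00"},
    {"time1": "2016-05-03 10:00:00", "time2": "2016-05-03 10:45:00"},
]

def add_field(events, name, func):
    """Add field `name` to every event in place, computed by func(event)."""
    for event in events:
        event[name] = func(event)
    return events

# Hypothetical computation: join the two input fields into one string.
add_field(results, "span", lambda e: e["time1"] + " -> " + e["time2"])

# In the real script this is where outputResults(results) would be called.
for event in results:
    print(event)
```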

bmacias84
Champion

Yes, but not in the way you are describing. This could be accomplished by building a custom search command that accepts two arguments, each of which can be a field name or a value. There are four basic types of commands: Eventing, Generating, Reporting, and Streaming. You would want to create a Streaming command, which would append a new field containing your diff value to each event. Splunk provides an SDK and examples.

http://dev.splunk.com/view/python-sdk/SP-CAAAEU2
https://github.com/splunk/splunk-sdk-python
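
The idea of a streaming command whose arguments may be field names or literal values can be sketched in plain Python. This is not the SDK's API (see the links above for that); `resolve` and `stream_diff` are hypothetical names, and the generator only mirrors the general shape of handling one event at a time. It computes plain elapsed hours, not business hours.

```python
from datetime import datetime

FMT = "%Y-%m-%d %H:%M:%S"

def resolve(record, arg):
    """Treat arg as a field name if the record has it, else as a literal value."""
    return record.get(arg, arg)

def stream_diff(records, arg1, arg2, out_field="DIFF"):
    """Yield each record with the elapsed hours appended as a new field."""
    for record in records:
        t1 = datetime.strptime(resolve(record, arg1), FMT)
        t2 = datetime.strptime(resolve(record, arg2), FMT)
        record[out_field] = (t2 - t1).total_seconds() / 3600
        yield record

events = [{"DATE1": "2016-05-02 09:00:00", "DATE2": "2016-05-02 17:30:00"}]

# First argument is a field name, second is a literal timestamp.
for event in stream_diff(events, "DATE1", "2016-05-02 12:00:00"):
    print(event["DIFF"])  # 3.0
```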

andrewtrobec
Motivator

Thank you for the suggestion, I will look into Streaming commands.
