Hello,
Before I waste too much time trying to get this to work, I'd like to know whether a Splunk search can call a Python script to perform data manipulation. I have a working Python script that is able to calculate the difference in business hours between two timestamps. What I'd like to do is configure Splunk so that I can pass event fields to the script as parameters and store the output as a new field. I suppose the search would be something like:
index=xyz | eval DIFF = callPythonScript(DATE1, DATE2) | table DATE1, DATE2, DIFF
Is this at all possible? Are there any tutorials that explain how to accomplish this?
Thank you!
Andrew
I deleted all the back and forth we did trying to get this streaming search command to work. For others coming across this thread: I offered to help write a search command, and after we had debugged it together for a while it made sense to remove the intermediate posts and just post the final result here.
The trick is using result["fieldName"] to get the value of an existing field in the search pipeline, where each result is one event from the previous search. The code below works fine with this search:
|makeresults count=1| eval time1=strftime(_time-4845858,"%F %T") | eval time2=strftime(_time+86400, "%F %T") | totalbusinesshours time1 time2 4 17
### SCRIPT NAME: totalbusinesshours.py
### AUTHOR: your_name_here
### Copyright 2016 your_name_here
###
### Licensed under the Apache License, Version 2.0 (the "License");
### you may not use this file except in compliance with the License.
### You may obtain a copy of the License at
###
### http://www.apache.org/licenses/LICENSE-2.0
###
### Unless required by applicable law or agreed to in writing, software
### distributed under the License is distributed on an "AS IS" BASIS,
### WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
### See the License for the specific language governing permissions and
### limitations under the License.
###
import splunk.Intersplunk
import splunk.mining.dcutils as dcu
import traceback
from datetime import datetime, timedelta

# Set up logging
logger = dcu.getLogger()


def clamp(t, start, end):
    "Return `t` clamped to the range [`start`, `end`]."
    return max(start, min(end, t))


def day_part(t):
    "Return timedelta between midnight and `t`."
    return t - t.replace(hour=0, minute=0, second=0, microsecond=0)


def office_time_between(a, b, start=timedelta(hours=8), stop=timedelta(hours=18)):
    """
    Return the total office time between `a` and `b` in hours (float).
    `a` and `b` are 'YYYY-MM-DD HH:MM:SS' strings. Office time consists
    of weekdays from `start` to `stop`.
    """
    a = datetime.strptime(a, '%Y-%m-%d %H:%M:%S')
    b = datetime.strptime(b, '%Y-%m-%d %H:%M:%S')
    zero = timedelta(0)
    assert zero <= start <= stop <= timedelta(1)
    office_day = stop - start
    days = (b - a).days + 1
    weeks = days // 7
    # NOTE: this weekday count is wrong for some ranges. For example, a
    # Monday-to-Friday span within one week gives extra = (5 + 5) % 5 = 0.
    extra = (max(0, 5 - a.weekday()) + min(5, 1 + b.weekday())) % 5
    weekdays = weeks * 5 + extra
    total = office_day * weekdays
    # Subtract the part of the first day before `a` and of the last day after `b`
    if a.weekday() < 5:
        total -= clamp(day_part(a) - start, zero, office_day)
    if b.weekday() < 5:
        total -= clamp(stop - day_part(b), zero, office_day)
    return total.total_seconds() / 60 / 60


def execute():
    try:
        # Get the keywords and options passed to this command
        keywords, options = splunk.Intersplunk.getKeywordsAndOptions()
        logger.info(keywords)
        # Get the previous search results
        results, dummyresults, settings = splunk.Intersplunk.getOrganizedResults()
        if 2 <= len(keywords) <= 4:
            # Keywords 3 and 4, if given, are the business start and stop hours
            hours = [timedelta(hours=int(k)) for k in keywords[2:]]
            # Add the new field to every result, not just the first one
            for result in results:
                result["totalbusinesshours"] = office_time_between(
                    result[keywords[0]], result[keywords[1]], *hours)
        else:
            for result in results:
                result["error"] = "syntax: totalbusinesshours <date_1> <date_2> <business_start_hour> <business_stop_hour>"
                result["example"] = "example: totalbusinesshours dateField1 dateField2 9 17"
    except Exception as e:
        stack = traceback.format_exc()
        logger.error(str(e) + ". Traceback: " + str(stack))
        # Hand the error back as the result set so it shows up in the search
        results = splunk.Intersplunk.generateErrorResults(str(e))
    # Output exactly once, whether we have results or an error
    splunk.Intersplunk.outputResults(results)


if __name__ == '__main__':
    execute()
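To make the data flow a bit more concrete, here is a small stand-alone sketch of how the command-line keywords map onto event fields. It does not import Splunk at all: the `keywords` list and the `results` list of dicts are hand-written stand-ins for what I understand `getKeywordsAndOptions()` and `getOrganizedResults()` hand to the script, and `parse_keywords` and the `span` field are hypothetical names used only for illustration.

```python
from datetime import timedelta


def parse_keywords(keywords):
    """Split e.g. ['time1', 'time2', '4', '17'] into field names and hour bounds."""
    if not 2 <= len(keywords) <= 4:
        raise ValueError("expected: <date_1> <date_2> [<start_hour> [<stop_hour>]]")
    fields = tuple(keywords[:2])
    # Keywords are assumed to arrive as strings, so the hours need int()
    hours = [timedelta(hours=int(k)) for k in keywords[2:]]
    return fields, hours


# Stand-ins (assumptions) for the values Splunk passes to the script
keywords = ["time1", "time2", "4", "17"]
results = [{"time1": "2017-10-20 17:20:03", "time2": "2017-10-25 12:47:19"}]

(field1, field2), hours = parse_keywords(keywords)
for result in results:
    # Each result is a dict keyed by field name; a new key becomes a new field
    result["span"] = "%s -> %s" % (result[field1], result[field2])
```

The point is only that the first two keywords are field names to look up in each result, while the remaining ones are literal values.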
I think this code has an issue:
startstamp,stopstamp,daystart_h,dayend_h,deltastamp
1508509203,1508924839,8,18,5236
My local time is GMT+3, and these are the two timestamps in local time:
Start 2017-10-20 17:20:03 GMT+3 (epoch 1508509203)
Stop 2017-10-25 12:47:19 GMT+3 (epoch 1508924839)
Our work day starts at 8 and ends at 18.
For this range the code returns 5236 seconds of work time...
Can you help us?
Solution Found!!!
https://codereview.stackexchange.com/a/179542
=)
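For anyone who lands here without following the link: one way to sidestep the weekday-counting arithmetic is to walk the range one calendar day at a time and add up the overlap with that day's office window. The sketch below is my own illustration, not the code from the linked answer; `business_hours_between` and its parameters are made-up names, and it assumes both datetimes are naive values in the same local time zone.

```python
from datetime import datetime, time, timedelta


def business_hours_between(a, b, start_hour=8, stop_hour=18):
    """Hours between datetimes a and b that fall on Mon-Fri office time."""
    if b < a:
        raise ValueError("b must not be earlier than a")
    total = timedelta(0)
    day = a.date()
    while day <= b.date():
        if day.weekday() < 5:  # Monday=0 ... Friday=4
            midnight = datetime.combine(day, time(0))
            opens = midnight + timedelta(hours=start_hour)
            closes = midnight + timedelta(hours=stop_hour)
            # Overlap of [a, b] with this day's office window
            lo = max(a, opens)
            hi = min(b, closes)
            if hi > lo:
                total += hi - lo
        day += timedelta(days=1)
    return total.total_seconds() / 3600.0


start = datetime(2017, 10, 20, 17, 20, 3)
stop = datetime(2017, 10, 25, 12, 47, 19)
hours = business_hours_between(start, stop)
```

For the range in the post above, this counts 39m57s on Friday, two full 10-hour days, and 4h47m19s on Wednesday, which by my arithmetic is 91,636 seconds (about 25.45 hours). The loop is linear in the number of days, so it trades some speed for being easy to check by hand.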
That is incredible! I see that results is some sort of array that you iterate through. I guess it's a bit confusing since it seems to be a kind of input. I'll play around with the logs to see the behavior of the code. Thank you so much for taking the time to help me out!
Yes, but not in the way you are describing. This could be accomplished by building a custom search command that accepts two arguments, which can be field names or values. There are four basic types of commands: Eventing, Generating, Reporting, and Streaming. You would want to create a Streaming command, which would append a new field containing your diff value to each event. Splunk provides an SDK and examples.
http://dev.splunk.com/view/python-sdk/SP-CAAAEU2
https://github.com/splunk/splunk-sdk-python
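As a rough idea of what a Streaming command looks like with the Python SDK, here is a minimal sketch. It assumes the `splunklib` package from the SDK is available to your app; `HoursDiffCommand`, `add_diff`, `hours_between`, and the `DIFF` output field are hypothetical names, and the calculation is plain elapsed hours rather than business hours, just to keep the example short. The import is guarded only so the helper functions can be tried outside Splunk.

```python
from datetime import datetime

TIME_FORMAT = "%Y-%m-%d %H:%M:%S"  # assumed timestamp format of the two fields


def hours_between(first, second):
    """Plain elapsed hours between two timestamp strings (not business hours)."""
    delta = datetime.strptime(second, TIME_FORMAT) - datetime.strptime(first, TIME_FORMAT)
    return delta.total_seconds() / 3600.0


def add_diff(records, field1, field2, out_field="DIFF"):
    """Yield each record with a new field holding the computed difference."""
    for record in records:
        record[out_field] = hours_between(record[field1], record[field2])
        yield record


try:
    from splunklib.searchcommands import Configuration, StreamingCommand
except ImportError:  # running outside Splunk, SDK not installed
    StreamingCommand = None

if StreamingCommand is not None:
    @Configuration()
    class HoursDiffCommand(StreamingCommand):
        """Hypothetical usage: ... | hoursdiff DATE1 DATE2"""

        def stream(self, records):
            # self.fieldnames holds the field names given on the command line
            field1, field2 = self.fieldnames[:2]
            for record in add_diff(records, field1, field2):
                yield record

    # A real script would end by handing control to the SDK:
    #   import sys
    #   from splunklib.searchcommands import dispatch
    #   dispatch(HoursDiffCommand, sys.argv, sys.stdin, sys.stdout, __name__)
```

The command would still need to be registered in the app's commands.conf before a search can call it; see the SDK examples for the exact layout.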
Thank you for the suggestion, I will take a look into Streaming commands.