I've created a dashboard to visualize a log analysis for a piece of business software. Before adding a Flume agent, I processed the data in Excel and then with a Python program, because there are a lot of tedious field extractions to do, such as extracting about ten fields from one column that is JSON formatted. I would then manually upload the processed .csv file and search it in Splunk.
But after adding the Flume agent, the _raw data was really messy and did not respond to my rex searches. Also, if I were to use rex, the search string could be about 100 lines for each dashboard panel. So I was wondering if there is a way to insert my Python program into the dashboard source to process the data?
My props.conf:
> [log_session]
> INDEXED_EXTRACTIONS = csv
> FIELD_DELIMITER=,
> [source::/datalake/log/******]
> sourcetype = my_named_source
Hello @dannili,
I use the following to extract fields with a Python script.
First, please read http://docs.splunk.com/Documentation/Splunk/7.1.2/SearchReference/Script to learn how to integrate the script into your app.
Then, the script itself looks like this:
    import splunk.Intersplunk
    # import other libs if necessary
    results, unused1, unused2 = splunk.Intersplunk.getOrganizedResults()
    for result in results:
        # result is an OrderedDict, so you can access the fields by their name
        for item in result.keys():
            if item == 'theNameOfTheFieldWhereINeedToExtractSomething':
                # do stuff, and add the extracted info to result
                result['addedField'] = SomeExtractedValue  # placeholder for your extracted value
    splunk.Intersplunk.outputResults(results)
You can then call the script as described in http://docs.splunk.com/Documentation/Splunk/7.1.2/SearchReference/Script
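To make the "do stuff" step more concrete, here is a minimal sketch of pulling a few keys out of a JSON-formatted field. The field name payload and the keys user and action are hypothetical, not taken from the question. The function works on a plain list of dicts so it can be tried outside Splunk; inside a search script it would presumably sit between getOrganizedResults() and outputResults().

```python
import json
from collections import OrderedDict


def extract_json_fields(results, source_field, wanted_keys):
    """Copy selected keys from a JSON-formatted field into new fields.

    `results` is a list of dict-like rows, the same shape the answer
    describes for the first value of getOrganizedResults().
    """
    for result in results:
        raw = result.get(source_field)
        if not raw:
            continue  # field missing or empty on this event
        try:
            parsed = json.loads(raw)
        except (ValueError, TypeError):
            continue  # not valid JSON, leave the event untouched
        if not isinstance(parsed, dict):
            continue  # only JSON objects carry named keys
        for key in wanted_keys:
            if key in parsed:
                result[key] = parsed[key]
    return results


# Hypothetical sample rows standing in for Splunk search results.
sample = [
    OrderedDict([("payload", '{"user": "alice", "action": "login"}')]),
    OrderedDict([("payload", "not json")]),
]
extracted = extract_json_fields(sample, "payload", ["user", "action"])
```

Events whose field is missing or not valid JSON are passed through unchanged, so one malformed event should not break the whole search.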
Thanks for your response. I have a few questions, though. For line 5, results,unused1,unused2 = splunk.Intersplunk.getOrganizedResults(), does it initialize all the columns in the file? Also, I use the pandas library in Python, so my program works on whole columns at a time and the loop is not necessary in this case. Will it still work if for result in results: is not included?
@dannili
Well, I do not know pandas well enough to answer your question. But basically, the results value you get from splunk.Intersplunk.getOrganizedResults() is a list of OrderedDict objects.
The piece of code in the answer gives a way to walk through all the items returned by the Splunk query.
For each item (result) in that code, I expect you will need to do something to extract your data.
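Since the results arrive as a list of row dicts, column-wise processing needs a conversion step in each direction. The sketch below shows that round trip with the standard library only; the helper names rows_to_columns and columns_to_rows and the fields status, bytes and bytes_kb are made up for illustration.

```python
from collections import OrderedDict


def rows_to_columns(results):
    """Turn a list of row dicts into an ordered dict of column lists."""
    columns = OrderedDict()
    for index, row in enumerate(results):
        for name, value in row.items():
            # A column first seen at row `index` is back-filled with None.
            columns.setdefault(name, [None] * index).append(value)
        for name in columns:
            # Columns absent from this row get None so lengths stay equal.
            if len(columns[name]) <= index:
                columns[name].append(None)
    return columns


def columns_to_rows(columns):
    """Turn a dict of equal-length column lists back into row dicts."""
    names = list(columns)
    length = len(columns[names[0]]) if names else 0
    return [
        OrderedDict((name, columns[name][i]) for name in names)
        for i in range(length)
    ]


# Hypothetical rows standing in for Splunk search results.
rows = [
    OrderedDict([("status", "200"), ("bytes", "512")]),
    OrderedDict([("status", "404"), ("bytes", "128")]),
]
columns = rows_to_columns(rows)
# Column-wise step: derive a new column from an existing one.
columns["bytes_kb"] = [int(b) / 1024 for b in columns["bytes"]]
back = columns_to_rows(columns)
```

With pandas the same round trip is shorter: pandas.DataFrame(results) accepts a list of dicts, and DataFrame.to_dict("records") turns the frame back into a list of dicts that can be handed to outputResults(). This assumes pandas can be imported by the Python interpreter that Splunk runs the script with, which is worth checking first.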
OK, thanks for your response.