Hi guys at Itility, I attended your session at .conf 2016. I've been playing around with your R app, and when I use the runRdo custom command I frequently get inconsistent results back from R in Splunk. Example below.
The search below occasionally comes back with the correct results and populates Splunk with the test data frame. However, more often than not it comes back with a Null error.
| inputlookup iris.csv | runRdo script="set.seed(1); my_iris = dataset[-5]; species = dataset$species; kmeans_iris = kmeans(my_iris,3); kmeans_table = table(kmeans_iris$cluster,species); test = as.data.frame(kmeans_table); return(test);"
#error results
message session status
NA/NaN/Inf in foreign function call (arg 1) In call: do_one(nmeth) 0 400
#correct results
Var1 Freq species
1 50 Iris Setosa
2 0 Iris Setosa
3 0 Iris Setosa
1 0 Iris Versicolor
2 2 Iris Versicolor
3 48 Iris Versicolor
1 0 Iris Virginica
2 36 Iris Virginica
3 14 Iris Virginica
Please let me know what you think.
Thanks for using the R app! (and attending our presentation)
There are a couple of things you need to take into account:
1. Splunk is not consistent in the order of the columns (even when using the table or fields commands). This means that dataset[-5] will not drop the same column on every run. We haven't found a workaround yet, but you can select columns by name in R instead.
2. Splunk is not aware of any data types and will always send out strings (even when it's obvious that your data is numeric). Our app will try to parse the data as numeric, but when that fails R will receive characters instead of numerics. It's safest to cast data types explicitly in R.
When debugging, you can use the parameter getResults=false, which will give you a link to the console output by R. When using the str() command in R, the console will show the data types.
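The two points above can be combined into one reusable step. Below is a minimal sketch in plain R, assuming the data arrives as a data frame of strings in arbitrary column order; the helper name split_and_cast and the small stand-in data frame are made up for illustration and are not part of the app.

```r
# Hypothetical helper (not part of the R app): split a label column from the
# feature columns by name, then cast every feature column to numeric.
split_and_cast <- function(df, label_col) {
  stopifnot(label_col %in% names(df))
  labels <- as.factor(df[[label_col]])
  # Select by name and sort, so the result does not depend on column order
  feature_names <- sort(setdiff(names(df), label_col))
  features <- df[, feature_names, drop = FALSE]
  features[] <- lapply(features, function(col) as.numeric(as.character(col)))
  # Fail early if any value is NA after casting (this also triggers on
  # values that were already missing in the input)
  if (any(is.na(features))) {
    stop("Some values could not be parsed as numbers")
  }
  list(features = features, labels = labels)
}

# Stand-in for the `dataset` variable: all strings, arbitrary column order
dataset <- data.frame(
  species = c("a", "a", "b"),
  petal_width = c("0.2", "0.2", "1.3"),
  sepal_length = c("5.1", "4.9", "7.0"),
  stringsAsFactors = FALSE
)

prepared <- split_and_cast(dataset, "species")
str(prepared$features)
```

Stopping on unparseable values is a design choice in this sketch: it surfaces a bad column immediately instead of letting it reach kmeans().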
So back to your query. This example should work (works on my machine):
| inputlookup iris.csv
| runRdo script="
# Fix the random seed
set.seed(1);
# Store the dataset in a variable
my_iris = dataset;
# Separate the species column from the rest
species = as.factor(my_iris$species);
my_iris = my_iris[ , !(names(my_iris) %in% c('species'))];
# Cast data types
my_iris$petal_length = as.numeric(my_iris$petal_length);
my_iris$sepal_length = as.numeric(my_iris$sepal_length);
my_iris$petal_width = as.numeric(my_iris$petal_width);
my_iris$sepal_width = as.numeric(my_iris$sepal_width);
# Show summaries in the console, use getResults=false to see the link to the console
str(species);
str(my_iris);
# Perform the kmeans
kmeans_iris = kmeans(my_iris, 3);
kmeans_table = table(kmeans_iris$cluster, species);
# Return a dataframe
return(as.data.frame(kmeans_table));" getResults=t
I hope this fixes your issue! We'd love to hear how you're using our app, so stay in touch!
Thank you, this works perfectly! I can see now that the column order changes if I run the search multiple times. I will avoid using index references from now on and make sure to cast my data types as well.
Glad to hear it worked! I've added this question (and answer) to Splunkbase: https://splunkbase.splunk.com/app/3339/#/details
Hi, for me nothing is printed after clicking the Run button in the script editor.
Not even an error appears.
Is OpenCPU mandatory for this? And can we install it on the same machine as the Splunk server?
Please respond ASAP.
Yes, OpenCPU is mandatory, and yes, you can install it on the same machine. Good luck!
Would public.opencpu.org not work?
We don't have the rights to install OpenCPU at the moment,
so we thought of using a public OpenCPU server.
Sure, that should work. Just be absolutely sure you're willing to send your data and your algorithm to some unfamiliar host and be aware that you cannot use libraries that are not installed on the OpenCPU server that you're using.
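To find out up front which libraries a given R environment lacks, something like the following could be run there first. This is a minimal sketch in plain R; the helper name missing_packages and the package names are illustrative only.

```r
# Hypothetical helper: return the names of packages that cannot be loaded
# in the current R session.
missing_packages <- function(pkgs) {
  available <- vapply(
    pkgs,
    function(p) requireNamespace(p, quietly = TRUE),
    logical(1)
  )
  pkgs[!available]
}

# "stats" ships with R; the second name is a deliberately made-up package.
needed <- c("stats", "someMadeUpPackageName")
missing_packages(needed)
```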
But nothing happens when I click the Run button in the Splunk app.
The R console tab just stays hidden. At least some error should appear.
PS: I am using public.opencpu.org.
External search command 'runrpairs' returned error code 1. Script output = "error_message=ConnectionError at "/data/splunk_axpclp/lib/python2.7/site-packages/requests/adapters.py", line 375 : HTTPSConnectionPool(host='public.opencpu.org', port=443): Max retries exceeded with url: /ocpu/library/base/R/identity (Caused by : [Errno -2] Name or service not known) "
I've just tried public.opencpu.org on my own machine and it's working just fine. Please make sure that your Splunk machine is able to connect to public.opencpu.org (no network issues or firewalls) and that your configuration includes https as the protocol (go to apps -> manage apps -> click on setup next to the R Analytics app -> fill out "https://public.opencpu.org" and click save).
If this doesn't work, please share your setup.