
Potential bug in R Analytics App

Builder

Hi guys at Itility, I attended your session at .conf 2016. I've been playing around with your R app, and when I use the runRdo custom command I frequently get inconsistent results back from R in Splunk. Example below.

The search below occasionally comes back with the correct results and populates Splunk with the test data frame. More often than not, however, it comes back with a Null error.

| inputlookup iris.csv | runRdo script="set.seed(1); my_iris = dataset[-5]; species = dataset$species; kmeans_iris = kmeans(my_iris,3); kmeans_table = table(kmeans_iris$cluster,species); test = as.data.frame(kmeans_table); return(test);"


#error results
message                                                             session  status
NA/NaN/Inf in foreign function call (arg 1) In call: do_one(nmeth)  0        400


#correct results
Var1    Freq    species
1      50   Iris Setosa
2      0    Iris Setosa
3      0    Iris Setosa
1      0    Iris Versicolor
2      2    Iris Versicolor
3      48   Iris Versicolor
1      0    Iris Virginica
2      36   Iris Virginica
3      14   Iris Virginica

Please let me know what you think.

1 Solution

Communicator

Thanks for using the R app! (and attending our presentation)

There are a couple of things you need to take into account:
1. Splunk is not consistent in the order of the columns (even when using the table or fields commands). This means that dataset[-5] will not always drop the same column. We haven't found a workaround yet; however, you can refer to columns by name in R.
2. Splunk is not aware of any data types and will always send out strings (even when it's obvious that your data is numeric). Our app will try to parse the data as numeric, but when that fails R will receive characters instead of numerics. It's safest to cast data types explicitly in R.

When debugging, you can set the parameter getResults=false, which gives you a link to the console output from R. If you call str() in your R script, the console will show the data types.
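Both pitfalls and the str() check can be tried in a plain R session without Splunk. The sketch below is only an illustration: the small `dataset` data frame is made up to mimic what R might receive (all columns as strings, in an arbitrary order), and the names `feature_cols` and `features` are hypothetical.

```r
# Made-up stand-in for the data R might receive from Splunk:
# every column is a character vector and the column order is arbitrary.
dataset <- data.frame(
  species      = c("Iris Setosa", "Iris Versicolor", "Iris Virginica"),
  petal_width  = c("0.2", "1.3", "2.5"),
  sepal_length = c("5.1", "6.4", "6.3"),
  petal_length = c("1.4", "4.5", "6.0"),
  sepal_width  = c("3.5", "3.2", "3.3"),
  stringsAsFactors = FALSE
)

# str() lists the type of each column; here they are all 'chr'.
str(dataset)

# Positional indexing depends on the column order: in this frame column 5
# is sepal_width, so dataset[-5] keeps species and drops a measurement.
names(dataset[-5])

# Select the measurement columns by name and cast each one explicitly.
feature_cols <- setdiff(names(dataset), "species")
features <- as.data.frame(lapply(dataset[feature_cols], as.numeric))

# as.numeric() turns unparseable strings into NA, so stop early if any appear.
stopifnot(!any(is.na(features)))

# After casting, str() reports 'num' for every column.
str(features)
```

Inside runRdo the same str() calls write to the R console, which is where the getResults=false link points.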

So back to your query. This example should work (works on my machine):

| inputlookup iris.csv 
| runRdo script="
    # Fix the random seed
    set.seed(1);

    # Store the dataset in a variable
    my_iris = dataset;

    # Separate the species column from the rest
    species = as.factor(my_iris$species);
    my_iris = my_iris[ , !(names(my_iris) %in% c('species'))];

    # Cast data types
    my_iris$petal_length = as.numeric(my_iris$petal_length);
    my_iris$sepal_length = as.numeric(my_iris$sepal_length);
    my_iris$petal_width = as.numeric(my_iris$petal_width);
    my_iris$sepal_width = as.numeric(my_iris$sepal_width);

    # Show summaries in the console, use getResults=false to see the link to the console
    str(species);
    str(my_iris);

    # Perform the kmeans
    kmeans_iris = kmeans(my_iris, 3);
    kmeans_table = table(kmeans_iris$cluster, species);

    # Return a dataframe
    return(as.data.frame(kmeans_table));" getResults=t

I hope this fixes your issue! We'd love to hear how you're using our app, so stay in touch!


Builder

Thank you, this works perfectly! I can see now that the column order changes if I run the search multiple times. I will avoid using index references from now on and make sure to cast my data types as well.


Communicator

Glad to hear it worked! I've added this question (and answer) to Splunkbase: https://splunkbase.splunk.com/app/3339/#/details


New Member

Hi, for me nothing is printed after clicking the Run button in the script editor, not even an error.
Is OpenCPU mandatory for this? And can we install it on the same machine as the Splunk server?

Please respond ASAP.


Communicator

Yes, OpenCPU is mandatory, and yes, you can install it on the same machine. Good luck!


New Member

Would public.opencpu.org not work?

We don't have the rights to install OpenCPU at the moment, so we thought we would use a public OpenCPU server.


Communicator

Sure, that should work. Just be absolutely sure you're willing to send your data and your algorithm to some unfamiliar host, and be aware that you cannot use libraries that are not installed on the OpenCPU server you're using.
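One way to find out which libraries a given R installation lacks is to ask R itself before relying on them. This is a minimal sketch in base R; the helper name `missing_packages` and the package names passed to it are hypothetical examples, and whether a particular OpenCPU server lets you run such a check is an assumption.

```r
# Return the subset of 'pkgs' that is not installed in this R installation.
# requireNamespace() loads a package's namespace without attaching it and
# returns FALSE (instead of raising an error) when the package is missing.
missing_packages <- function(pkgs) {
  available <- vapply(pkgs, requireNamespace, logical(1), quietly = TRUE)
  pkgs[!available]
}

# 'stats' and 'utils' ship with R, so nothing should be reported here.
missing_packages(c("stats", "utils"))

# A made-up package name is reported as missing.
missing_packages("notARealPackage12345")
```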


New Member

But nothing happens when I click the Run button in the Splunk app.
The R console tab just stays hidden. At least some error should appear.
PS: I am using public.opencpu.org.


New Member

External search command 'runrpairs' returned error code 1. Script output = "error_message=ConnectionError at "/data/splunk_axpclp/lib/python2.7/site-packages/requests/adapters.py", line 375 : HTTPSConnectionPool(host='public.opencpu.org', port=443): Max retries exceeded with url: /ocpu/library/base/R/identity (Caused by : [Errno -2] Name or service not known) "


Communicator

I've just tried public.opencpu.org on my own machine and it's working just fine. Please make sure that your Splunk machine is able to connect to public.opencpu.org (no network issues or firewalls), and make sure that your configuration includes https as the protocol (go to apps -> manage apps -> click on setup next to the R Analytics app -> fill out "https://public.opencpu.org" and click save).

If this doesn't work, please share your setup.
