Potential bug in R Analytics App

jedatt01
Builder

Hi guys at Itility, I attended your session at .conf 2016. I've been playing around with your R app, and I'm seeing that the runRdo custom command frequently returns inconsistent results from R in Splunk. Example below.

The search below occasionally returns the correct results and populates Splunk with the test data frame. More often than not, however, it comes back with a null error.

| inputlookup iris.csv | runRdo script="set.seed(1); my_iris = dataset[-5]; species = dataset$species; kmeans_iris = kmeans(my_iris,3); kmeans_table = table(kmeans_iris$cluster,species); test = as.data.frame(kmeans_table); return(test);"


#error results
message                                                              session    status
NA/NaN/Inf in foreign function call (arg 1) In call: do_one(nmeth)   0          400


#correct results
Var1    Freq    species
1      50   Iris Setosa
2      0    Iris Setosa
3      0    Iris Setosa
1      0    Iris Versicolor
2      2    Iris Versicolor
3      48   Iris Versicolor
1      0    Iris Virginica
2      36   Iris Virginica
3      14   Iris Virginica

Please let me know what you think.


gwobben
Communicator

Thanks for using the R app! (and attending our presentation)

There are a couple of things you need to take into account:
1. Splunk is not consistent in the order of the columns (even when using the table or fields commands). This means that dataset[-5] will not always drop the same column. We haven't found a workaround yet; however, you can select columns by name in R.
2. Splunk is not aware of any data types and always sends strings (even when your data is obviously numeric). Our app tries to parse the data as numeric, but when that fails R receives characters instead of numerics. It is safest to cast data types explicitly in R.
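To see both points outside Splunk, here is a minimal sketch for a plain R session. It is not part of the app: the data frame `raw` is a stand-in built from R's bundled iris data, and the lowercase column names are an assumption chosen to match the lookup used in this thread.

```r
# Stand-in for what the app hands to R: every value is a string and the
# column order is not guaranteed. Column names are assumed, not confirmed.
raw <- data.frame(
  sepal_length = as.character(iris$Sepal.Length),
  sepal_width  = as.character(iris$Sepal.Width),
  petal_length = as.character(iris$Petal.Length),
  petal_width  = as.character(iris$Petal.Width),
  species      = as.character(iris$Species),
  stringsAsFactors = FALSE
)

# Same data, different column order (species is now first).
shuffled <- raw[, c("species", "petal_width", "sepal_length",
                    "petal_length", "sepal_width")]

# Positional selection drops whatever happens to sit in column 5,
# so species stays in and a measurement column is lost.
by_position <- shuffled[-5]

# Name-based selection drops species regardless of column order.
by_name <- shuffled[, !(names(shuffled) %in% c("species"))]

# Explicitly cast every remaining column to numeric.
by_name[] <- lapply(by_name, as.numeric)
```

With the shuffled order, `by_position` still contains the species strings, which is the kind of input kmeans cannot handle, while `by_name` holds only the four numeric measurement columns.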

When debugging, you can use the parameter getResults=false, which gives you a link to R's console output. If you call str() in your R script, the console will show the data types.
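As a rough illustration of what to look for in that console output, the sketch below runs str() before and after casting in a plain R session. The three-row `dataset` here is a hypothetical stand-in, not data produced by the app.

```r
# Hypothetical stand-in for `dataset`: both columns arrive as strings.
dataset <- data.frame(
  petal_length = c("1.4", "4.7", "6.0"),
  species      = c("Iris Setosa", "Iris Versicolor", "Iris Virginica"),
  stringsAsFactors = FALSE
)

# Before casting: str() lists both columns as character vectors.
str(dataset)
types_before <- sapply(dataset, class)

# Cast explicitly, as recommended above.
dataset$petal_length <- as.numeric(dataset$petal_length)
dataset$species <- as.factor(dataset$species)

# After casting: str() lists a numeric column and a factor.
str(dataset)
types_after <- sapply(dataset, class)
```

If str() still reports a measurement column as character after your casts, that column is the likely source of the kmeans error.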

So back to your query. This example should work (works on my machine):

| inputlookup iris.csv 
| runRdo script="
    # Fix the random seed
    set.seed(1);

    # Store the dataset in a variable
    my_iris = dataset;

    # Separate the species column from the rest
    species = as.factor(my_iris$species);
    my_iris = my_iris[ , !(names(my_iris) %in% c('species'))];

    # Cast data types
    my_iris$petal_length = as.numeric(my_iris$petal_length);
    my_iris$sepal_length = as.numeric(my_iris$sepal_length);
    my_iris$petal_width = as.numeric(my_iris$petal_width);
    my_iris$sepal_width = as.numeric(my_iris$sepal_width);

    # Show summaries in the console, use getResults=false to see the link to the console
    str(species);
    str(my_iris);

    # Perform the kmeans
    kmeans_iris = kmeans(my_iris, 3);
    kmeans_table = table(kmeans_iris$cluster, species);

    # Return a dataframe
    return(as.data.frame(kmeans_table));" getResults=t

I hope this fixes your issue! We'd love to hear how you're using our app, so stay in touch!
