Splunk R App uses default stringsAsFactors=TRUE


First, thanks for changing the read.csv call to leave the _time field name unmodified in version 0.3.10.

I have found cases where the input data frame's string factors cause problems, and have not yet found a case where I actually needed the strings as factors. I don't know if it's worth it to change the read.csv call to add stringsAsFactors=FALSE, but here's what I do to remove them and convert them back to character strings, in case anyone else wishes to know:

factor_fields <- sapply(input, is.factor)
input[factor_fields] <- lapply(input[factor_fields], as.character)
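
To show the effect on something small, here is a minimal sketch that applies the same two lines to a hand-built data frame. The field names and values (host, status, bytes) are made up for illustration; the frame is created with stringsAsFactors=TRUE to mimic what the app's read.csv call hands to a script.

```r
# Hypothetical stand-in for the data frame the R app passes in as `input`.
# stringsAsFactors = TRUE mimics the behavior described in this post.
input <- data.frame(host   = c("web01", "web02"),
                    status = c("200", "404"),
                    bytes  = c(512, 2048),
                    stringsAsFactors = TRUE)

# Find the factor columns and convert each one back to character.
factor_fields <- sapply(input, is.factor)
input[factor_fields] <- lapply(input[factor_fields], as.character)

# Show the class of every column after the conversion.
print(sapply(input, class))
```

Numeric columns are left alone because only the columns flagged by is.factor are reassigned.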

As a bonus, here are the contents of a simple R script (let's call it subset.r) that removes the Splunk fields that aren't terribly useful (such as _raw and the date fields). The table command can be used to select a subset of fields, but you must know your subset in advance; this script determines the subset automatically and, in my case, reduces the size of the resulting CSV export by about one-third:

originalNames <- names(input)

remove_names <- originalNames[c(grep("^date", originalNames),
                                grep("^_", originalNames))]

remove_names <- c(remove_names,
                  "splunk_server", "splunk_server_group",
                  "Label", "linecount", "punct", "eventtype")

subsetNames <- c("_time",
                 originalNames[!(originalNames %in% remove_names)])

output <- subset(input, select=subsetNames)
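
For anyone who wants to try the idea outside Splunk, the following is a sketch of a slightly more defensive variant, not the script above verbatim. It assumes a small made-up data frame in place of the real input, and it only keeps _time when that column is actually present, so the selection does not fail on results that lack it.

```r
# Hypothetical sample standing in for Splunk results; check.names = FALSE
# keeps the leading underscores in the column names.
input <- data.frame("_time"   = c("2014-01-01 00:00:00", "2014-01-01 00:01:00"),
                    "_raw"    = c("a", "b"),
                    date_hour = c(0, 0),
                    punct     = c("--", "--"),
                    host      = c("web01", "web02"),
                    bytes     = c(512, 2048),
                    check.names = FALSE, stringsAsFactors = FALSE)

original_names <- names(input)

# Flag fields that start with "date" or "_", plus the fixed list from the post.
is_removed <- grepl("^date|^_", original_names) |
  original_names %in% c("splunk_server", "splunk_server_group",
                        "Label", "linecount", "punct", "eventtype")

# Keep _time first (only if it exists), followed by the surviving fields.
keep_names <- union(intersect("_time", original_names),
                    original_names[!is_removed])

output <- input[, keep_names, drop = FALSE]
print(names(output))
```

Using drop = FALSE keeps the result a data frame even when only one column survives.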

Then, whenever I need to export Splunk results as CSV (for example, to draw a beautiful scatter plot in R with ggplot2), I just add the following to the end of my Splunk query:

| r subset.r

Bottom line: Would you agree to update your R app to add the stringsAsFactors=FALSE argument to its read.csv call?

Ideally, I'd like an R app that is simply connected to the pipeline's stdout and requires me to call read.csv myself. Each of my R scripts could then set the specific options it needs and avoid post-processing the input to get it back into the form the script requires.
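
As a rough sketch of what such a script might look like: the app described here is hypothetical, and the CSV text below is made up so the example runs on its own. A real script would presumably read from file("stdin") instead of the text argument.

```r
# Simulated CSV standing in for what a hypothetical app would pipe to the script.
csv_text <- paste("_time,host,bytes",
                  "2014-01-01 00:00:00,web01,512",
                  "2014-01-01 00:01:00,web02,2048",
                  sep = "\n")

# The script picks its own read.csv options:
#   stringsAsFactors = FALSE keeps strings as character vectors,
#   check.names = FALSE leaves names such as _time unmodified.
input <- read.csv(text = csv_text,
                  stringsAsFactors = FALSE,
                  check.names = FALSE)

# Example processing step: keep only the larger events.
output <- input[input$bytes > 1000, , drop = FALSE]

# Write the result as CSV to standard output.
write.csv(output, row.names = FALSE)
```

With this arrangement no conversion back from factors is needed, because the script never creates them in the first place.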

Just a thought. Thanks for your consideration.


Splunk Employee

I just created a new version of the R app that uses stringsAsFactors=FALSE for the read.csv call.

Plus, I updated the app's setup page allowing you to customize the read.csv and write.csv function call options.

I haven't uploaded the new version to apps.splunk.com yet, because I want you to test it first. I uploaded the package here.



Thanks! I have installed your update (version 0.3.11) and verified that it works, both with my existing scripts, and then with my updated scripts.

I noticed your default stringsAsFactors=FALSE option in the r.conf file, and left it as-is.

I left the write.csv options empty, but it's nice to know there's a place to set them if and when a change is required, all with the existing R app.

Thank you again! We love it!
