Splunk R App uses default stringsAsFactors=TRUE


First, thanks for changing the read.csv call to leave the _time field name unmodified in version 0.3.10.

I have found cases where the input data frame's string factors cause problems, and have not yet found a case where I actually needed the strings as factors. I don't know if it's worth it to change the read.csv call to add stringsAsFactors=FALSE, but here's what I do to remove them and convert them back to character strings, in case anyone else wishes to know:

factor_fields <- sapply(input, is.factor)
input[factor_fields] <- lapply(input[factor_fields], as.character)
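
To illustrate the kind of problem I mean, here is a minimal, self-contained sketch. The CSV text and the field names (host, status) are made up for the example and do not come from the R app; the point is only to show how a numeric-looking string column behaves as a factor versus as a character vector.

```r
# Hypothetical CSV text standing in for what Splunk hands to the script.
csv_text <- "host,status\nweb01,200\nweb02,404\nweb01,n/a"

# With factors, as.numeric() returns the internal level codes,
# not the values that appear in the data.
with_factors <- read.csv(text = csv_text, stringsAsFactors = TRUE)
codes <- as.numeric(with_factors$status)

# With plain character strings, the conversion yields the actual numbers
# (entries that are not numeric become NA, with a warning).
as_strings <- read.csv(text = csv_text, stringsAsFactors = FALSE)
values <- suppressWarnings(as.numeric(as_strings$status))

# The two-line conversion from this post, applied to the factor version,
# ends up with the same character columns.
input <- with_factors
factor_fields <- sapply(input, is.factor)
input[factor_fields] <- lapply(input[factor_fields], as.character)
```

Comparing codes with values shows why the factor version can silently give wrong numbers, which is the sort of surprise I would rather avoid.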

As a bonus, here are the contents of a simple R script (let's call it subset.r) that removes the Splunk fields that aren't terribly useful (such as _raw and the date_* fields). The table command can be used to select a subset of fields, but you must know your subset in advance; this script determines a subset automatically and, in my experience, reduces the size of the resulting CSV export by about one-third:

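# Names of all fields in the data frame passed in by the R app
originalNames <- names(input)

# Fields to drop: anything starting with "date" or with an underscore
remove_names <- originalNames[c(grep("^date", originalNames),
                                grep("^_", originalNames))]

# Plus a few other fields that are rarely useful
remove_names <- c(remove_names,
                  "splunk_server", "splunk_server_group",
                  "Label", "linecount", "punct", "eventtype")

# Keep _time (matched by the "^_" pattern above), then the remaining fields
subsetNames <- c("_time",
                 originalNames[!(originalNames %in% remove_names)])

# Select only the kept fields
output <- subset(input, select=subsetNames)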
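
To make the effect concrete, here is a small sketch that runs the same selection rule on a made-up data frame. The sample rows and field values are invented for illustration, and the fixed list of names is shortened to "punct"; it uses grep(..., value = TRUE) and setdiff() as an equivalent formulation of the script, not a replacement for it.

```r
# Made-up stand-in for the data frame the R app passes in as `input`.
# check.names = FALSE keeps names such as `_time` unmodified.
input <- data.frame(
  "_time" = c("2013-01-01 00:00:00", "2013-01-01 00:01:00"),
  "_raw" = c("raw event 1", "raw event 2"),
  date_hour = c(0, 0),
  punct = c("--_::", "--_::"),
  host = c("web01", "web02"),
  bytes = c(512, 2048),
  check.names = FALSE,
  stringsAsFactors = FALSE
)

originalNames <- names(input)

# Same rule as the script: drop date* and underscore-prefixed fields,
# plus a fixed name, then put _time back at the front.
remove_names <- c(grep("^date", originalNames, value = TRUE),
                  grep("^_", originalNames, value = TRUE),
                  "punct")
subsetNames <- c("_time", setdiff(originalNames, remove_names))

output <- input[, subsetNames, drop = FALSE]
print(names(output))
```

With this sample input, only _time, host, and bytes survive, in that order.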

Then whenever I need to export Splunk results as CSV (for example, to have R produce a beautiful scatter plot with ggplot2), I just add the following to the end of my Splunk query:

| r subset.r

Bottom line: Would you agree to update your R app to add the stringsAsFactors=FALSE argument to the read.csv call in the R app?

Ideally, I'd like an R app that simply connects the script to the pipeline's stdout and requires me to call read.csv myself. Each of my R scripts could then set the specific options it needs and avoid post-processing to re-form the input back into what the script requires.
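
Something like the following is what I have in mind. This is only a sketch of the idea, not how the R app works today: the helper name run_script and its arguments are made up, and a text connection stands in for the pipeline so the example runs on its own.

```r
# Hypothetical helper: read CSV from a connection with script-specific
# options, apply a function to the data frame, and write CSV back out.
run_script <- function(con_in, con_out, fn) {
  input <- read.csv(con_in, stringsAsFactors = FALSE, check.names = FALSE)
  output <- fn(input)
  write.csv(output, con_out, row.names = FALSE)
  invisible(output)
}

# In a real pipeline the call might look like:
#   run_script(file("stdin"), stdout(), function(input) input)
# Here a text connection with made-up data plays the role of the pipeline.
con_in <- textConnection("_time,host,bytes\n2013-01-01 00:00:00,web01,512")
result <- run_script(con_in, stdout(), function(input) input["host"])
close(con_in)
```

Each script would then choose its own read.csv options (here stringsAsFactors and check.names) instead of undoing the defaults afterwards.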

Just a thought. Thanks for your consideration.

I just created a new version of the R app that uses stringsAsFactors=FALSE for the read.csv call.

I also updated the app's setup page so that you can customize the options passed to the read.csv and write.csv function calls.

I haven't uploaded the new version to apps.splunk.com yet, because I want you to test it first. I uploaded the package here.

Thanks! I have installed your update (version 0.3.11) and verified that it works, both with my existing scripts, and then with my updated scripts.

I noticed your default stringsAsFactors=FALSE option in the r.conf file, and left it as-is.

I left the write.csv options empty, but it's nice to know there's a place to set them if and when a change is required, all with the existing R app.

Thank you again! We love it!

