First, thanks for changing the read.csv call to leave the _time field name unmodified in version 0.3.10.
I have found cases where the input data frame's string factors cause problems, and have not yet found a case where I actually needed the strings as factors. I don't know if it's worth it to change the read.csv call to add stringsAsFactors=FALSE, but here's what I do to remove them and convert them back to character strings, in case anyone else wishes to know:
factor_fields <- sapply(input, is.factor)
input[factor_fields] <- lapply(input[factor_fields], as.character)
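For anyone who wants to try this outside Splunk, here is a minimal sketch of the same conversion on a made-up data frame. The sample columns (host, status) and the csv_text value are assumptions for illustration only, not anything the R app produces. It also shows that reading the same data with stringsAsFactors=FALSE gives character columns directly:

```r
# Hypothetical stand-in for the data frame the R app hands to a script.
input <- data.frame(host = c("web01", "web02"),
                    status = c(200L, 404L),
                    stringsAsFactors = TRUE)

# Convert only the factor columns back to character; other columns are untouched.
factor_fields <- sapply(input, is.factor)
input[factor_fields] <- lapply(input[factor_fields], as.character)

# Reading equivalent CSV text with stringsAsFactors = FALSE avoids factors up front.
csv_text <- "host,status\nweb01,200\nweb02,404"
direct <- read.csv(text = csv_text, stringsAsFactors = FALSE)
```

After this runs, input$host and direct$host are both plain character vectors, while the integer status column is left alone.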
As a bonus, here are the contents of a simple R script (let's call it subset.r) that removes the Splunk fields that aren't terribly useful (such as _raw and the date fields). The table command can be used to select a subset of fields, but you must know your subset; this script automatically determines a subset that reduces the size of the resulting CSV export by about one-third:
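originalNames <- names(input)
# Fields to drop: the date* fields and Splunk's internal fields (names starting with "_")
remove_names <- originalNames[c(grep("^date", originalNames),
                                grep("^_", originalNames))]
# Plus a fixed list of fields that are rarely useful in an export
remove_names <- c(remove_names,
                  "splunk_server", "splunk_server_group",
                  "Label", "linecount", "punct", "eventtype")
# Keep _time (it was caught by the "^_" pattern above) followed by the remaining fields
subsetNames <- c("_time",
                 originalNames[!(originalNames %in% remove_names)])
output <- subset(input, select=subsetNames)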
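The script above assumes the results always contain a _time field. Here is a rough sketch of a variant that keeps _time only when it is present, wrapped in a function so it can be tried on sample data. The function name drop_splunk_fields and the sample_input values are made up for illustration:

```r
# Hypothetical helper: same field selection as subset.r, but tolerant of a missing _time.
drop_splunk_fields <- function(input) {
  original_names <- names(input)
  # Internal fields start with "_"; the date fields start with "date".
  internal <- grepl("^date", original_names) | grepl("^_", original_names)
  extra <- c("splunk_server", "splunk_server_group",
             "Label", "linecount", "punct", "eventtype")
  keep <- original_names[!internal & !(original_names %in% extra)]
  # Put _time first, but only if the input actually has it.
  keep <- c(intersect("_time", original_names), keep)
  input[, keep, drop = FALSE]
}

# Made-up sample resembling a Splunk export (check.names = FALSE keeps "_time" as-is).
sample_input <- data.frame(`_time` = "2014-01-01 00:00:00",
                           `_raw` = "GET /index.html",
                           date_hour = 0L,
                           host = "web01",
                           punct = "__",
                           check.names = FALSE,
                           stringsAsFactors = FALSE)
output <- drop_splunk_fields(sample_input)
```

With this sample, only the _time and host columns remain in output.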
Then whenever I need to export Splunk results as CSV (for example, to produce a beautiful scatter plot in R using ggplot2), I just add the following to the end of my Splunk query:
| r subset.r
Bottom line: would you agree to update your R app to add the stringsAsFactors=FALSE argument to its read.csv call?
Ideally, I'd like an R app that would just be connected to the pipeline's stdout and require me to call read.csv myself. Each of my R scripts could then set the specific options it needs and avoid post-processing to re-form the input into what the script requires.
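To make the idea concrete, here is a rough sketch of what such a script might look like. This is not how the R app currently works; the helper names read_pipeline and write_pipeline are hypothetical, and a textConnection stands in for the pipeline so the example can be run on its own:

```r
# Hypothetical helper: the script, not the app, chooses the read.csv options.
# By default it would read the CSV that the pipeline writes to the script's stdin.
read_pipeline <- function(con = file("stdin"), ...) {
  read.csv(con, ...)
}

# Hypothetical helper: write the result back as CSV (stdout by default).
write_pipeline <- function(output, con = stdout(), ...) {
  write.csv(output, con, row.names = FALSE, ...)
}

# Stand-in for the pipeline: made-up CSV text instead of real Splunk results.
pipeline <- textConnection("_time,host\n2014-01-01 00:00:00,web01")
events <- read_pipeline(pipeline,
                        stringsAsFactors = FALSE,  # keep strings as character
                        check.names = FALSE)       # keep the _time name unmodified
close(pipeline)
```

In a real script the call would simply be read_pipeline(stringsAsFactors = FALSE, check.names = FALSE), followed by write_pipeline(output) at the end.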
Just a thought. Thanks for your consideration.
I just created a new version of the R app that uses stringsAsFactors=FALSE for the read.csv call.
Plus, I updated the app's setup page so that you can customize the options for the read.csv and write.csv function calls.
I haven't uploaded the new version to apps.splunk.com yet, because I want you to test it first. I uploaded the package here.
Thanks! I have installed your update (version 0.3.11) and verified that it works, both with my existing scripts, and then with my updated scripts.
I noticed your default stringsAsFactors=FALSE option in the r.conf file, and left it as-is.
I left the write.csv options empty, but it's nice to know there's a place to set them if and when a change is required, all with the existing R app.
Thank you again! We love it!