All Apps and Add-ons

Splunk R App uses default stringsAsFactors=TRUE

brian_from_fl
Explorer

First, thanks for changing the read.csv call to leave the _time field name unmodified in version 0.3.10.

I have found cases where the input data frame's string factors cause problems, and have not yet found a case where I actually needed the strings as factors. I don't know if it's worth it to change the read.csv call to add stringsAsFactors=FALSE, but here's what I do to remove them and convert them back to character strings, in case anyone else wishes to know:

factor_fields <- sapply(input, is.factor)
input[factor_fields] <- lapply(input[factor_fields], as.character)

As a bonus, here is the contents of a simple R script (let's call it subset.r) that removes the Splunk fields that aren't terribly useful (such as raw and the date fields). The table command can be used to select a subset of fields, but you must know your subset; this script automatically determines a subset that reduces the size of the resulting CSV export by about one-third:

originalNames <- names(input)

remove_names <- originalNames[c(grep("^date", originalNames),
                                grep("^_", originalNames))]

remove_names <- c(remove_names,
                  "splunk_server", "splunk_server_group",
                  "Label", "linecount", "punct", "eventtype")

subsetNames <- c("_time",
                 originalNames[!(originalNames %in% remove_names)])

output <- subset(input, select=subsetNames)

Then whenever I need to export Splunk results as CSV (for example, to use R to perform a beautiful scatter plot using ggplot2), I just add the following to the end of my Splunk query:

| r subset.r

Bottom line: Would you agree to update your R app to add the stringsAsFactors=FALSE argument to the read.csv call in the R app?

Ideally, I'd like an R app that would just be connected to the pipeline's stdout and require me to call read.csv, and then each of my R scripts could set the specific options it needed and avoid post-processing re-form the input back to what the script requires.

Just a thought. Thanks for your consideration.

Tags (3)
0 Karma

rfujara_splunk
Splunk Employee
Splunk Employee

I just created a new version of the R app that uses stringsAsFactors=FALSE for the read.csv call.

Plus, I updated the app's setup page allowing you to customize the read.csv and write.csv function call options.

I haven't uploaded the new version to apps.aplunk.con yet, because I want you to test it first. I uploaded the package here.

0 Karma

brian_from_fl
Explorer

Thanks! I have installed your update (version 0.3.11) and verified that it works, both with my existing scripts, and then with my updated scripts.

I noticed your default stringsAsFactors=FALSE option in the r.conf file, and left it as-is.

I left the write.csv options empty, but it's nice to know there's a place to set them if and when a change is required, all with the existing R app.

Thank you again! We love it!

0 Karma
Get Updates on the Splunk Community!

Data Management Digest – December 2025

Welcome to the December edition of Data Management Digest! As we continue our journey of data innovation, the ...

Index This | What is broken 80% of the time by February?

December 2025 Edition   Hayyy Splunk Education Enthusiasts and the Eternally Curious!    We’re back with this ...

Unlock Faster Time-to-Value on Edge and Ingest Processor with New SPL2 Pipeline ...

Hello Splunk Community,   We're thrilled to share an exciting update that will help you manage your data more ...