All Apps and Add-ons

Splunk R App uses default stringsAsFactors=TRUE

brian_from_fl
Explorer

First, thanks for changing the read.csv call to leave the _time field name unmodified in version 0.3.10.

I have found cases where the input data frame's string factors cause problems, and have not yet found a case where I actually needed the strings as factors. I don't know if it's worth it to change the read.csv call to add stringsAsFactors=FALSE, but here's what I do to remove them and convert them back to character strings, in case anyone else wishes to know:

factor_fields <- sapply(input, is.factor)
input[factor_fields] <- lapply(input[factor_fields], as.character)

As a bonus, here is the contents of a simple R script (let's call it subset.r) that removes the Splunk fields that aren't terribly useful (such as raw and the date fields). The table command can be used to select a subset of fields, but you must know your subset; this script automatically determines a subset that reduces the size of the resulting CSV export by about one-third:

originalNames <- names(input)

remove_names <- originalNames[c(grep("^date", originalNames),
                                grep("^_", originalNames))]

remove_names <- c(remove_names,
                  "splunk_server", "splunk_server_group",
                  "Label", "linecount", "punct", "eventtype")

subsetNames <- c("_time",
                 originalNames[!(originalNames %in% remove_names)])

output <- subset(input, select=subsetNames)

Then whenever I need to export Splunk results as CSV (for example, to use R to perform a beautiful scatter plot using ggplot2), I just add the following to the end of my Splunk query:

| r subset.r

Bottom line: Would you agree to update your R app to add the stringsAsFactors=FALSE argument to the read.csv call in the R app?

Ideally, I'd like an R app that would just be connected to the pipeline's stdout and require me to call read.csv, and then each of my R scripts could set the specific options it needed and avoid post-processing re-form the input back to what the script requires.

Just a thought. Thanks for your consideration.

Tags (3)
0 Karma

rfujara_splunk
Splunk Employee
Splunk Employee

I just created a new version of the R app that uses stringsAsFactors=FALSE for the read.csv call.

Plus, I updated the app's setup page allowing you to customize the read.csv and write.csv function call options.

I haven't uploaded the new version to apps.aplunk.con yet, because I want you to test it first. I uploaded the package here.

0 Karma

brian_from_fl
Explorer

Thanks! I have installed your update (version 0.3.11) and verified that it works, both with my existing scripts, and then with my updated scripts.

I noticed your default stringsAsFactors=FALSE option in the r.conf file, and left it as-is.

I left the write.csv options empty, but it's nice to know there's a place to set them if and when a change is required, all with the existing R app.

Thank you again! We love it!

0 Karma
Get Updates on the Splunk Community!

New in Observability - Improvements to Custom Metrics SLOs, Log Observer Connect & ...

The latest enhancements to the Splunk observability portfolio deliver improved SLO management accuracy, better ...

Improve Data Pipelines Using Splunk Data Management

  Register Now   This Tech Talk will explore the pipeline management offerings Edge Processor and Ingest ...

3-2-1 Go! How Fast Can You Debug Microservices with Observability Cloud?

Register Join this Tech Talk to learn how unique features like Service Centric Views, Tag Spotlight, and ...