For Digital Forensics and Incident Response (DFIR) practitioners, Splunk is a core part of the daily workflow. Its Schema on the Fly approach and powerful Search Processing Language (SPL) support iterative, flexible investigation, which suits the exploratory nature of forensic analysis.
However, many users only interact with Splunk through its web interface. What if you want to document your investigative process? Add commentary? Re-run past searches with identical parameters? This is where the Splunk Search API becomes invaluable.
To demonstrate, we’ve published a walkthrough of an investigation using the Boss of the SOC v3 dataset. The entire analysis—from SPL query to result output—was scripted, executed, and rendered into an HTML report using code.
Curious? Let’s dive into how you can do the same.
Interacting with APIs often involves complexities like authentication, pagination, and rate limits. Microsoft Threat Intelligence Security Tools for Python (MSTICPy) abstracts away much of that pain.
Originally designed for threat hunting, MSTICPy supports multiple data sources including Splunk and Splunk Cloud. It allows you to submit SPL queries and get results back as tidy Pandas DataFrames—ready for further analysis, visualization, or transformation in your Python environment.
To get started, install MSTICPy from PyPI. If you’re using RStudio (as we’ll explain later), you may want to pin the pandas version (e.g., 1.5.x) to ensure compatibility. Here’s how we set up our environment using Miniforge on Windows:
```
> conda create -n msticpy python=3.10 pandas=1.5.3 pip notebook ipykernel
> conda activate msticpy
> pip install msticpy[splunk]
```
The walkthrough we mentioned earlier was rendered using Quarto, a next-generation scientific and technical publishing system. When combined with tools like RStudio or VS Code, Quarto lets you write Markdown enriched with executable code blocks—in R, Python, or even Bash.
Screenshot: Quarto document with inline R, Python and Bash code
Within a single .qmd file, you can mix prose, code, and live output. Upon rendering, you get a self-contained HTML file where each query and result is visible and reproducible.
Screenshot: HTML output generated by Quarto
For newcomers, we recommend starting with RStudio Desktop or RStudio Server, both of which support Quarto natively. With the reticulate package, you can access Python objects seamlessly from R.
Here’s how the connection looks in R using reticulate to import MSTICPy:
```
install.packages("pacman") # If the pacman package is not installed
pacman::p_load(tidyverse, reticulate)
mp <- import("msticpy")
qry_splunk <- mp$QueryProvider("Splunk")
qry_splunk$connect(host = "172.17.0.1", port = "8089",
                   username = "admin", password = "testpassword") # Not recommended!
```
Equivalent Python code:
```
import msticpy as mp
qry_splunk = mp.QueryProvider("Splunk")
qry_splunk.connect(host="172.17.0.1", port="8089",
                   username="admin", password="testpassword")  # Not recommended!
```
Note: Don’t hardcode credentials in production. Use msticpyconfig.yaml or API tokens.
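As one way to keep secrets out of the document, here is a minimal sketch that reads the connection settings from environment variables. This is a different mechanism from the msticpyconfig.yaml and API token options mentioned above, and the variable names (`SPLUNK_HOST`, `SPLUNK_PORT`, `SPLUNK_USERNAME`, `SPLUNK_PASSWORD`) and the helper `splunk_connect_args` are our own assumptions, not part of MSTICPy.

```python
import os


def splunk_connect_args(env=None):
    """Build keyword arguments for connect() from environment variables.

    The variable names are hypothetical; pick whatever fits your setup.
    """
    env = os.environ if env is None else env
    required = ["SPLUNK_HOST", "SPLUNK_USERNAME", "SPLUNK_PASSWORD"]
    missing = [name for name in required if not env.get(name)]
    if missing:
        # Fail early instead of sending an incomplete login to Splunk.
        raise RuntimeError("Missing environment variables: " + ", ".join(missing))
    return {
        "host": env["SPLUNK_HOST"],
        # 8089 is the management port used in the examples above.
        "port": env.get("SPLUNK_PORT", "8089"),
        "username": env["SPLUNK_USERNAME"],
        "password": env["SPLUNK_PASSWORD"],
    }
```

With a provider created as shown earlier, the call would then be `qry_splunk.connect(**splunk_connect_args())`, so the rendered report never contains the password.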
Define your SPL query:
```
spl <- r"(
| inputlookup security_example_data.csv
| table timestamp threat_src_ip threat_dest_ip threat_status
| head 5
)"
```
Run it and store results as a DataFrame:
```
sample_df <- qry_splunk$exec_query(spl)
sample_df
```
Screenshot: DataFrame preview with search results
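If you prefer to stay in Python, the same two steps might look like the sketch below. It assumes `qry_splunk` is a provider that has already been connected as shown earlier; the wrapper function `run_sample_query` is a name we made up for illustration, and it does nothing beyond calling `exec_query`.

```python
# SPL kept in a triple-quoted string so it can span several lines,
# similar to the raw string used in the R snippet.
SPL = """
| inputlookup security_example_data.csv
| table timestamp threat_src_ip threat_dest_ip threat_status
| head 5
"""


def run_sample_query(provider, spl=SPL):
    """Submit the SPL through a connected provider and return its result."""
    return provider.exec_query(spl)
```

Calling `sample_df = run_sample_query(qry_splunk)` would then give you the result as a DataFrame in your Python session.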
What’s convenient is that the resulting DataFrame can be referenced in a separate tab or window at any time. This makes it easy to revisit or reuse your data during further analysis.
Screenshot: Viewing the DataFrame in a separate tab
However, you may notice that the timestamp field is in Unix time format and stored as a string. In fact, when retrieving data via the Search API, all fields are returned as strings by default.
To improve readability, let's convert this to a proper datetime format. In R, this is easily done using a pipeline. We enjoy using the Tidyverse, which provides a consistent and expressive grammar for data manipulation—reminiscent of SPL’s pipe-based syntax.
```
sample_df |>
  mutate(
    timestamp = timestamp |> as.numeric() |> as_datetime()
  )
```
Screenshot: Human-readable timestamps after conversion
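For readers working on the Python side, a comparable conversion can be sketched with pandas. The small DataFrame below is a stand-in we built by hand to mimic the all-strings result; its values are made up for illustration and do not come from the dataset.

```python
import pandas as pd

# Stand-in for the query result: every column arrives as a string.
# The values are invented for this example.
sample_df = pd.DataFrame(
    {
        "timestamp": ["1700000000", "1700000060"],
        "threat_status": ["blocked", "allowed"],
    }
)

# Parse the epoch strings as numbers, then interpret them as seconds in UTC.
converted = sample_df.assign(
    timestamp=pd.to_datetime(
        pd.to_numeric(sample_df["timestamp"]), unit="s", utc=True
    )
)
print(converted)
```

Using `assign` returns a new DataFrame and leaves the original strings untouched, which mirrors how the R pipeline above does not modify `sample_df` unless you reassign it.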
Once your code and results are finalized in RStudio, click the “Render” button (or press Ctrl + Shift + K). Quarto compiles everything into a polished HTML report with embedded code and output.
Screenshot: Final HTML report with code and tables
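Rendering does not have to go through the button. As a rough sketch, the Quarto command line can be driven from Python so that a report is rebuilt by a script. The file name `investigation.qmd` and both helper functions are hypothetical, and the sketch assumes the `quarto` executable is installed and on your PATH.

```python
import shutil
import subprocess


def quarto_render_command(qmd_path, output_format="html"):
    """Return the argument list for rendering a Quarto document."""
    return ["quarto", "render", str(qmd_path), "--to", output_format]


def render(qmd_path):
    """Run the render only if the quarto executable can be found."""
    if shutil.which("quarto") is None:
        raise FileNotFoundError("quarto executable not found on PATH")
    # check=True raises CalledProcessError if the render fails.
    subprocess.run(quarto_render_command(qmd_path), check=True)
```

For example, `render("investigation.qmd")` would rebuild the HTML report, which is handy when you want to re-run an investigation on a schedule or from another script.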
Inspired by a 2018 talk by Masaru Nagaku on Literate Computing for Reproducible Infrastructure, we coined the term Literate Log Analysis to describe this approach. Thanks to MSTICPy and Quarto, we now have the tools to write forensic investigations as narratives—code, commentary, and context all in one place.
If this piques your interest, give it a try! And while you're at it, maybe take R for a spin too.
Happy Splunking!
Shintaro Watanabe is a seasoned cybersecurity professional specializing in incident response and information security planning. He is a Staff Engineer in the Information Security Division at JCOM Co., Ltd., Japan’s largest cable TV operator.
Beyond technical expertise, Shintaro is an effective communicator who aligns stakeholders and drives security improvements. He is active in organizations such as ICT-ISAC Japan, CRIC CSF, and JCTA, and holds numerous certifications including CISA, CISSP, and ten GIACs. He also teaches SANS SEC504 in Japan.