
What are some best practices for creating large datasets for testing?

genesiusj
Builder

Hello, 

I need to generate 1000+ records (5-10 fields) of fake PII. What best practices, SPL, or processes have you designed to create this via | makeresults or lookups/KV Store?

Thanks and God bless,

Genesius 

 


bowesmana
SplunkTrust

Are you talking about fake data that will then be indexed, or data just used for a lookup?

The Eventgen app can generate random data from a set of config files, but I prefer makeresults/collect/outputlookup, as it gives you more control over data creation.

If you're just looking to create random fake PII data, then

  • makeresults
  • random
  • streamstats
  • split
  • mvrange
  • mvexpand
  • mvindex

are all good tools and can be combined to create dummy data sets; see the sketch after this list.
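For example, here is a minimal sketch that strings several of these together to build fake PII. All field names and value lists are invented for illustration, not taken from any real data set:

| makeresults count=1000
| streamstats count AS id
| eval firsts=split("Alice,Bob,Carol,Dave,Eve,Frank,Grace,Heidi", ",")
| eval lasts=split("Smith,Jones,Taylor,Brown,Wilson,Evans,Thomas,Lee", ",")
| eval first_name=mvindex(firsts, random() % mvcount(firsts))
| eval last_name=mvindex(lasts, random() % mvcount(lasts))
| eval ssn=printf("%03d-%02d-%04d", (random() % 900) + 100, (random() % 98) + 1, random() % 10000)
| eval dob=strftime(now() - ((random() % 18250) + 6570) * 86400, "%Y-%m-%d")
| eval email=lower(first_name) . "." . lower(last_name) . "@example.com"
| table id first_name last_name ssn dob email

split/mvindex pick a random value from a fixed list, printf formats a fake SSN, and the dob expression subtracts a random number of days (roughly 18 to 68 years) from now().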

Random sampling from an existing data set can be done with:

| inputlookup data.csv
| eval v=random()
| sort v
| head 100
...
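To keep whatever you generate or sample (the collect/outputlookup part mentioned above), finish the search with one of the following; the file and index names here are placeholders:

| outputlookup fake_pii.csv

| collect index=your_test_index

outputlookup writes the rows to a lookup file; collect writes them into an index. If you don't override the sourcetype on collect, the events land with sourcetype stash, which does not count against license usage.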

 

ITWhisperer
SplunkTrust

It depends on the nature of the faking. If you just want random values, makeresults is generally a good place to start. If you want consistent values, makeresults could still work provided you have a deterministic way of generating the fake results, although a CSV lookup is probably the better option.
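As a sketch of the deterministic approach (the names and fields below are made up for the example): deriving each value from the row counter instead of random() produces the same output on every run, so the fake data stays consistent across searches.

| makeresults count=1000
| streamstats count AS id
| eval names=split("Alice,Bob,Carol,Dave", ",")
| eval first_name=mvindex(names, (id - 1) % mvcount(names))
| eval user_id=printf("U%05d", id)
| table id user_id first_name

If you need the same consistency without regenerating, write the result out once with outputlookup and read it back with a CSV lookup thereafter.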
