Splunk Enterprise

What are some best practices for creating large datasets for testing?

genesiusj
Builder

Hello, 

I need to generate 1000+ records (5-10 fields) of fake PII. What best practices, SPL, or processes have you designed to create this via | makeresults or lookups/KV stores?

Thanks and God bless,

Genesius 

 


bowesmana
SplunkTrust

Are you talking about fake data that is then indexed, or data just used for a lookup?

The Eventgen app can generate random data from a set of config files, but I prefer makeresults/collect/outputlookup, as you have more control over data creation.
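For example, a minimal sketch of that pattern (the index name test, the lookup name fake_pii.csv, and the field values are placeholders, not anything from your environment):

| makeresults count=3
| eval _raw=strftime(_time, "%Y-%m-%dT%H:%M:%S") . " user=fake_user action=login"
| collect index=test sourcetype=fake_pii

or, to store the rows as a lookup instead of indexing them:

| makeresults count=3
| eval user="fake_user", action="login"
| outputlookup fake_pii.csv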

If you're just looking to create random fake PII data, then

  • makeresults
  • random
  • streamstats
  • split
  • mvrange
  • mvexpand
  • mvindex

are all good and can be combined to create dummy data sets, as in the sketch below.
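A minimal sketch combining those commands (every field name and value list below is made up for illustration):

| makeresults count=1000
| streamstats count AS id
| eval first=mvindex(split("Alice,Bob,Carol,Dave,Eve,Frank,Grace,Heidi", ","), random() % 8)
| eval last=mvindex(split("Smith,Jones,Brown,Taylor,Wilson,Davies,Evans,Thomas", ","), random() % 8)
| eval email=lower(first . "." . last . "@example.com")
| eval ssn=printf("%03d-%02d-%04d", random() % 900, random() % 100, random() % 10000)
| eval dob=strftime(relative_time(now(), "-" . tostring(6570 + (random() % 18250)) . "d@d"), "%Y-%m-%d")
| table id first last email ssn dob

The same fan-out can be done with mvrange/mvexpand instead of the count argument, e.g. | makeresults | eval n=mvrange(0,1000) | mvexpand n, and appending | outputlookup fake_pii.csv persists the result as a lookup.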

Random sampling from an existing data set can be done with 

| inputlookup data.csv
| eval v=random()
| sort v
| head 100
...
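This assigns each row a random sort key, orders the rows by it, and keeps the first 100, which amounts to a uniform random sample without replacement.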

 

ITWhisperer
SplunkTrust

It depends on the nature of the faking. If you just want random values, makeresults is generally a good place to start. If you want consistent values, makeresults can still work if you have a deterministic way of generating the fake results, although a CSV is probably the better option.
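For example, a minimal deterministic sketch (field names and the lookup name fake_users.csv are made up): every field is derived from the row number, so rerunning the search always produces the same records.

| makeresults count=1000
| streamstats count AS id
| eval username="user" . printf("%04d", id)
| eval email=username . "@example.com"
| eval account_id=upper(substr(md5(tostring(id)), 1, 8))
| outputlookup fake_users.csv

After that, | inputlookup fake_users.csv returns the identical 1000 rows on every run.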
