
What are some best practices for creating large datasets for testing?

genesiusj
Builder

Hello, 

I need to generate 1000+ records (5-10 fields) of fake PII. What best practices, SPL, or processes have you designed for creating this via | makeresults or lookups/KV Store?

Thanks and God bless,

Genesius 

 


bowesmana
SplunkTrust

Are you talking about fake data that is then indexed, or just data used for a lookup?

The eventgen app can do random data generation from a bunch of config files, but I prefer makeresults/collect/outputlookup as you have more control over data creation.

If you're just looking to create random fake PII data, then

  • makeresults
  • random
  • streamstats
  • split
  • mvrange
  • mvexpand
  • mvindex

are all good and can be used to create dummy data sets; see the sketch below.
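For example, here's a minimal sketch combining those commands. The field names, the value lists, and the fake_pii.csv lookup name are placeholder assumptions, and the printf eval function requires a reasonably recent Splunk version:

| makeresults
| eval id=mvrange(1,1001)
| mvexpand id
| eval first=mvindex(split("Alice,Bob,Carol,Dave,Eve", ","), random() % 5)
| eval last=mvindex(split("Smith,Jones,Brown,Taylor,Wilson", ","), random() % 5)
| eval age=18 + (random() % 60)
| eval ssn=printf("%03d-%02d-%04d", random() % 900, random() % 100, random() % 10000)
| eval email=lower(first) . "." . lower(last) . id . "@example.com"
| table id first last age ssn email
| outputlookup fake_pii.csv

An equivalent start is | makeresults count=1000 | streamstats count AS id. Swap the outputlookup for | collect index=<your_test_index> if you want the data indexed rather than stored in a lookup.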

Random sampling from an existing data set can be done with 

| inputlookup data.csv
| eval v=random()
| sort v
| head 100
...
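One caveat: sort returns at most 10,000 results by default, so if the lookup is larger than that, use | sort 0 v to lift the limit before taking the head.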

 

ITWhisperer
SplunkTrust

It depends on the nature of the faking. If you just want random values, makeresults is generally a good place to start. If you want consistent values, makeresults could still work, provided you have a deterministic way of generating the fake results, although a CSV is probably the better option.
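For instance, here is a minimal sketch of the deterministic route, deriving every field from a row counter so reruns produce identical rows (the field names, value lists, and the fake_people.csv name are illustrative assumptions):

| makeresults count=1000
| streamstats count AS id
| eval first=mvindex(split("Alice,Bob,Carol,Dave,Eve", ","), id % 5)
| eval dept=mvindex(split("Sales,Engineering,Finance", ","), id % 3)
| table id first dept
| outputlookup fake_people.csv

Because nothing calls random(), rerunning the search regenerates the same dataset, and the outputlookup gives you the reusable CSV.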
