Splunk Enterprise

What are some best practices for creating large datasets for testing?

genesiusj
Builder

Hello, 

I need to generate 1000+ records (5-10 fields) of fake PII. What best practices, SPL, or processes have you designed for creating such data via | makeresults or via lookups/KV store collections?

Thanks and God bless,

Genesius 

 


bowesmana
SplunkTrust

Are you talking about fake data that is then indexed, or data that is just used in a lookup?

The eventgen app can generate random data from a set of config files, but I prefer makeresults/collect/outputlookup, as you have more control over data creation.
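As a minimal sketch of that pattern (the index name test_fake is an assumption and must already exist; swap collect for outputlookup if a lookup file is all you need; collected events default to sourcetype stash):

| makeresults count=1000
| streamstats count AS id
| eval _raw="user=user".id." action=login"
| collect index=test_fake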

If you're just looking to create random fake PII data, then the following commands

  • makeresults
  • random
  • streamstats
  • split
  • mvrange
  • mvexpand
  • mvindex

are all good and can be used to create dummy data sets; a sketch combining them follows this list.
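For example, a sketch that builds a 1,000-row fake PII lookup (the name lists, field formats, and the fake_pii.csv lookup name are illustrative assumptions; printf is an eval text function available in recent Splunk versions):

| makeresults count=1000
| streamstats count AS id                                    ``` running row counter ```
| eval firsts=split("Alice,Bob,Carol,Dave,Eve,Frank,Grace,Heidi", ",")
| eval lasts=split("Smith,Jones,Brown,Taylor,Wilson,Davies,Evans,Thomas", ",")
| eval first_name=mvindex(firsts, random() % mvcount(firsts))
| eval last_name=mvindex(lasts, random() % mvcount(lasts))
| eval email=lower(first_name.".".last_name.id."@example.com")
| eval ssn=printf("%03d-%02d-%04d", (random() % 899) + 100, random() % 100, random() % 10000)
| eval phone=printf("555-%03d-%04d", random() % 1000, random() % 10000)
| fields - firsts, lasts, _time                              ``` drop the helper fields ```
| outputlookup fake_pii.csv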

Random sampling from an existing data set can be done by sorting on a random value (the 0 argument removes sort's default 10,000-result cap):

| inputlookup data.csv
| eval v=random()
| sort 0 v
| head 100
...

 

ITWhisperer
SplunkTrust

It depends on the nature of the faking. If you just want random values, makeresults is generally a good place to start. If you want consistent values, makeresults could work if you have a deterministic way of generating the fake data, although a CSV is probably the better option.
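As an illustration of the deterministic route, a sketch might derive every field from the row counter instead of random(), so repeated runs produce identical output (the name list and the fake_pii_fixed.csv lookup name are assumptions):

| makeresults count=1000
| streamstats count AS id
| eval names=split("Alice,Bob,Carol,Dave,Eve", ",")
| eval name=mvindex(names, (id * 7) % mvcount(names))        ``` same id always maps to same name ```
| eval email=lower(name)."_".id."@example.com"
| fields - names, _time
| outputlookup fake_pii_fixed.csv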
