Splunk Enterprise

What are some best practices for creating large datasets for testing?

genesiusj
Builder

Hello, 

I need to generate 1000+ records (5-10 fields) of fake PII. What best practices, SPL, or processes have you designed for creating this via | makeresults or lookups/KV stores?

Thanks and God bless,

Genesius 

 


bowesmana
SplunkTrust

Are you talking about fake data that is then indexed, or just used for a lookup?

The Eventgen app can generate random data from a set of config files, but I prefer makeresults with collect/outputlookup, as you have more control over data creation.
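For example, a minimal sketch of that pattern (the field values, the lookup name fake_pii.csv, and the index/sourcetype names are assumptions, just to show where the generated data ends up):

| makeresults count=1000
| eval user="user" . tostring(random() % 10000)
| eval dept=mvindex(split("HR,Finance,IT,Sales", ","), random() % 4)
| outputlookup fake_pii.csv

Swap the last line for | collect index=test_fake sourcetype="fake:pii" if you want the records indexed rather than written to a lookup.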

If you're just looking to create random fake PII data, then

  • makeresults
  • random
  • streamstats
  • split
  • mvrange
  • mvexpand
  • mvindex

are all good and can be combined to create dummy data sets, as sketched below.
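Here is a rough sketch that combines those (the field names and sample value pools are made up for illustration): mvrange/mvexpand fan 200 base rows out to 1000, streamstats numbers each row, and random() with split/mvindex picks values from small pools.

| makeresults count=200
| eval n=mvrange(0,5)
| mvexpand n
| streamstats count AS row
| eval first=mvindex(split("Alice,Bob,Carol,Dave,Eve", ","), random() % 5)
| eval last=mvindex(split("Smith,Jones,Brown,Taylor,Wilson", ","), random() % 5)
| eval ssn=printf("%03d-%02d-%04d", random() % 900 + 100, random() % 100, random() % 10000)
| eval email=lower(first) . "." . lower(last) . tostring(row) . "@example.com"
| eval phone=printf("555-%03d-%04d", random() % 1000, random() % 10000)
| fields row first last ssn email phone

Pipe that into outputlookup or collect, as above, if you want to keep the result.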

Random sampling from an existing data set can be done with 

| inputlookup data.csv
| eval v=random()
| sort v
| head 100
...

 

ITWhisperer
SplunkTrust

It depends on the nature of the faking. If you just want random values, makeresults is generally a good place to start. If you want consistent values, makeresults could still work provided you have a deterministic way of generating the fake results, although a CSV is probably the better option.
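For the consistent case, a minimal sketch (field names and value pools are made up): deriving every value from the streamstats row number instead of random() means the search returns the same records on every run, which is the same guarantee a static CSV would give you.

| makeresults count=1000
| streamstats count AS row
| eval first=mvindex(split("Alice,Bob,Carol,Dave,Eve", ","), row % 5)
| eval last=mvindex(split("Smith,Jones,Brown,Taylor,Wilson", ","), (row * 7) % 5)
| eval id=printf("EMP%05d", row)
| fields row id first last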
