Splunk Enterprise

What are some best practices for creating large datasets for testing?

genesiusj
Builder

Hello, 

I need to generate 1000+ records (5-10 fields) of fake PII. What best practices, SPL, or processes have you designed for creating this via | makeresults or lookups/KV stores?

Thanks and God bless,

Genesius 

 


bowesmana
SplunkTrust

Are you talking about fake data that is then indexed, or just used for a lookup?

The Eventgen app can generate random data from a set of config files, but I prefer makeresults with collect/outputlookup, as you have more control over data creation.
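For example, a minimal sketch of that pattern (the field values, the lookup name fake_pii.csv, and the index/sourcetype names are assumptions, just to show where the generated data ends up):

| makeresults count=1000
| eval user="user" . tostring(random() % 10000)
| eval dept=mvindex(split("HR,Finance,IT,Sales", ","), random() % 4)
| outputlookup fake_pii.csv

Swap the last line for | collect index=test_fake sourcetype="fake:pii" if you want the records indexed rather than written to a lookup.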

If you're just looking to create random fake PII data, then

  • makeresults
  • random
  • streamstats
  • split
  • mvrange
  • mvexpand
  • mvindex

are all good and can be combined to create dummy data sets, as sketched below.
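Here is a rough sketch that combines those (the field names and sample value pools are made up for illustration): mvrange/mvexpand fan 200 base rows out to 1000, streamstats numbers each row, and random() with split/mvindex picks values from small pools.

| makeresults count=200
| eval n=mvrange(0,5)
| mvexpand n
| streamstats count AS row
| eval first=mvindex(split("Alice,Bob,Carol,Dave,Eve", ","), random() % 5)
| eval last=mvindex(split("Smith,Jones,Brown,Taylor,Wilson", ","), random() % 5)
| eval ssn=printf("%03d-%02d-%04d", random() % 900 + 100, random() % 100, random() % 10000)
| eval email=lower(first) . "." . lower(last) . tostring(row) . "@example.com"
| eval phone=printf("555-%03d-%04d", random() % 1000, random() % 10000)
| fields row first last ssn email phone

Pipe that into outputlookup or collect, as above, if you want to keep the result.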

Random sampling from an existing data set can be done with 

| inputlookup data.csv
| eval v=random()
| sort v
| head 100
...

 

ITWhisperer
SplunkTrust

It depends on the nature of the faking. If you just want random values, makeresults is generally a good place to start. If you want consistent values, makeresults could still work provided you have a deterministic way of generating the fake results, although a CSV is probably the better option.
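For the consistent case, a minimal sketch (field names and value pools are made up): deriving every value from the streamstats row number instead of random() means the search returns the same records on every run, which is the same guarantee a static CSV would give you.

| makeresults count=1000
| streamstats count AS row
| eval first=mvindex(split("Alice,Bob,Carol,Dave,Eve", ","), row % 5)
| eval last=mvindex(split("Smith,Jones,Brown,Taylor,Wilson", ","), (row * 7) % 5)
| eval id=printf("EMP%05d", row)
| fields row id first last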
