Splunk Enterprise

What are some best practices for creating large datasets for testing?

genesiusj
Builder

Hello, 

I need to generate 1000+ records (5-10 fields) of fake PII. What best practices, SPL, or processes have you designed for creating such data via | makeresults or via lookups/KV store collections?

Thanks and God bless,

Genesius 

 


bowesmana
SplunkTrust

Are you talking about fake data that is then indexed, or data that is just used in a lookup?

The eventgen app can generate random data from a set of config files, but I prefer makeresults/collect/outputlookup, as you have more control over data creation.
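As a minimal sketch of that pattern (the index name test_fake is an assumption and must already exist; swap collect for outputlookup if a lookup file is all you need; collected events default to sourcetype stash):

| makeresults count=1000
| streamstats count AS id
| eval _raw="user=user".id." action=login"
| collect index=test_fake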

If you're just looking to create random fake PII data, then the following commands

  • makeresults
  • random
  • streamstats
  • split
  • mvrange
  • mvexpand
  • mvindex

are all good and can be used to create dummy data sets; a sketch combining them follows this list.
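For example, a sketch that builds a 1,000-row fake PII lookup (the name lists, field formats, and the fake_pii.csv lookup name are illustrative assumptions; printf is an eval text function available in recent Splunk versions):

| makeresults count=1000
| streamstats count AS id                                    ``` running row counter ```
| eval firsts=split("Alice,Bob,Carol,Dave,Eve,Frank,Grace,Heidi", ",")
| eval lasts=split("Smith,Jones,Brown,Taylor,Wilson,Davies,Evans,Thomas", ",")
| eval first_name=mvindex(firsts, random() % mvcount(firsts))
| eval last_name=mvindex(lasts, random() % mvcount(lasts))
| eval email=lower(first_name.".".last_name.id."@example.com")
| eval ssn=printf("%03d-%02d-%04d", (random() % 899) + 100, random() % 100, random() % 10000)
| eval phone=printf("555-%03d-%04d", random() % 1000, random() % 10000)
| fields - firsts, lasts, _time                              ``` drop the helper fields ```
| outputlookup fake_pii.csv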

Random sampling from an existing data set can be done by sorting on a random value (the 0 argument removes sort's default 10,000-result cap):

| inputlookup data.csv
| eval v=random()
| sort 0 v
| head 100
...

 

ITWhisperer
SplunkTrust

It depends on the nature of the faking. If you just want random values, makeresults is generally a good place to start. If you want consistent values, makeresults could work if you have a deterministic way of generating the fake data, although a CSV is probably the better option.
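As an illustration of the deterministic route, a sketch might derive every field from the row counter instead of random(), so repeated runs produce identical output (the name list and the fake_pii_fixed.csv lookup name are assumptions):

| makeresults count=1000
| streamstats count AS id
| eval names=split("Alice,Bob,Carol,Dave,Eve", ",")
| eval name=mvindex(names, (id * 7) % mvcount(names))        ``` same id always maps to same name ```
| eval email=lower(name)."_".id."@example.com"
| fields - names, _time
| outputlookup fake_pii_fixed.csv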
