What is the definition of large? Is it measured in total bytes? Number of records? And in either case how much?
The definition of "large" depends on the environment and use case. In Splunk, dataset size is usually assessed by total bytes ingested, the number of events, or the number of records processed.
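If you want to see where your own environment sits, a rough sketch (assuming you can search the default _internal index, where license_usage.log is recorded) is to chart daily ingestion volume:

    index=_internal source=*license_usage.log type="Usage"
    | eval GB = b / 1024 / 1024 / 1024
    | timechart span=1d sum(GB) AS daily_ingest_GB

and, for a per-index event count, something like:

    | eventcount summarize=false index=*
    | stats sum(count) AS total_events BY index

Both are only indicative numbers, but they give you bytes-per-day and events-per-index figures to compare against the guidance below.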
I gave a talk in 2020 about scaling to 7.5TB; imagine how much it has scaled since then 😉 There are many Splunk users running much bigger instances than ours, too.
https://conf.splunk.com/files/2020/slides/PLA1180C.pdf
For best practices in handling large datasets, review Splunk's documentation on scaling and optimizing your deployment.
In Splunk, "large" can refer to total data ingestion (typically 100-150 GB per indexer per day), number of events (millions per day, but volume matters more), or individual event size (Splunk handles up to 100,000 bytes per event with limits on segments). High ingestion rates, oversized events, and excessive indexing can impact performance. Regular monitoring and optimization are essential for efficient data management.