Splunk Search

What is "large"?

splunkermack
New Member

What is the definition of "large"? Is it measured in total bytes? Number of records? And in either case, how much?


livehybrid
Influencer

The definition of "large" depends on the environment and use case you're considering. In Splunk, dataset size is usually assessed by metrics such as total bytes ingested or the number of events (records) processed.

I gave a talk in 2020 about scaling to 7.5TB; imagine how much it has grown since then 😉 There are also many Splunk users running much bigger environments than we had.

https://conf.splunk.com/files/2020/slides/PLA1180C.pdf

  1. Total Bytes: In many environments, a dataset in the multi-terabyte range is considered large. The threshold varies with your Splunk architecture and the capacity of your infrastructure (indexers, storage, etc.).
  2. Number of Records: Datasets with millions to billions of events can also be considered large. The practical limit depends on the performance characteristics of your deployment, such as hardware capacity and how you intend to use the data.
  3. Performance Considerations: When deciding whether a dataset is large, look at its impact on indexing speed, search performance, and dashboard load times. Monitor how your infrastructure handles the volume and adjust your architecture as necessary; the example searches below are a quick way to get baseline numbers.

Ultimately, defining "large" is subjective and should be based on your business requirements, performance metrics, and the context of your Splunk implementation.
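
If it helps, here are two rough example searches for sizing your own data along these lines. Treat them as sketches: the index filter and time range are placeholders, and the second search assumes your deployment forwards internal logs so that the _internal index is searchable from your search head.

Event counts per index over whatever time range you pick:

    | tstats count where index=* by index
    | sort - count

Approximate daily ingest volume from the license usage log (the b field is reported in bytes):

    index=_internal source=*license_usage.log type=Usage earliest=-7d
    | eval GB = round(b/1024/1024/1024, 2)
    | timechart span=1d sum(GB) AS daily_GB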

For best practices in handling large datasets, review Splunk's documentation on scaling and optimizing your deployment.


kiran_panchavat
Influencer

@splunkermack 

In Splunk, "large" can refer to total data ingestion (typically 100-150 GB per indexer per day), number of events (millions per day, but volume matters more), or individual event size (Splunk handles up to 100,000 bytes per event with limits on segments). High ingestion rates, oversized events, and excessive indexing can impact performance. Regular monitoring and optimization are essential for efficient data management.

I hope this helps. If any reply helps you, please add your upvote/karma points to that reply. Thanks!