Ratio Between Data Searched and Data Indexed

rlautman
Path Finder

Is it possible to measure how much of the data in an index is actually used by searches, so that it can be compared with the total amount of data that has been indexed?

bmacias84
Champion

There are two tools I would point you to. The first is the Search Job Inspector, which breaks down what your search is doing. The second is SoS (Splunk on Splunk), which uses Splunk's own diagnostic tools to analyze your Splunk environment. These tools will answer most of what you want, but I don't know exactly what you are trying to accomplish.

The Search Job Inspector will give you scanCount, runDuration, resultCount, diskUsage, etc.
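As a rough illustration of how those Job Inspector numbers could be turned into the ratio you are asking about, here is a minimal Python sketch. It assumes you have copied scanCount and resultCount from the Job Inspector by hand and obtained the total number of events in the index separately; the function name and the `indexed_event_count` parameter are hypothetical, not part of any Splunk API. Note that these are event counts, not bytes, and that resultCount may reflect results after any transforming commands in the search.

```python
def search_ratios(scan_count, result_count, indexed_event_count):
    """Compare what a search touched with what the index holds.

    scan_count and result_count are the values shown by the Search Job
    Inspector; indexed_event_count is a hypothetical input that you
    supply yourself (total events in the index).
    """
    if indexed_event_count <= 0:
        raise ValueError("indexed_event_count must be positive")
    if scan_count < 0 or result_count < 0:
        raise ValueError("counts must not be negative")

    # Share of the index that the search had to scan.
    scanned_fraction = scan_count / indexed_event_count
    # Share of the scanned events that were actually returned.
    returned_fraction = result_count / scan_count if scan_count else 0.0

    return {
        "scanned_fraction": scanned_fraction,
        "returned_fraction": returned_fraction,
    }


if __name__ == "__main__":
    # Made-up numbers, for illustration only.
    print(search_ratios(scan_count=500, result_count=50, indexed_event_count=1000))
```

A low returned_fraction suggests the search is scanning many events it ends up discarding, which is usually a sign that the base search could be more selective.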

Remember that a relational database (such as a SQL database) and Splunk's data store are totally different. A Splunk index is made of compressed raw data plus index files (pointers and some metadata) that point into the raw data, and each index is in turn composed of HOT, WARM, COLD, and FROZEN buckets. Even though you have a 100GB index, depending on your timeframe (let's say 15 minutes) you will probably only scan your HOT bucket for events. For example, your HOT bucket may only store the first 10GB of data, with the other 75GB contained within WARM and COLD. Depending on how you constructed your search, you may be searching multiple indexes. If you filter on metadata fields like index, host, source, etc., Splunk will scan events quickly and return only the events matching your base search. You can dive deeper by looking into sparse, rare, and extremely rare searches.
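To make the bucket idea concrete, here is a small Python sketch that estimates how much of an index a search's time range could touch. It assumes each bucket covers a time span and that a search only needs to look at buckets whose span overlaps its own time range. The `Bucket` class, its fields, and the sizes are all made up for illustration; they are not read from a real Splunk instance.

```python
from dataclasses import dataclass


@dataclass
class Bucket:
    """Hypothetical summary of one bucket: time span covered and size on disk."""
    earliest: int   # epoch seconds of the oldest event in the bucket
    latest: int     # epoch seconds of the newest event in the bucket
    size_gb: float


def candidate_size_gb(buckets, search_earliest, search_latest):
    """Total size of the buckets whose time span overlaps the search window."""
    return sum(
        b.size_gb
        for b in buckets
        if b.earliest <= search_latest and b.latest >= search_earliest
    )


if __name__ == "__main__":
    # Made-up index: three buckets totalling 100 GB.
    buckets = [
        Bucket(earliest=0, latest=100, size_gb=10.0),
        Bucket(earliest=101, latest=200, size_gb=40.0),
        Bucket(earliest=201, latest=300, size_gb=50.0),
    ]
    total = sum(b.size_gb for b in buckets)
    touched = candidate_size_gb(buckets, search_earliest=250, search_latest=300)
    print(f"{touched} GB of {total} GB could be scanned ({touched / total:.0%})")
```

This is only an upper bound on what gets read: within an overlapping bucket, the index files let the search skip most of the raw data that does not match.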

I hope I am explaining this clearly enough; my numbers are just exaggerations. Hope this helps or gets you started. If an answer does help, don't forget to accept and/or vote up answers.

Additional Reading:
- How indexing works
- Exploring Splunk: Search Processing Language (SPL) Primer and Cookbook

rlautman
Path Finder

The amount of data indexed compared to the amount of data returned by searches and, if possible, the amount of data within an index that has been queried. For example, if there is 100GB stored in an index and 50GB of the index was queried by searches.

bmacias84
Champion

What do you mean by utilized? Are you referring to the system resources being used by a search vs. an index, or are you talking about the amount of data being indexed and the amount of data being returned by a search?