
Ratio Between Data Searched and Data Indexed

Path Finder

Is it possible to measure how much of the data in an index is actually being used by searches, so that it can be compared with the total amount of data that has been indexed?


Champion

There are two tools I am going to point you to. The first is the Search Job Inspector, which lets you see what your search is doing by breaking it down. The second is SoS (Splunk on Splunk), which uses Splunk's own diagnostic tools to analyze your Splunk environment. These tools will answer most of what you want, but I don't know what you are trying to accomplish.

The Search Job Inspector will give you scanCount, runDuration, resultCount, diskUsage, etc.
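As a rough sketch of how those numbers could be turned into ratios: the Python below assumes you have copied scanCount and resultCount out of the Job Inspector by hand, and that you separately know how many events the index holds. The function name and the `total_indexed_events` parameter are hypothetical and are not Job Inspector fields; the sample numbers are made up.

```python
def search_ratios(scan_count, result_count, total_indexed_events):
    """Return (scanned / indexed, returned / scanned) as fractions.

    scan_count and result_count are assumed to be the scanCount and
    resultCount values read from the Job Inspector.
    total_indexed_events is a hypothetical input you supply yourself.
    """
    if total_indexed_events <= 0:
        raise ValueError("total_indexed_events must be positive")
    scanned_vs_indexed = scan_count / total_indexed_events
    # Guard against a search that scanned nothing.
    returned_vs_scanned = result_count / scan_count if scan_count else 0.0
    return scanned_vs_indexed, returned_vs_scanned


# Made-up numbers, for illustration only.
scanned, returned = search_ratios(
    scan_count=50_000, result_count=1_000, total_indexed_events=1_000_000
)
print(f"{scanned:.1%} of indexed events scanned")
print(f"{returned:.1%} of scanned events returned")
```

This compares event counts, not bytes, so treat it as an approximation of "data searched versus data indexed" rather than an exact size measurement.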

Remember that a relational database (such as a SQL database) and Splunk's data repository are totally different. Splunk indexes are made of compressed raw data plus index files (pointer files and some metadata) that point into the raw data. Each index is then made up of HOT, WARM, COLD, and FROZEN buckets (frozen data is archived or deleted and is no longer searchable). Even though you have a 100 GB index, depending on your time frame, let's say 15 minutes, you will probably only search (scan) your HOT bucket for events. For example, your HOT bucket may only store the most recent 10 GB of data, with the other 90 GB being contained within WARM and COLD.

Depending on how you constructed your search, you may be searching multiple indexes. If you added metadata fields like index, host, source, etc. to your search, you will quickly scan events and only return events matching your base search. You can dive deeper by looking into sparse, rare, and extremely rare searches.
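To make the bucket reasoning concrete, here is a minimal sketch in Python. It is a simplified model, not how Splunk is implemented: it assumes one bucket per state, each with a known time range and size, and treats a bucket as "touched" whenever its time range overlaps the search window. The `Bucket` class, the `fraction_in_range` helper, and all numbers are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Bucket:
    state: str      # "hot", "warm" or "cold"
    earliest: int   # time of the oldest event in the bucket (epoch seconds)
    latest: int     # time of the newest event in the bucket (epoch seconds)
    size_gb: float


def fraction_in_range(buckets, search_earliest, search_latest):
    """Share of total bucket size whose time range overlaps the search window."""
    total = sum(b.size_gb for b in buckets)
    touched = sum(
        b.size_gb
        for b in buckets
        if b.earliest <= search_latest and b.latest >= search_earliest
    )
    return touched / total if total else 0.0


# Made-up 100 GB index: 10 GB hot, 40 GB warm, 50 GB cold.
now = 1_000_000
buckets = [
    Bucket("hot", now - 3_600, now, 10.0),
    Bucket("warm", now - 86_400, now - 3_601, 40.0),
    Bucket("cold", now - 604_800, now - 86_401, 50.0),
]

# A search over the last 15 minutes only overlaps the hot bucket.
share = fraction_in_range(buckets, now - 900, now)
print(f"{share:.0%} of the index falls in buckets touched by this search")
```

In this model the result is an upper bound: a bucket that overlaps the time range is a candidate, but the search may still read only part of it.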

I hope I am explaining this clearly enough, though my numbers are just exaggerations. Hope this helps or gets you started. If an answer does help, don't forget to accept and/or vote up answers.

Additional Reading:
- How indexing works
- Exploring Splunk: Search Processing Language (SPL) Primer and Cookbook


Path Finder

I mean the amount of data indexed compared to the amount of data returned by searches and, if possible, the amount of data within an index that has been queried. For example, if there is 100 GB stored in an index and 50 GB of that index was queried by searches.


Champion

What do you mean by utilized? Are you referring to the system resources being used by a search versus an index, or are you talking about the amount of data being indexed versus the amount of data being returned by a search?