Hello Splunker,
Hope you had a great day!
As per the picture below:
Q1: I need to understand the exact process of creating the tsidx file, what it contains, and how it actually speeds up searches.
Q2: Why is the tsidx file bigger than the raw data itself (35% vs. 15%)?
Q3: What is the difference between a tsidx file and a datamodel summary?
I am expecting a long answer with plenty of detail. I really like details!
Thanks in advance!
Hi @AliMaher,
Archived .conf content is a great place to start. "Behind The Magnifying Glass: How Search Works" by Jeff Champagne provides a nice overview, and "TSTATS and PREFIX" by Richard Morgan is fantastic.
Try searching conf.splunk.com using your favorite search engine for the term tsidx, e.g. using Google:
https://www.google.com/search?q=site%3Aconf.splunk.com+tsidx
Q1. You don't need to understand the internal structure of the tsidx file, but it's useful to know what is being indexed and how it's used in searching - it helps with writing efficient searches.
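To make the idea concrete, here is a minimal sketch of a term index in Python. This is not Splunk's actual tsidx format; it is a toy inverted index that assumes a simple tokenizer, and the names `build_index` and `search` are made up for illustration. It only shows why looking terms up in an index lets a search skip events that cannot match.

```python
import re
from collections import defaultdict


def build_index(events):
    """Map each lowercase term to the sorted event positions that contain it.

    Toy stand-in for a lexicon plus postings lists; NOT the real tsidx layout.
    """
    postings = defaultdict(set)
    for position, (_timestamp, raw) in enumerate(events):
        # Assumed tokenizer: split on anything that is not a letter, digit or underscore.
        for term in re.findall(r"[a-z0-9_]+", raw.lower()):
            postings[term].add(position)
    return {term: sorted(positions) for term, positions in postings.items()}


def search(index, events, *terms):
    """Return events containing all terms, touching only the positions the index points at."""
    if not terms:
        return []
    candidate_sets = [set(index.get(term.lower(), ())) for term in terms]
    matches = set.intersection(*candidate_sets)
    return [events[i] for i in sorted(matches)]


# Hypothetical sample events: (epoch timestamp, raw text)
events = [
    (1700000000, "user=alice action=login status=success"),
    (1700000005, "user=bob action=login status=failure"),
    (1700000009, "user=alice action=logout status=success"),
]

index = build_index(events)
hits = search(index, events, "alice", "login")
print(hits)
```

The point for search writing: the more specific the terms you give the search, the fewer events have to be read and parsed at all.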
Q2. Raw data is stored in compressed form (gzipped, if I remember correctly), hence the low footprint of the "raw" data. Around 1/9th of the original size is typical for gzipped textual data.
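If you want to see the compression effect for yourself, here is a small Python sketch using the standard `gzip` module. The log lines are synthetic and made up for this example, and they are far more repetitive than real data, so the ratio it prints is only illustrative and says nothing about what Splunk achieves on your data.

```python
import gzip

# Synthetic, highly repetitive log lines (an assumption for illustration only).
lines = [
    f"2024-01-01T00:00:{i % 60:02d} host=web{i % 4} user=user{i % 50} action=login status=200\n"
    for i in range(5000)
]
raw = "".join(lines).encode("utf-8")

# Compress the whole payload the way a gzipped file would store it.
compressed = gzip.compress(raw)
ratio = len(compressed) / len(raw)

print(f"raw={len(raw)} bytes, gzipped={len(compressed)} bytes, ratio={ratio:.1%}")
```

The index structures are not squeezed the same way, which is why they can end up larger on disk than the compressed raw data.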
Q3. That's a question akin to "what's the difference between a sports car and a red one" (apart from the obvious fact that red ones are the fastest ones 😉). But seriously: tsidx is a file format used by Splunk to store its internal structures. A datamodel summary is a concept on a completely different level. In fact, datamodel summaries are stored using tsidx files.
Great!
What is datamodel summarization?
In the case of datamodels, it's called acceleration. It's a process that runs a scheduled search, extracts fields from the datamodel's data, and indexes them in tsidx summary files for efficient searching later.
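Conceptually, you can picture it like the Python sketch below. This is a toy model under my own assumptions, not how Splunk implements acceleration: the function names, the `key=value` parsing and the 60-second buckets are all hypothetical. It shows the general idea of extracting fields once, ahead of time, and answering later questions from the summary instead of from the raw events.

```python
from collections import Counter


def parse_fields(raw):
    """Assumed extraction: split space-separated key=value pairs into a dict."""
    return dict(pair.split("=", 1) for pair in raw.split() if "=" in pair)


def build_summary(events, fields, span=60):
    """Pre-compute counts per (time bucket, field name, field value).

    Stands in for the scheduled summarization search; only the listed
    fields are kept, just as only datamodel fields end up in a summary.
    """
    summary = Counter()
    for timestamp, raw in events:
        extracted = parse_fields(raw)
        bucket = timestamp - timestamp % span
        for name in fields:
            if name in extracted:
                summary[(bucket, name, extracted[name])] += 1
    return summary


def count_by(summary, field):
    """Answer 'count by field' from the summary alone, without reading raw events."""
    totals = Counter()
    for (_bucket, name, value), n in summary.items():
        if name == field:
            totals[value] += n
    return totals


# Hypothetical sample events: (epoch timestamp, raw text)
events = [
    (1700000000, "user=alice action=login status=success"),
    (1700000005, "user=bob action=login status=failure"),
    (1700000009, "user=alice action=logout status=success"),
]

summary = build_summary(events, fields=["user", "action"])
totals = count_by(summary, "user")
print(totals)
```

Note that `status` was not summarized here, so a question about it cannot be answered from the summary. The same trade-off applies to accelerated datamodels: you only get the speed-up for fields that are part of the datamodel.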