
How does the Splunk GUI calculate the size of indexes and of the data models associated with an index?

_pravin
Contributor

Hi Community,

 

We have a weird situation where one of our newest Splunk installs (3 months old) ran out of disk space. The capacity of the server is 500 GB.

When I checked the size of each index in the GUI, every index was under its limit. The sum of all indexes was under 250 GB, which made sense because each index has a size limit of 500 GB (the default). But when I calculated the size of the data models associated with the indexes, I could see that the data models had used almost 250 GB.

My understanding was that the data models should also be included under the index capacity, but they seem to exceed the limits.

 

Can anyone please shed some light on this topic?

 

Regards,

Pravin


richgalloway
SplunkTrust

DMA (data model acceleration) data is stored, by default, in the same location as the index the accelerated data came from, but it is not included in the index size and so is not covered by index size limits. When sizing an index, one should leave room on the storage device for DMA or use the tstatsHomePath setting in indexes.conf to put DMA output elsewhere.
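
To make the accounting concrete, here is a minimal Python sketch using the figures from the question. It is plain arithmetic, not Splunk code: the function name `disk_usage_gb` and the variable names are hypothetical. It only illustrates that index size limits bound the index data, while DMA summaries consume disk on top of that.

```python
# Illustrative arithmetic only; these names are hypothetical, not a Splunk API.

def disk_usage_gb(index_sizes_gb, dma_sizes_gb):
    """Total disk used when DMA summaries share the storage device with the indexes."""
    return sum(index_sizes_gb) + sum(dma_sizes_gb)

DISK_CAPACITY_GB = 500   # server capacity reported in the question
index_sizes_gb = [250]   # sum of all indexes as shown in the GUI
dma_sizes_gb = [250]     # data model summaries, not counted in the index size

used = disk_usage_gb(index_sizes_gb, dma_sizes_gb)
print(f"used {used} GB of {DISK_CAPACITY_GB} GB")  # prints "used 500 GB of 500 GB"
```

Under these assumptions the disk fills up even though every index stays below its own limit.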

---
If this reply helps you, Karma would be appreciated.


_pravin
Contributor

Hi @gcusello  and @richgalloway,

 

One final question to get clarity about data models.

Let's assume I have an index with a data retention time of 1 month and a data model acceleration summary range of 3 months. How will the data model behave in this case?

Will the data model keep accelerated data going back 3 months, or will it drop the data once the index drops it?

 

Regards,

Pravin


gcusello
SplunkTrust

Hi @_pravin 

If your raw data has a shorter retention than the data model, you can still search the data in the data model, but drilldown to the raw events is possible only for the shorter period.

Usually it's the other way around: the data model is searched over a period shorter than or equal to that of the raw data.

Ciao.

Giuseppe 


richgalloway
SplunkTrust

Data models don't hold data past its retention period.  To do that, use a summary index.
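
As a rough illustration of that statement, the Python sketch below models the effective span of accelerated data as the smaller of the index retention and the summary range. The function `effective_summary_days` is a hypothetical helper, not a Splunk setting or API, and it assumes summaries are dropped together with the indexed data they were built from.

```python
# Hypothetical helper; assumes summaries never outlive the indexed data.

def effective_summary_days(index_retention_days, summary_range_days):
    """Span of accelerated data actually available, in days."""
    return min(index_retention_days, summary_range_days)

# The case from the question: 1 month of retention, 3 months of summary range.
print(effective_summary_days(30, 90))  # prints 30
```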


gcusello
SplunkTrust

Hi @_pravin,

No, data models are calculated separately and, as @richgalloway said, they can be stored in a different location and have a different retention.

If your data models use as much space as the index, you probably included _raw in the data model. That isn't a best practice, because a data model should contain only the fields you need for your searches, not the whole _raw.

Usually the disk space used by one year of an accelerated data model is around the daily license consumption for that index multiplied by 3.4.

Ciao.

Giuseppe

_pravin
Contributor

Hi @gcusello ,

 

Our data models don't use the same space as the index, so the accelerated data has no cap on its size.

I really liked your extended answer, but could you please explain the line quoted below? I find it a bit confusing.

"Usually the disk space used by one year of an accelerated data model is around the daily license consumption for that index multiplied by 3.4."

 

Regards,

Pravin

 


gcusello
SplunkTrust

Hi @_pravin ,

The disk space used by accelerated data models is usually calculated with this formula:

disk_space = daily_used_license * 3.4

This formula is described in the Splunk Architecting training course.
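
Here is a small Python sketch of that rule of thumb, assuming it covers one year of summaries as stated earlier in the thread. The function names and the 10 GB/day example are hypothetical; only the 3.4 multiplier and the 250 GB figure come from this thread.

```python
# Rule-of-thumb sizing only; function names and the 10 GB/day input are hypothetical.

DMA_FACTOR = 3.4  # multiplier quoted in this thread

def dma_disk_space_gb(daily_license_gb, factor=DMA_FACTOR):
    """Estimated disk space for one year of an accelerated data model."""
    return daily_license_gb * factor

def implied_daily_volume_gb(dma_size_gb, factor=DMA_FACTOR):
    """Daily ingest that the rule of thumb would imply for a given summary size."""
    return dma_size_gb / factor

print(round(dma_disk_space_gb(10), 1))         # 10 GB/day -> 34.0
print(round(implied_daily_volume_gb(250), 1))  # 250 GB summary -> 73.5
```

By this estimate, a 250 GB summary would correspond to roughly 73.5 GB of daily ingest for that index.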

So it's very strange that you have 250 GB of index data and 250 GB of data model data.

This is possible only if you also included the _raw field in your data model, which isn't a best practice, because a data model should contain only the fields required by your searches, not the whole _raw of every event.

Ciao.

Giuseppe

_pravin
Contributor

Sorry, I meant to say that the sizes of the indexes (index1, index2, index3, and so on) all together sum up to 250 GB. The data model sizes, however, were 250 GB for one of them, 11 GB for another, some megabytes for the next one, and so on.

Actually, the data model accelerates only the requested fields, but the summary range is 1 year. That explains the growing size of the data models.

 

Thanks,

Pravin
