Getting Data In

When to add a new index or sourcetype?

Path Finder

Hello 🙂

I am wondering when it is useful or reasonable to create a new index or sourcetype.
If I have data that I want to analyse for one topic, I would upload it all to the same index and the same sourcetype.
For a different topic that is not related to the first one, I would create a different index with a different sourcetype.

e.g. index=Project1, sourcetype=Project1_csv
index=Project2, sourcetype=Project2_csv

Is there an advantage to splitting data across more indexes or more sourcetypes?

Thanks for your advice.
Silvia

1 Solution

Contributor

Hi,

There are three good reasons for creating new indexes:

  1. Security: you grant read permission on indexes to your user roles. If there is data that a user group should not be able to see, you have to create a new index for it.
  2. Retention times: sometimes you want data to age out or to be stored on a cheaper filesystem. This is a per-index setting.
  3. Data type: if the data you query sits in an index that is bloated to 99.99% with other data, it makes sense to create another index for it.
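
As a rough sketch of the first two points: assuming two hypothetical indexes named project1 and project2 and a hypothetical role named project1_analyst (none of these names or values come from the posts above), the settings could look something like this. The retention periods are just examples.

```ini
# indexes.conf (sketch; index names and retention values are assumptions)
[project1]
homePath   = $SPLUNK_DB/project1/db
coldPath   = $SPLUNK_DB/project1/colddb
thawedPath = $SPLUNK_DB/project1/thaweddb
# Events older than 90 days (90 * 86400 seconds) are frozen,
# which by default means they are deleted.
frozenTimePeriodInSecs = 7776000

[project2]
homePath   = $SPLUNK_DB/project2/db
coldPath   = $SPLUNK_DB/project2/colddb
thawedPath = $SPLUNK_DB/project2/thaweddb
# Keep this data for roughly two years (730 days).
frozenTimePeriodInSecs = 63072000

# authorize.conf (sketch; the role name is an assumption)
[role_project1_analyst]
# Allow searching at all.
search = enabled
# This role can only search project1, and searches it by default.
srchIndexesAllowed = project1
srchIndexesDefault = project1
```

Keep in mind that a role which imports other roles also inherits their allowed indexes, so the restriction only holds if no inherited role grants access to project2.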

Sourcetype: you create one sourcetype for each different kind of log source. That means that if you analyse the same kinds of logs (Apache, IIS, Oracle logs) in your different projects, you should use the same sourcetype for them. Extractions and lookups are defined on a per-sourcetype basis.
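
For illustration, here is a minimal inputs.conf sketch in which two projects send the same kind of log to different indexes but share one sourcetype. The file paths and index names are hypothetical; access_combined is one of Splunk's pretrained sourcetypes for Apache-style access logs.

```ini
# inputs.conf (sketch; paths and index names are assumptions)
[monitor:///var/log/project1/apache/access.log]
index = project1
sourcetype = access_combined

[monitor:///var/log/project2/apache/access.log]
index = project2
sourcetype = access_combined
```

With this layout, extractions defined for access_combined apply to both projects, while access and retention are still controlled per index.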

Regards,

Andreas

Path Finder

Thanks for your answers. Is it also possible to delete sourcetypes? If so, what happens to the data of that sourcetype?

Contributor

Hi Silvia,
This is a very subjective question, and the answer depends very much on the data requirements and the environment on your side. That being said, a few key items to consider would be:

Indexes:
1. Data retention: Data aging is defined at the index level. If you have two sets of data that need different handling from a retention perspective (including from a hot/warm/cold standpoint), they belong in separate indexes.
2. Data sizing/hardware resources: Depending on your hardware, there should be an optimal size for your indexes. You do not want indexes to be too large from a purely storage standpoint. So if you are short on storage and are serving multiple teams, setting up different indexes does offer a good way to accommodate them all.
3. Search overhead: In theory, searches are faster on smaller indexes than on very large ones. If there is a requirement for certain data to be available for quick searches, it might be better not to mingle this data with other data in the same index.

Sourcetypes:

  1. Sourcetype is one of the keys you can use in props.conf to define separate data configurations. Check out props.conf. So it might be worthwhile not to bundle all data together, just in the interest of future flexibility.
  2. If the data formats are different (e.g. access and error logs), having them in separate sourcetypes will better organize your data within Splunk and help with the point above by giving you the flexibility to operate on each format differently. For example, you can define lookups just on access logs and not on error logs, field extractions just on error logs, etc.
  3. Sourcetypes also help narrow down your searches. index=x sourcetype=y will be a faster search than index=x.
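
To make the second point concrete, here is a small props.conf sketch that treats access and error logs differently. The lookup name http_status_lookup, its fields, and the client_ip extraction are hypothetical and only meant as an illustration; the lookup would also need a matching stanza in transforms.conf, and the regex assumes IPv4 addresses in Apache-style error lines.

```ini
# props.conf (sketch; lookup name, field names and regex are assumptions)

# Access logs: enrich events with a lookup on the status field.
[access_combined]
LOOKUP-http_status = http_status_lookup status OUTPUT status_description

# Error logs: extract the client address from text like "[client 10.0.0.1:54321]".
[apache_error]
EXTRACT-client_ip = \[client (?<client_ip>[^\]:]+)
```

Because each stanza is keyed on a sourcetype, the lookup never runs against error logs and the extraction never runs against access logs.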

I am pretty sure there are other benefits and viewpoints, which I hope others will point out as well. But to answer your question (assuming your topics are broad): yes, there are advantages to splitting your data into separate indexes and sourcetypes.

Champion

A separate index is useful if you have different retention policies, e.g. some data that has to be stored for years and some data that only needs to be available for months. Another advantage is that you can control which users access your data, e.g. exclude some users from certain content.
Maybe another less important distinction is that you can store different indexes in different places, so if you have some data that you frequently need immediately you place it on fast responding systems, while an index with data you don't need that often (or response time is not that crucial) can be stored on a slower, less expensive system.
(A typical example of an additional index is one for temporary data, e.g. if you want to see how everything works out before you index data into your main production index.)
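
A sketch of the "different places" idea, assuming hypothetical mount points for a fast and a slow disk and a hypothetical index name: recent (hot/warm) buckets stay on the fast volume, while older (cold) buckets move to the cheaper one.

```ini
# indexes.conf (sketch; volume names, paths and index name are assumptions)
[volume:fast]
path = /mnt/ssd/splunk

[volume:slow]
path = /mnt/hdd/splunk

[project1]
# Hot and warm buckets live on the fast storage.
homePath   = volume:fast/project1/db
# Cold buckets are rolled to the slower, cheaper storage.
coldPath   = volume:slow/project1/colddb
# thawedPath cannot reference a volume, so it uses a plain path.
thawedPath = $SPLUNK_DB/project1/thaweddb
```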

Sourcetypes serve very important functions; for starters see here. For example, you can define how Splunk indexes your data based on them. Furthermore, you can set transformations or field extractions per sourcetype. And because events of one sourcetype usually share the same format, you can do the same things with them no matter where they come from (for example, any event with the sourcetype of a webserver/router/firewall log will allow you to draw the same conclusions from it, no matter which server/router/firewall you look at). They provide a high-level (logical) abstraction.

Path Finder

Also thank you! Now I understand how to use sourcetype.

Path Finder

Ok thank you, that helps! 🙂
