I am wondering when it is useful or reasonable to create a new index or sourcetype.
If I have data that I want to analyse for one topic, I would upload it all to the same index and give it the same sourcetype.
For a different topic that is not related to the first one, I would create a different index with a different sourcetype.
e.g. index=Project1, sourcetype=Project1csv
Is there an advantage to splitting data across more indexes or more sourcetypes?
Thanks for your advice.
There are 3 good reasons for creating new indexes:
Sourcetype: you create a sourcetype for every different log source. That means if you analyse the same kinds of source (Apache, IIS, Oracle logs) in your different projects, you should use the same sourcetype for them. Extractions and lookups are defined on a per-sourcetype basis.
A separate index is useful if you have different retention policies, e.g. some data that has to be stored for years and some data that only needs to be available for months. Another advantage is that you can control which users can access your data, e.g. exclude some users from certain content.
Maybe another less important distinction is that you can store different indexes in different places, so if you have some data that you frequently need immediately you place it on fast responding systems, while an index with data you don't need that often (or response time is not that crucial) can be stored on a slower, less expensive system.
(A typical example of an additional index is one for temporary data, e.g. if you want to see how everything works out before you index data into your main production index.)
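To make the retention, storage and access points a bit more concrete, here is a rough sketch of what the configuration could look like. The index names, paths, role name and retention periods are made up for illustration, so adjust them to your environment and check the indexes.conf and authorize.conf documentation for your Splunk version before using any of it.

```ini
# indexes.conf (sketch; index names and paths are hypothetical)

# Long-term data: kept for about 5 years, recent buckets on fast storage
[project1_longterm]
homePath   = /fast_disk/splunk/project1_longterm/db
coldPath   = /slow_disk/splunk/project1_longterm/colddb
thawedPath = /slow_disk/splunk/project1_longterm/thaweddb
# 5 years = 5 * 365 * 86400 seconds
frozenTimePeriodInSecs = 157680000

# Temporary/test data: kept for about 90 days, everything on cheaper storage
[project1_scratch]
homePath   = /slow_disk/splunk/project1_scratch/db
coldPath   = /slow_disk/splunk/project1_scratch/colddb
thawedPath = /slow_disk/splunk/project1_scratch/thaweddb
# 90 days = 90 * 86400 seconds
frozenTimePeriodInSecs = 7776000

# authorize.conf (sketch; role name is hypothetical)

# Users with this role can only search the two project indexes
[role_project1_analyst]
srchIndexesAllowed = project1_longterm;project1_scratch
srchIndexesDefault = project1_longterm
```

Note that a role which inherits from another role also inherits that role's allowed indexes, so a restricted role like this only works as intended if it does not import a broader one.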
Sourcetypes serve very important functions; for starters, see here. For example, you can define how Splunk indexes your data based on them. Furthermore, you can set transformations or field extractions per sourcetype. And because events of one sourcetype usually share the same format, you can do the same things with them no matter where they come from (for example, any event with the sourcetype of a webserver/router/firewall log will allow you to draw the same conclusions, no matter which server/router/firewall you look at). They provide a high-level (logical) abstraction.
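As a small illustration of per-sourcetype settings, a props.conf stanza for CSV data might look roughly like this. The sourcetype name and the `timestamp` column are assumptions for the example, not something taken from the question, so treat it as a sketch rather than a recommended setup.

```ini
# props.conf (sketch; sourcetype name and column name are hypothetical)
[project1_csv]
# Treat the input as CSV and extract the columns as fields
INDEXED_EXTRACTIONS = csv
# The first line of the file holds the column names
HEADER_FIELD_LINE_NUMBER = 1
# Take the event time from the column named "timestamp"
TIMESTAMP_FIELDS = timestamp
```

Any input that is assigned this sourcetype gets the same parsing and the same fields, regardless of which index it ends up in.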
This is a very subjective question, and the answer depends very much on the data requirements and environment on your side. That being said, a few key items to consider would be:
1. Data retention: Data aging is defined at the index level. If you have two sets of data that need different handling from a retention perspective (consider the hot/warm/cold bucket stages as well), they should go into separate indexes.
2. Data sizing/Hardware resources: Depending on your hardware, there should be an optimal size for your index. You do not want indexes to be too large from a purely storage standpoint. So if you are short on storage and are servicing multiple teams, setting up different indexes does offer a good way to accommodate all of them.
3. Search overhead: Theoretically, searches would be faster on smaller indexes than on very large ones. If there is a requirement for certain data to be available for fast searches, it might be better not to mingle this data with other data in the same index.
Pretty sure there are other benefits and viewpoints, which I hope others will point out as well. But to answer the question (assuming your topics are broad): yes, there are advantages to splitting your data into separate indexes and sourcetypes.
Thanks for your answers. Is it also possible to delete sourcetypes? If so, what happens to the data of that sourcetype?