Re: Index vs Sourcetype - What's faster

chandlercr · ‎08-28-2019

I am curious: does including an index in a search make it any faster?

This comes up because a friend and I are arguing over whether one is more necessary than the other. For example, let's say I run one search with just a sourcetype and another search that also includes an index.

While I know this "limits" the data, Splunk still has to search data either way. Would including the index in this case give any substantial gain in search performance, or would leaving it out be just as effective as specifying a certain index? What are your thoughts?

diogofgm · ‎08-28-2019

You should use both whenever possible. The more precise you are with your search, the faster you'll get your results, because Splunk may be able to look through a smaller amount of data to retrieve what you are looking for.

If you specify just the sourcetype, Splunk has to check every index it searches by default for your role to find that sourcetype. Depending on your configuration, you might also not get all the data for that sourcetype: user roles have a "default indexes to search" setting, and if, for example, your roles are set to search just index main by default, a sourcetype-only search will look only in the main index, even if that sourcetype is present in other indexes.
If you specify just the index, Splunk will look through more data instead of looking only at the buckets containing the sourcetype you're looking for.
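
As a rough sketch of the three cases above, here are a sourcetype-only search, an index-only search, and a search with both. Each line is a separate search. The names web and access_combined are placeholders, not taken from this thread, so substitute an index and sourcetype from your own environment.

```spl
sourcetype=access_combined
index=web
index=web sourcetype=access_combined
```

The first search covers only your role's default indexes, the second reads every sourcetype in web, and the third is restricted on both fields.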

Also, index and sourcetype (along with host, source and _time) are indexed at index time, which means that finding data using any of these fields will get you results quite fast. You can even use the | tstats command to benefit from these indexed fields (and others, if you are using indexed extractions).

Say you want to know which hosts are sending data to a specific index. You could search:
index = blah | stats count by host
OR
| tstats count where index=blah by host

Just give it a try on a big index and check the difference in the time taken to complete the search.

If you want to know more about search best practices and how to write really performant searches, look for search-related presentations from .conf.
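
The usual way to compare the two is the Job Inspector (Job > Inspect Job), which reports how long each search took. If you have access to the _audit index, a sketch like the following may also list run times for recently completed searches. The field names are the ones audit events normally carry, so treat them as an assumption and adjust for your version.

```spl
index=_audit action=search info=completed
| table _time, user, total_run_time, search
```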

https://conf.splunk.com/files/2016/slides/behind-the-magnifying-glass-how-search-works.pdf
https://conf.splunk.com/files/2016/slides/search-optimization.pdf

------------
Hope I was able to help you. If so, some karma would be appreciated.

Sukisen1981 · ‎08-28-2019

There are two aspects here.
Sourcetype names need not be unique. For example, in theory I can upload any CSV file with sourcetype csv to several indexes. So if I search for sourcetype=csv, it will search ALL data with that sourcetype,
BUT
when I search index="aaaa" sourcetype=csv, it will search for the csv sourcetype ONLY inside the index aaaa. Index names are unique.
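
To check whether a sourcetype name really is shared across indexes in your environment, something like the following tstats search should work. The sourcetype csv comes from the example above, and index=* only covers the non-internal indexes your role is allowed to search.

```spl
| tstats count where index=* sourcetype=csv by index
```

If more than one index comes back, a search on sourcetype=csv alone is ambiguous, and adding the index narrows it to a single one.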

In real life there are many cases where sourcetype names overlap, say for _json or cisco or catalina.
Inclusion is always better than exclusion or not specifying a more exact match 🙂
I rest my case
