Just a question for anyone who may know the answer… Does Splunk have a listing of what are considered standard index names and sourcetypes? I would think Splunk must maintain a list of index names that match up with the Splunk-supported apps. For example, something like this:
Index_Name | What data is expected | Splunk Supported App
os         | unix syslog           | UNIX
vmware     | esx, vmware           | Vmware
I keep finding out the hard way that naming the index right the first time saves a lot of time compared with reconfiguring apps after the fact to point at the right index.
Just didn’t want to reinvent the wheel if this data is available.
This is a very difficult question to answer, and there is no right or wrong answer.
Index naming is highly variable and depends on data volume, the number of unique sources, retention requirements, and total storage. In general you should store like sources with like sources, and not all data is created equal. Some data may have analytical uses for APM or capacity planning, while other sources are only useful for research (root cause analysis). You may also have applications that log in different formats but don't have the volume to warrant a new index. In one case you might put performance data in its own index called perfmon and send all other Windows logs to an index called WIN, with each index configured with its own retention, size limits, and backup strategy.
Your WIN index could contain the following Windows logs: Firewall, updates, EVENTLOG-Security, EVENTLOG-Application, EVENTLOG-System, EVENTLOG-DNS, EVENTLOG-AD, etc. But let's say EVENTLOG-System needs to be retained for only 30 days for research while EVENTLOG-Security needs to be retained for 3 years for compliance reasons; since retention is configured per index, that is a reason to split them into separate indexes. In another case you may create a generic WEBLOGS index containing Apache access logs and IIS logs. Using multiple indexes also allows for segmentation based on security requirements. Performance is another reason for multiple indexes. EXAMPLE: Suppose you have 1k M&Ms and 100k Skittles, and you want to find all the yellow M&M’s. It’s easier to find them if you keep the two in separate jars (indexes). (Taken from Splunk conf 2012.)
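To make the retention point concrete, here is a rough Python sketch that turns a per-index retention plan into indexes.conf stanzas. The index names and the 30-day / 3-year figures just mirror the example above, and the paths are the usual $SPLUNK_DB layout; treat all of it as assumptions to adapt, not a recommended configuration. The names are lowercased because, as far as I know, Splunk expects lowercase index names.

```python
SECONDS_PER_DAY = 86_400

# Hypothetical retention plan mirroring the example above (days to keep data).
RETENTION_DAYS = {
    "win-system": 30,          # research only
    "win-security": 3 * 365,   # compliance
}


def index_stanza(name: str, retention_days: int) -> str:
    """Render one indexes.conf stanza with a time-based retention limit."""
    frozen_secs = retention_days * SECONDS_PER_DAY
    return "\n".join([
        f"[{name}]",
        f"homePath   = $SPLUNK_DB/{name}/db",
        f"coldPath   = $SPLUNK_DB/{name}/colddb",
        f"thawedPath = $SPLUNK_DB/{name}/thaweddb",
        # Events older than this many seconds become eligible to be frozen.
        f"frozenTimePeriodInSecs = {frozen_secs}",
    ])


if __name__ == "__main__":
    for name, days in RETENTION_DAYS.items():
        print(index_stanza(name, days))
        print()
```

If both event logs lived in one WIN index, there would be only one frozenTimePeriodInSecs to set, so you would end up keeping the System logs for the full 3 years as well.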
In a small deployment you might have something like this for indices:
WIN PERFMON IIS NIX APACHE SNMP
In another you might have the following because of volume, security, and retention policies.
WIN WIN-SYSTEM WIN-SECURITY WIN-APPLICATION IIS-EXTERNAL IIS-INTERNAL ACCESSLOGS NIX SNMP FIREWALL
Regarding sourcetypes, this depends on log format standardization. In an environment where all IIS logs have the same fields and delimiters, you could use a single sourcetype called W3C; however, if you have different fields and delimiters, you may end up with W3C-1, W3C-2, W3C-3, each having specific field extractions because of the format.
Another case: many applications log in CSV, and csv is a possible sourcetype. The log data may be CSV, but the fields are specific to each application, so you probably want to classify it as mobilemetric, typeperf, etc. so you can create custom field extractions for each application log.
You might end up with sourcetypes like:
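As an illustration of classifying by fields instead of by file format, here is a small Python sketch that picks a sourcetype from a CSV header. The field names in the signatures are made up for the example (they are not the real mobilemetric or typeperf layouts), and this is only a planning aid, not how Splunk itself assigns sourcetypes.

```python
import csv
import io

# Hypothetical header signatures; the field names are invented for illustration.
HEADER_SIGNATURES = {
    "mobilemetric": {"device_id", "app_version", "latency_ms"},
    "typeperf": {"timestamp", "counter", "value"},
}


def guess_sourcetype(sample: str, default: str = "csv") -> str:
    """Return the first sourcetype whose required fields all appear in the header."""
    header = next(csv.reader(io.StringIO(sample)), [])
    fields = {name.strip().lower() for name in header}
    for sourcetype, required in HEADER_SIGNATURES.items():
        if required <= fields:  # all required fields are present
            return sourcetype
    # Unknown layout: fall back to the generic sourcetype.
    return default
```

Anything that falls through to the generic csv sourcetype is a candidate for a new, more specific sourcetype once you know what the application is.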
OFX SNMP W3C SYSLOG CISCO-FIREWALL MOBILEMETRICS JSTAT JMX GCC
Since source and sourcetype are stored in the metadata, filtering on them improves the performance of your searches.
I would recommend sitting down and mapping out your data sources and inputs by host, server function, and application/technology. This will allow you to build a Common Information Model (CIM).
Sourcetype naming conventions:
- No right or wrong answer.
- Make them usable and understandable, with plans for the future.
- Categorize by data, device type, vendor, application.
- Develop a CIM relevant to your environment.
I guess my main point is that if someone is setting up Splunk for the enterprise, building it from the ground up and creating index names, there should be a standard name list that matches the Splunk-supported apps. It just makes life a little easier. With the Splunk-supported UNIX app, if you name your UNIX index anything other than "os", you need to make changes in multiple places to get it to work. For some apps it's simple, but the UNIX app has search links that point at index=os all over the place.
Maybe the answer is building standard apps that require minimal changes.
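In the meantime, a quick way to see how much of an app would need changing is to scan it for hard-coded index references. This is a rough Python sketch: the app directory you pass in is up to you, it only looks at .conf and .xml files, and the regex assumes simple literal references like index=os or index="os".

```python
import re
from pathlib import Path


def index_pattern(index_name: str) -> re.Pattern:
    """Match index=os, index = "os" or index='os', but not index=osx."""
    return re.compile(
        rf"""\bindex\s*=\s*["']?{re.escape(index_name)}\b""",
        re.IGNORECASE,
    )


def find_hardcoded_index(app_dir: str, index_name: str = "os"):
    """Yield (path, line number, line) for each hard-coded index reference."""
    pattern = index_pattern(index_name)
    for path in sorted(Path(app_dir).rglob("*")):
        # Only look at config files and dashboard XML (an assumption about
        # where an app keeps its searches).
        if path.suffix not in {".conf", ".xml"} or not path.is_file():
            continue
        text = path.read_text(errors="replace")
        for lineno, line in enumerate(text.splitlines(), 1):
            if pattern.search(line):
                yield path, lineno, line.strip()
```

Something like `for hit in find_hardcoded_index("/path/to/app"): print(*hit)` gives you the list of places to edit before you commit to an index name other than the one the app expects.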