I have a huge amount of data to index, and all of it is being indexed with one log format (sourcetype). But for a few lines Splunk suddenly switches to another sourcetype, for those lines only, and when I looked through the log data those lines had exactly the same format as the others.
So now I need to access Splunk's internal logs to identify what happened while the data was being indexed. I know that Splunk stores its own log files in $SPLUNK_HOME/var/log/splunk, but what I can't find is the log file specifically related to the indexing process.
Great clarifications... Following Nick's answer and then Kristian's update: yes, the other sourcetype is actually the same name followed by a number (not -too_small, since the file is not small).
But the weird part is that it only happens across the boundary between two files (from the tail of one file to the head of the next, everything gets the other sourcetype).
And it is CSV, but I recently had to change all the "," delimiters to tabs, and I'm trying to re-index. I will give you real examples if it happens again during this indexing run...
What is the other sourcetype? Is it maybe of the form <filename>-too_small? If so, then every now and then one of the files being auto-sourcetyped is just too short for the auto-sourcetyping to work correctly.
Thanks for your comprehensive answer. The sourcetype is actually auto-assigned...
OK, I'll have to try what you suggest and see what happens. But for now, all I can say is that the log files contain comma-separated events, and there is absolutely no difference in format between the events in the first sourcetype and those in the second...
Most likely, Splunk's own logs are also indexed in the _internal index. You can search it just like the other indexes. However, there is a possibility that whatever happened to your log parsing/indexing has not been logged by Splunk.
You may have to change the logging level in order to see this, e.g. from WARN to INFO or DEBUG. This is done in Manager -> System Settings -> System Logging. Unfortunately I don't know just which of the 400+ items should be changed.
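If you would rather look at the raw files under $SPLUNK_HOME/var/log/splunk directly, something along these lines might help narrow things down. It is only a rough sketch: `find_log_lines` is a helper name I made up, `/opt/splunk` is just a placeholder default, and `my_data_file.csv` stands in for whichever of your files got the odd sourcetype. It simply prints every line in the `*.log` files that mentions that name; it makes no assumption about which log file (if any) records the sourcetype decision.

```python
import os
from pathlib import Path


def find_log_lines(log_dir, needle):
    """Yield (file name, line number, line) for each log line containing needle."""
    log_dir = Path(log_dir)
    if not log_dir.is_dir():
        return
    for path in sorted(log_dir.glob("*.log")):
        # errors="replace" keeps the scan going if a log contains odd bytes
        with path.open(encoding="utf-8", errors="replace") as handle:
            for number, line in enumerate(handle, start=1):
                if needle in line:
                    yield path.name, number, line.rstrip("\n")


if __name__ == "__main__":
    # Assumption: SPLUNK_HOME is set; /opt/splunk is only a placeholder default.
    splunk_home = os.environ.get("SPLUNK_HOME", "/opt/splunk")
    log_dir = Path(splunk_home) / "var" / "log" / "splunk"
    # Hypothetical file name; replace with the file that was mis-sourcetyped.
    for name, number, line in find_log_lines(log_dir, "my_data_file.csv"):
        print(f"{name}:{number}: {line}")
```

If nothing turns up, that would fit with the possibility that the event is not logged at the current logging level.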
On a side note, did you specify a sourcetype in your inputs.conf (or via the GUI), or did Splunk auto-assign it?
Also, a bit more information regarding the sourcetypes involved, along with some sample data would be good.
As Nick points out, if the new sourcetype is ...-too_small, then the file in question is too short for Splunk's auto-sourcetyping to work properly.
If the new sourcetype is a "numbered" version of the original sourcetype, e.g. iis-2 or iis-3, this means that Splunk thinks that it's the same format, but slightly different. This can happen for CSV log files where the header row changes. By default, I believe that Splunk expects a header row for CSV files.
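To check the header-row theory outside of Splunk, you could compare the first row of every file and see whether they all agree. The following is a minimal sketch, not anything Splunk itself does: `group_files_by_header` is a made-up helper, and the `logs` directory is a placeholder for wherever your files live. Pass `delimiter="\t"` for the tab-separated versions.

```python
import csv
from collections import defaultdict
from pathlib import Path


def group_files_by_header(paths, delimiter=","):
    """Map each distinct first row (as a tuple) to the files that start with it."""
    groups = defaultdict(list)
    for path in paths:
        with open(path, newline="", encoding="utf-8", errors="replace") as handle:
            # An empty file yields an empty first row
            first_row = next(csv.reader(handle, delimiter=delimiter), [])
        groups[tuple(first_row)].append(str(path))
    return dict(groups)


if __name__ == "__main__":
    # Hypothetical directory; point this at the monitored files.
    files = sorted(Path("logs").glob("*.csv"))
    for header, names in group_files_by_header(files).items():
        print(f"{len(names)} file(s) start with: {header}")
        for name in names:
            print(f"  {name}")
```

If this prints more than one group, the files in the smaller groups are the first ones I would compare against the files that got the numbered sourcetype.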
I guess that this problem of yours only occurs on a per-file basis and not in the middle of a file, i.e. some of your files get indexed as the "wrong" sourcetype, but most do not.
Please provide the first three rows of
a) a correctly sourcetyped file and
b) an incorrectly sourcetyped file.
Don't forget to mask IP/usernames/hostnames as needed.
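If it saves you some time, here is a small sketch for pulling the first three rows and masking them before posting. The helper names (`mask_line`, `masked_head`) and the example file names are made up. The regular expression only catches things that look like IPv4 addresses; usernames and hostnames cannot be guessed, so you have to list them yourself in `secrets`.

```python
import re
from itertools import islice

# Matches dotted-quad strings such as 10.1.2.3 (no range check on the octets)
IPV4 = re.compile(r"\b(?:\d{1,3}\.){3}\d{1,3}\b")


def mask_line(line, secrets=()):
    """Replace IPv4-looking strings and any listed literal strings."""
    line = IPV4.sub("x.x.x.x", line)
    for secret in secrets:
        if secret:  # skip empty strings, which would match everywhere
            line = line.replace(secret, "<masked>")
    return line


def masked_head(path, rows=3, secrets=()):
    """Return the first `rows` lines of a file with sensitive values masked."""
    with open(path, encoding="utf-8", errors="replace") as handle:
        return [mask_line(line.rstrip("\n"), secrets) for line in islice(handle, rows)]


if __name__ == "__main__":
    # Both file names are placeholders for one good and one bad file.
    for name in ("good_file.csv", "bad_file.csv"):
        try:
            print(name, *masked_head(name, secrets=("alice", "web01")), sep="\n")
        except FileNotFoundError:
            print(f"{name}: not found")
```

Please still read through the output before posting, since anything not matched by the pattern or the list is left untouched.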
Hope this helps,