I recently discovered that "tstats" is returning sourcetypes which do not exist.
Query:
| tstats values(sourcetype) where index=* by index
This returns a list of sourcetypes grouped by index. While it appears to be mostly accurate, some sourcetypes which are returned for a given index do not exist. For example, the sourcetype "WinEventLog:System" is returned for myindex, but the following query produces zero results:
index=myindex sourcetype="WinEventLog:System"
This is the case for multiple indexes.
If my understanding of "tstats" is correct, it works by analyzing only indexed fields, which are stored in the tsidx files. If no events with a given sourcetype exist in a specific index, how could that value possibly have been saved in the tsidx files?
@HeavyHats - The most likely reason is the "rename" setting in props.conf.
I know a lot of Windows-related data uses the rename attribute, for example Sysmon data and Windows firewall logs from EventLogs. But this issue will show up anywhere the rename attribute is being used.
(Previously someone asked similar question - https://community.splunk.com/t5/Splunk-Search/Why-does-tstats-returns-events-by-sourcetype-but-searc...)
I hope this helps!!!
Are you sure no one fiddled with the TA_windows? Typically you'd see "XmlWinEventLog:System" as the source, not the sourcetype.
See my home Splunk instance:
Yes, that is part of the confusion here. "tstats" shows that "(Xml)WinEventLog:System" exists as a sourcetype, when it actually only exists as a source.
The only reason I can see for tstats and search showing different results is the rename attribute I mentioned in my answer.
Yeah, if the sourcetype is "WinEventLog:System", then you are using a very old version of the add-on (< 5.0.0).
Where is a rename most likely to happen? (Universal Forwarder, Heavy Forwarder, Indexer, etc.). Our Universal Forwarders are not using a rename function in any props.conf files, and I've checked the heavy forwarder that these logs are passing through and it does not contain a rename function in any of its props.conf files either. I'm guessing this happens on the indexers?
rename is a search-time setting, so it is applied on the search head.
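If you want to check which rename stanzas the search head actually has, rather than grepping through every app, a REST search along these lines should list them. This is a sketch that assumes your role is allowed to run the rest command and that each props.conf setting comes back as a field named after the setting; splunk_server=local limits it to the search head you run it on.

```spl
| rest /services/configs/conf-props splunk_server=local
| search rename=*
| table title rename eai:acl.app
```

Here title is the props.conf stanza (the original sourcetype) and eai:acl.app is the app that defines it, which is how a stanza like Splunk_TA_windows' [WinEventLog:System] would show up.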
Thank you for the insight. I discovered that version 8.5.0 of the Splunk Add-on for Microsoft Windows (Splunk_TA_windows) contains rename statements in Splunk_TA_windows/default/props.conf:
## To provide backward compatibility for WinEventLog and XmlWinEventLog data
## These will be deprecated in future
[WinEventLog:Security]
rename = wineventlog
[WinEventLog:Application]
rename = wineventlog
[WinEventLog:System]
rename = wineventlog
...
This appears to be the source of this behavior. Marking your solution as accepted.
Glad you caught it!!!
Keep an eye out, because unfortunately many add-ons use this (and it makes results inconsistent between tstats and normal search).
I'm not buying this explanation. Rename works only one way - it only lets you search for a given sourcetype using a different name. It doesn't modify the returned results.
In order for you to have one value stored in the index (returned by tstats) and another calculated at search time, you'd have to have some EVAL defined that would "cast" the value from source to sourcetype. Maybe someone did something like that when the Windows TA changed its behaviour, in order not to rework searches written for the old values.
Can you confirm if
| tstats values(sourcetype) where index=myindex
and
index=myindex
| stats count by sourcetype
produce identical looking sourcetypes for the same time range.
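The two searches can also be combined so that only the sourcetypes found by one method but not the other are listed. A rough sketch, assuming the same index and time range for both halves; method and found_by are just illustrative field names:

```spl
| tstats count where index=myindex by sourcetype
| eval method="tstats"
| append
    [ search index=myindex
      | stats count by sourcetype
      | eval method="search" ]
| stats values(method) as found_by by sourcetype
| where mvcount(found_by)=1
```

Any row left over is a sourcetype that only tstats (indexed value) or only the normal search (search-time value) reports. The subsearch inherits the outer time range, and since it returns one row per sourcetype it should stay well under the usual subsearch result limits.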
If they give different sourcetypes for the 'myindex' example you gave, that's odd. Could it be that the sourcetype
sourcetype="WinEventLog:System"
has some leading or trailing characters, e.g. a space?
If you do the search with wildcards
index=myindex sourcetype="*WinEventLog:System*"
does that also give no results?
I can confirm that the first two queries do not produce identical lists. There is about 90% overlap, but each list contains entries which are absent from the other list.
I can also confirm that there is no leading/trailing whitespace. The last query produces no results.
Other than a permissions/security issue that is constraining what you can see in one variant as opposed to the other, I don't have any other suggestions 😞
I believe this is because the tstats command performs statistical queries on indexed fields in tsidx files.
Some time ago, the sourcetypes for WinEventLog data were changed in version 5.0.1 of the Windows TA.
See: Sourcetype changes for WinEventLog data
This means that all the old sourcetypes that used to exist (and were indexed!) were named, for example, WinEventLog:System, WinEventLog:Application or WinEventLog:Security. They have all been renamed to WinEventLog by the newer versions of the Windows TA.
But since they were indexed in the past, they still exist in the metadata. And since tstats only looks at the indexed metadata, you see these old sourcetypes appear.
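To test the "old data" idea directly, tstats can also report when each indexed sourcetype was first and last seen within the selected time range. A sketch, with myindex standing in for the affected index:

```spl
| tstats count min(_time) as first_seen max(_time) as last_seen where index=myindex by sourcetype
| eval first_seen=strftime(first_seen, "%Y-%m-%d %H:%M:%S"), last_seen=strftime(last_seen, "%Y-%m-%d %H:%M:%S")
```

If WinEventLog:System shows a last_seen inside the last 24 hours, the values are coming from recently indexed events, not just from old buckets.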
If I'm limiting my search to the past 24 hours though, shouldn't tstats respect the time limit and not evaluate older data?
It doesn't work that way. A value of an indexed field is just a value. If you extract a different value of _time for each event, you don't expect the old ones to get "renamed", do you?
So that's not the cause.
Either there is some dynamic renaming happening at search time, as @VatsalJagani suggested, or the index files are simply corrupted and for some reason "overlap" source with sourcetype (or vice versa).
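One way to tell these two explanations apart: props.conf.spec documents that a renamed sourcetype can still be searched by its original, pre-rename name through the internal field _sourcetype. Assuming that behaves the same on your version, a search like this should return events if a search-time rename is in play:

```spl
index=myindex _sourcetype="WinEventLog:System"
| stats count by sourcetype source
```

If the rename is the cause, the sourcetype column should show the renamed value (e.g. wineventlog) for those events. If this returns nothing while tstats still lists WinEventLog:System, the rename explanation looks much less likely.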