Eventcount retrieving different numbers of events from tstats

Noctisae · ‎11-04-2024

First of all, English isn't my native language, so I apologize in advance for any mistakes I might make in this support topic.

I've run into a problem that has me a bit lost: I'm indexing a lot of data with different sourcetypes (mostly CSV and JSON, with a bit of unstructured data here and there), and the eventcount and tstats commands return very different event counts. I know the eventcount command ignores the time window, so I tried extending the search time window into the future, up to the maximum supported by Splunk, but to no avail.

To put numbers on it: on my instance, "| eventcount index=XXX*" reports 160 million events in my indexes. When I run "| tstats count where index=XXX* by sourcetype", it only finds about 59 million events. Even extending the time window with "latest=+4824d", to reach the maximum supported by the software, doesn't yield more events.

I thought about frozen data, so, just for debugging, I increased the retention period before events are frozen, deleted all my data, and reindexed everything, but to no avail.

Is it possible for an event to be indexed without a sourcetype? Or is there some technological wizardry I'm not aware of?

PickleRick · ‎11-04-2024

There is another possible explanation. Someone was trigger-happy with the delete command.

Deleted events are still physically present in the index files, so eventcount sees them, but they are marked as not searchable, so tstats (and other search commands) don't use them.

Noctisae · ‎11-04-2024

Oh, that could explain it. I'll try to erase all events, clean the data partition of the instance entirely and restart clean, to see if the behavior is the same.

Thanks for your help !

PickleRick · ‎11-04-2024

My tests yesterday seemed to confirm it.

I have a test index.

I run

| eventcount index=test2 
| eval type="eventcount"
| append 
[ | tstats count where earliest=1 latest=+10y index=test2 
| eval type="tstats"]

And get

count	type
35172	eventcount
31077	tstats

(Yesterday I already removed some events)

So I run

index=test2 earliest=-2y@y latest=@y 
| delete

Splunk says it deleted 27549 events.

So I rerun my counting search and this time I get

count	type
35172	eventcount
3528	tstats

So you can see: deleting events changes the tstats count but doesn't touch eventcount. The numbers add up, too: 31077 - 27549 = 3528.
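The same thing can be cross-checked at the bucket level. This is only a sketch: test2 is the test index from above, and I'm assuming that the per-bucket eventCount reported by dbinspect behaves like eventcount here, i.e. still includes events that were masked by delete.

```
| dbinspect index=test2
| stats sum(eventCount) as events count as buckets by state
```

If that assumption holds, the summed events value should stay close to the eventcount number rather than the tstats one.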

Noctisae · ‎11-05-2024

Sadly, that doesn't seem to be the cause of my problem.

I cleaned all indexes, deleted the data partition entirely, and recreated the instance from scratch. After that, I checked, and eventcount (as well as tstats) returned 0 events for my indexes, as expected.

When I move my files into the input folder monitored by Splunk, both counts start to go up, but they diverge over time, and after one hour of ingestion I'm back to a huge difference: 142 million events seen by eventcount versus 68 million seen by tstats. I'm the only user of this test instance, so no deletions were made.

I checked the monitoring console's index details for one particular index, and the numbers shown there are consistent with those returned by eventcount for that index.

There seems to be an inconsistency between my input files and the events retrieved by tstats. For one specific sourcetype, I have 54 input files totaling 68 million events (files full of JSON events, with nothing special in them, no specific line breaking or anything).

If I index only those files, the server logs show that the TailReader saw all 54 files and counted 68 million events. "| eventcount index=XXX" returns 68 million events. But "| tstats count where index=XXX by source" shows only 35 source files and 28 million events.
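To narrow down which files tstats actually sees, and over which time range their events landed, a per-source breakdown along these lines might help. This is a sketch: XXX is the placeholder index name used above, and first_event and last_event are just illustrative field names.

```
| tstats count min(_time) as first_event max(_time) as last_event where index=XXX earliest=1 latest=+10y by source
| convert ctime(first_event) ctime(last_event)
```

Comparing the resulting list of sources with the 54 input files would show which files are missing entirely and which are only partially counted.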

When checking the instance's logs, I find no errors pointing to a problem with my sourcetype or to reading or parsing issues. The only error messages come from a service called "STMgr", which report an unexpected return code of -9 for some buckets. Could that mean that indexing is using too much memory? And if those processes are being killed by the OS, could that explain the inconsistent numbers?
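For reference, the STMgr messages can be pulled out of the internal logs with something like the search below. It is a sketch that assumes the default splunkd field extractions (component, log_level) are available on the instance; it only counts the messages and says nothing about what the -9 return code means.

```
index=_internal sourcetype=splunkd component=STMgr (log_level=ERROR OR log_level=WARN)
| stats count by host, log_level
```

Dropping the stats line shows the raw messages, including the bucket paths they refer to.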

PickleRick · ‎11-05-2024

And what do you get if you run this?

| tstats count where index=<your_index> earliest=1 latest=+10y

Either way, that might call for a support case.

Noctisae · ‎11-06-2024

Hello,

I tried the command, but got the same result: still 68 million events.

I'll try to contact support, thanks for your help !

PickleRick · ‎11-04-2024

There are many non-native speakers here (myself included), so don't worry. As long as you're making an effort to be at least somewhat understandable, it's great! 🙂

Every event has several fields or "metafields" (like index - it's technically not a field indexed with an event, it's a "selector" but it's treated like a field when you're processing results). And each event has the holy trinity of source, sourcetype and host.

I have another suspicion - you have an indexer cluster, right?

Quoting https://docs.splunk.com/Documentation/Splunk/latest/SearchReference/Eventcount

Running in clustered environments

Do not use the eventcount command to count events for comparison in indexer clustered environments. When a search runs, the eventcount command checks all buckets, including replicated and primary buckets, across all indexers in a cluster. As a result, the search may return inaccurate event counts.
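To see where eventcount's numbers come from, the count can be split per index and per server instead of being summarized. A sketch, using the XXX* pattern from the question:

```
| eventcount summarize=false index=XXX*
| stats sum(count) as events by index, server
```

On a standalone instance only one server should show up; in a cluster, the same index would appear once per indexer holding copies of its buckets.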

Noctisae · ‎11-04-2024

"I have another suspicion - you have an indexer cluster, right?"

I forgot to mention it! I'm currently running a standalone instance, not connected to anything else. I checked just in case, and the instance's monitoring console does see the 160 million events on the local instance, without replication. I also checked the inputs, and they are consistent with that number.

What's more confusing is that the events seem to be "seen" by some commands but not by others. For example, I tried searching directly for "index=XXX host=YYY sourcetype=ZZZ" (so every field used should be indexed and retrievable even without search-time extractions, and should not conflict with anything), and that search returns 2300 events across multiple hosts. If I append "| stats count by host" to it, the search returns 0 and doesn't see any events.

I don't know why, but there seems to be a subset of my events that I cannot aggregate on. That would explain the inconsistency, but as for the root cause, I'm at a loss.
