Hi, I have an issue where something is consuming license for a specific sourcetype. Unfortunately, the host is blank in index=_internal source="*license_usage.log*", but I do know the index. I cannot find which host is sending data that is indexed today but carries event dates in the past. I have found events sent with past dates to be the cause of this issue before. The only way I have solved it so far is by putting HEC into DEBUG, and I don't want to keep doing that.
I want to do something like this:
| tstats count where index=os AND sourcetype=ps groupby host | where <ingest time was yesterday but event time was days in the past; list event time and ingest time>
Is this possible? Thank you!
Chris
Update: I had to put HEC into DEBUG mode to find my issue.
body_chunk="{"time":1630910039.599,"index":"os","host":"myhost","source":"ps","sourcetype":"ps",
The above is just a snippet of the event. The "time" value sent in the payload is in the past: it resolves to Sept 6, but the event was actually received by HEC on Sept 24.
This is the problem I'm trying to find without having to place an indexer into DEBUG mode. To clarify the question: how would I find this problem in the future with SPL?
Thanks
Chris
And are you able to find those events in indexes? What's their _time and _indextime?
I'm stumped. Using this debug data, I ran a search specific to that time. The time picker was set to Sept 6 (the same day as the event time).
index=os sourcetype=ps host=MyHost _time=1630910039.599 | eval indextime=strftime(_indextime,"%Y-%m-%d %H:%M:%S")
| table host _time indextime
This returns the event, and the event time and index time differ.
Yeah, so my hunch was right. Now I want to reverse engineer some SPL I can run now to find other issues. So let's start by using the _index_earliest time modifier and spot checking before I do some time math.
index=os sourcetype=ps _index_earliest=-24h
| eval indextime=strftime(_indextime,"%Y-%m-%d")
| eval event_time=strftime(_time, "%Y-%m-%d")
| table host _time indextime event_time
The above does not return any data. If I remove the time modifier from the SPL and set the time picker to the last 24 hours, I do get data back.
Am I missing something?
Quoting the docs:
When using index-time based modifiers such as _index_earliest and _index_latest, your search must also have an event-time window which will retrieve the events. In other words, chunks of events might be ruled out based on the non index-time window as well as the index-time window. To be certain of retrieving every event based on index-time, you must run your search using All Time.
Ugh. That seems counterintuitive. I thought time modifiers in SPL overrode the time picker. Anyway, thanks. So I can just run some SPL over the last 30-60 days in the hope of finding this problem in the future. If someone is pushing events dated months or years in the past, it will be a heavy query...
It's not that counterintuitive if you think about it. _time is the primary way of limiting the buckets that Splunk searches. _indextime is just a field there. So effectively, limiting index time is just like adding additional conditions on a field.
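As a rough sketch of what the docs passage implies, the earlier spot-check search can be given an explicit, wide event-time window alongside the index-time window. The 90-day lookback is an arbitrary assumption; widen it (or run over All Time) if events might be stamped further back than that.

```spl
index=os sourcetype=ps earliest=-90d latest=now _index_earliest=-24h _index_latest=now
| eval indextime=strftime(_indextime, "%Y-%m-%d %H:%M:%S")
| eval event_time=strftime(_time, "%Y-%m-%d %H:%M:%S")
| table host event_time indextime
```

Here earliest/latest bound the event time (_time) and decide which buckets are opened, while _index_earliest/_index_latest then keep only what was indexed in the last 24 hours. The wider the event-time window, the more buckets get scanned, so the cost grows with the lookback.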
Well, unfortunately, host is not a very reliable field on its own. It depends on how you're getting your data and on what is being parsed from the events and how.
But to check who is sending "late" data (remember that it might be indeed sent with a great delay or you might simply have highly misconfigured time on the source system or badly set timezone) you can do something like
<<search across your indexes>> | eval delay=_indextime-_time | stats avg(delay) min(delay) max(delay) by source index host
If you have consistently low values, your ingestion process is going smoothly. If you have a delay of several minutes, you have some bottlenecks (though that can be a normal state for some forms of ingestion, like forwarding data via WEF and reading it from Forwarded Events with a UF). If you have negative values, you have problems with time sync. And so on.
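An average can hide a small number of badly stamped events, so a variation on the search above may be more useful for this case: keep only events whose index time is more than a day after their event time, then list both times per host. This is only a sketch; the 86400-second threshold and the 90-day event-time lookback are assumptions to tune, and index=os sourcetype=ps is taken from the question.

```spl
index=os sourcetype=ps earliest=-90d _index_earliest=-24h
| eval delay=_indextime-_time
| where delay > 86400
| stats count min(_time) as first_event max(_time) as last_event max(_indextime) as last_indexed max(delay) as max_delay_sec by host source
| convert ctime(first_event) ctime(last_event) ctime(last_indexed)
| sort - count
```

Any host that shows up here indexed data in the last 24 hours with event timestamps at least a day old. max_delay_sec is in seconds, so a value around 1.5 million would correspond to the roughly 18-day gap seen in the HEC debug output.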
Thanks for the reply. That query did not tell me anything was wrong, all looked fine there.
Your best option is to enable Forwarder Monitoring on the Distributed Monitoring Console (DMC), or on the MC on the index master if you don't have a DMC. That feature provides all kinds of detailed information on which forwarders are connected, their status, data throughput, and a lot more.
See the following documentation for more:
https://docs.splunk.com/Documentation/Splunk/8.2.2/Updating/Forwardermanagementoverview
The DMC (or MC on master) also provides license utilization information at:
Monitoring Console > Indexing > License Usage - Today or Historic License Usage (those are the two 'canned' options).
It's worth noting that with either option you can hover over the graphs and click "Open in Search" to see the searches that power the panels. Those can also give you a good base to build on or modify to suit your specific needs.
Thanks. Yeah, I looked at that, and I even have MetaWoot gathering metrics. None of those will show me ingest if the event date is in the past. Next stop: debugging.
One option is to install, e.g., Meta Woot and use it to figure out which source sends those logs.
https://splunkbase.splunk.com/app/2949/
r. Ismo