One of my dashboards reflects some data which actually isn't present in the data input. It might have been present before, but not now.
Maybe the output is being fetched from some cached memory?
How do I troubleshoot? Please help.
Example
Data input:
Hostname Mode Status
abcd1234 Patch Done
Search output:
Hostname Mode Status
abcd1234 Patch Done
wxyz5678 Patch Pending
I think you misunderstand how Splunk works. Data is ingested into Splunk, and it stays there. If you run a query that matches old data, it will bring back old data, because that's what you asked for. Data is NEVER updated in Splunk; it is only ingested. If you add new data that is similar to old data but has changed in some way, then both copies of the data are still there.
It sounds like you are expecting some other behavior. So, here are two possible suggestions:
First, if you are periodically ingesting a complete copy of your data, then you can use the index time (the _indextime field) to limit the search. For instance, if your data is loaded daily, then add _index_earliest=-24h to your searches, and they will only bring back records that were ingested and indexed within the last 24 hours.
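As a rough sketch, this is what that could look like with the placeholder source, host, and sourcetype values used elsewhere in this thread. Note that the normal event-time range of the search (the time picker, or earliest/latest) still applies on top of this, so it has to be wide enough to include the events' own timestamps:

```
source="file path" host="host" sourcetype="tsv" _index_earliest=-24h
| table _time Hostname Mode Status
```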
Second, if your records have keys, then you could use dedup to keep only the latest copy of a given record.
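A minimal sketch of the dedup approach, assuming Hostname is the record key (as in the example data). dedup keeps the first event it encounters for each key, so sorting by index time first makes "latest" mean "most recently indexed". The field name indexed_at is only illustrative:

```
source="file path" host="host" sourcetype="tsv"
| eval indexed_at=_indextime
| sort 0 -indexed_at
| dedup Hostname
| table Hostname Mode Status
```

Keep in mind this only picks the newest copy per key. A host that was dropped from the newer files still shows up with its last known record.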
Thanks DalJeanis for your answer.
I am already using dedup in my query. I believe it returns, for each unique field value, the event that was ingested most recently.
I refresh the complete data every 2 hours. Using "_index_earliest=-2h" doesn't seem to work for me somehow!!
Can you provide your actual search query? And what exactly do you mean by "doesn't seem to work"? Do you get no results, or still results from previous imports?
source="file path" host="host" sourcetype="tsv" _index_earliest=-2h
| dedup Hostname | where Mode = "Patch"
No results found.
DalJeanis,
Does the data get ingested even if it is exactly the same as the already ingested data?
No. If your file is identical to a previously ingested file (or, more precisely, by default if its first 256 bytes are identical), Splunk will skip the file.
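One way to check this, as a sketch using the placeholder source, host, and sourcetype from this thread: count events per index time and see when files were actually picked up. If the most recent entry is older than your last refresh, the newer files were probably skipped as duplicates. The field name indexed_at is only illustrative, and the search needs to run over a time range wide enough to cover the events:

```
source="file path" host="host" sourcetype="tsv"
| eval indexed_at=strftime(_indextime, "%Y-%m-%d %H:%M")
| stats count by indexed_at
| sort indexed_at
```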
Aah!! Maybe that's the reason why _index_earliest=-2h is showing no results.
I got the desired output once I expanded the range to one where I know there will definitely be a change.
Thanks Frank 🙂
Yep, that's probably why.
If you still want to filter the data for the most recent logs, take a look at this (as mentioned earlier): https://answers.splunk.com/answers/697546/how-do-you-search-events-that-contain-most-recent.html#ans...
Can you please let me know if there is any way to clear the already ingested data and start afresh?
Have you set a suitable time window on your search, or are you running it over all time?
How exactly are you ingesting this data into Splunk, and what exactly do you mean by "some data which actually isn't present in the data input"?
Have you set a suitable time window on your search, or are you running it over all time? --- Over all time.
How exactly are you ingesting this data into Splunk, and what exactly do you mean by "some data which actually isn't present in the data input"? --- I have a new tab-separated file that gets generated every 2 hours.
And you only want to see the results of the latest import? Then why do you search over all time?
Can you provide the actual search you run? Running it over a better time window (e.g. last 2 hours) such that only 1 import is taken into account might solve your problem. Otherwise there are also slightly more advanced ways to filter your search results for only entries from the most recent import. See for example this related question: https://answers.splunk.com/answers/697546/how-do-you-search-events-that-contain-most-recent.html#ans...
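For what it's worth, here is one possible sketch of such a "most recent import only" filter. It is not necessarily what the linked answer does. It finds the newest index time among the matching events and keeps only events indexed shortly before it. The 300-second tolerance and the field names indexed_at and latest_indexed_at are assumptions; adjust the tolerance to however long one file takes to index:

```
source="file path" host="host" sourcetype="tsv"
| eval indexed_at=_indextime
| eventstats max(indexed_at) as latest_indexed_at
| where indexed_at >= latest_indexed_at - 300
| dedup Hostname
| where Mode="Patch"
```

Unlike a fixed _index_earliest=-2h, this still returns the last successful import when a newer file was skipped because it was identical.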
Can someone please help me troubleshoot this?
What search query are you using?
source="" host="" sourcetype="tsv"
| dedup Hostname | where Mode = "Patch"