I recently created a saved search that writes its results to a summary index. It's scheduled to run every 5 minutes and searches with these time parameters:
Start time: -5m@m
Finish time: now
So each run should pick up the last 5 minutes of events from a source index and add them to the summary index. As an FYI, the source index happens to get only 1 entry every 5 minutes, so I'm expecting that one entry to be found on each run and put into the summary index.
However, what I'm actually seeing is that my entries in the summary index are doubled! Instead of the 1 entry every 5 minutes I'm expecting, I find 2 entries every 5 minutes: the correct entry... just in there twice.
I've even set the saved search to execute every 15 minutes instead, searching the source index with the time range of "-15m@m" to "now". When I do this, the results are still doubled. That rules out the chance that an entry is doubled because it gets picked up on the edges of two adjacent time ranges.
Also, someone else on my team has run a similar setup, with a saved search running once every hour and grabbing events from a source index. Within that hour there are many, many entries to be found, but in the summary index every entry is again in there exactly twice.
Has anyone else experienced this problem? Am I setting this up incorrectly? Thanks!
Has anyone else seen this behavior? I'm having the same problem. Single search running every five minutes. I get double the number of entries in the summary_index.
The xxx SPLUNK xxx header in the file is actually metadata you can put into any file. See http://docs.splunk.com/Documentation/Splunk/latest/Data/Assignmetadatatoeventsdynamically#Configure_...
/kristian
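To make the "header is metadata" point a bit more concrete, here is a minimal Python sketch that splits such a header line into key/value pairs. It assumes the header looks like the line quoted elsewhere in this thread (three asterisks, the word SPLUNK, three asterisks, then key="value" pairs); the function name parse_splunk_header is made up for illustration and is not part of Splunk.

```python
import re

# Assumed header shape, based on the line quoted in this thread:
#   ***SPLUNK*** index="summary-data" source="SEARCH-NAME"
HEADER_PREFIX = "***SPLUNK***"

# key="quoted value" or key=bare_value
PAIR_RE = re.compile(r'(\w+)=(?:"([^"]*)"|(\S+))')


def parse_splunk_header(line):
    """Return the key/value metadata of a header line, or None if it is not a header."""
    line = line.strip()
    if not line.startswith(HEADER_PREFIX):
        return None
    rest = line[len(HEADER_PREFIX):]
    meta = {}
    for match in PAIR_RE.finditer(rest):
        key = match.group(1)
        # group(2) is the quoted form, group(3) the unquoted form
        value = match.group(2) if match.group(2) is not None else match.group(3)
        meta[key] = value
    return meta


if __name__ == "__main__":
    print(parse_splunk_header('***SPLUNK*** index="summary-data" source="SEARCH-NAME"'))
    print(parse_splunk_header("an ordinary event line"))
```

If a line like this shows up as an event in the index, it suggests the header was indexed as data instead of being consumed as metadata.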
I'm having the same problem; no solution yet.
Obviously, this is not a solution we want to go with, as modifying those files is definitely NOT the right approach. But I figured it was worth noting that this change gave us working results, which lets you move forward with any other testing you wanted to do.
...were modified to look like this:
#[batch://$SPLUNK_HOME/var/spool/splunk]
#move_policy = sinkhole
#crcSalt =
#[batch://$SPLUNK_HOME/var/spool/splunk/...stash_new]
[monitor://$SPLUNK_HOME/var/spool/splunk/...stash_new]
queue = stashparsing
sourcetype = stash_new
#move_policy = sinkhole
crcSalt =
After that change, everything has actually started WORKING properly! Those weird header lines that start with ---SPLUNK--- ... are no longer there, and events are only displaying once instead of being doubled. In other words, everything looks completely accurate.
I don't recommend this, but we have found an iffy workaround. Perform at own risk. ^_^
My teammate experimented a bit and modified the inputs.conf in the etc/system/default folder (the path you are never supposed to alter). Interestingly, she tried switching the monitoring of the stash files from "batch" to "monitor". So the sections that read:
[batch://$SPLUNK_HOME/var/spool/splunk]
move_policy = sinkhole
crcSalt =
[batch://$SPLUNK_HOME/var/spool/splunk/...stash_new]
queue = stashparsing
sourcetype = stash_new
move_policy = sinkhole
crcSalt =
(continued in next post)
Also, in the entry above, the random entry is supposed to have three asterisks (*) before and after the word SPLUNK, but when I do that here, it bold faces and italicizes the word. ^_^ So just pretend those dashes are asterisks.
Latest development:
My teammate checked the dispatch folder to find the actual results, and in the results.csv.gz files, the results are actually not duplicated! Each result is found only once.
Also, we see these random entries in the summary index:
---SPLUNK--- index="summary-data" source="SEARCH-NAME"
The reason this sticks out to us is that, when we ran these searches on our older 4.2.5 Search Heads, these types of events were nowhere to be found.
So for some reason, Splunk is displaying the entries twice, which makes this look like a viewing problem rather than an indexing one.
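For anyone who wants to repeat the results.csv.gz check without eyeballing the file, here is a rough Python sketch that counts rows appearing more than once in a gzipped CSV. It assumes the first row is a header and that two rows are duplicates only if every column matches; the function name duplicate_rows is made up for illustration, and the path to the file is whatever your dispatch folder contains.

```python
import csv
import gzip
from collections import Counter


def duplicate_rows(path):
    """Return {row_tuple: count} for data rows that appear more than once."""
    with gzip.open(path, "rt", newline="") as handle:
        reader = csv.reader(handle)
        next(reader, None)  # skip the header row (assumed to be present)
        counts = Counter(tuple(row) for row in reader)
    return {row: n for row, n in counts.items() if n > 1}
```

An empty dict back means every row in the file is unique, which matches what we saw in the dispatch folder.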
An interesting development: so currently, I have 4 Search Heads in my environment. 2 of them are older, running on Splunk 4.2.5, and two of them are new, running on 4.3.3.
I set up the same saved searches and a local index on a 4.2.5 Search Head machine, and I'm NOT getting any duplicate events. However, my teammate saw the duplicated entries on one of the 4.3.3 machines and I saw the duplicated entries on the OTHER 4.3.3 machine. So either there's a bug in 4.3.3 or I did something wrong when I installed Splunk on the two new machines and have a setting set incorrectly.
James
Actually, for organization, I'm putting the search info here instead:
Adding my saved search here.
index=fooindex sourcetype=somesourcetype FIELD1="Value1" FIELD2="Value2" | stats avg(FIELD3) as FIELD4 by _time, FIELD1
I have it scheduled to run every 5 minutes with Summary Indexing enabled, and I've selected a summary index I'll call "summary-data".
Then, when I search to see my results, all I do is run the search "index=summary-data" and see what pops up. And this is where I see each of the results duplicated.
Hope that helps a little.
James
I've added that information to the original post. As for the interval part, I'll definitely switch it to "@m" from now on. That's good advice, thanks. I'll let you know if it causes any changes to the results.
What are the searches that you run to collect data in the summary index? And what is the one used to check for double events?
On a side note, it would be safer to set the interval of the saved searches to: from: -5m@m to: @m
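To illustrate why snapping the end of the range is safer, here is a small Python sketch of the arithmetic. It assumes "-5m@m" means "go back 5 minutes, then round down to the minute", that "now" is the moment the search actually starts, and that the scheduler starts each run a few seconds late (the run times below are made up). It only models the time windows; it does not talk to Splunk.

```python
from datetime import datetime, timedelta


def snap_to_minute(t):
    """Round a timestamp down to the start of its minute (the '@m' part)."""
    return t.replace(second=0, microsecond=0)


def window(run_time, minutes, snap_latest):
    """Return (earliest, latest) for one run, with earliest = -<minutes>m@m.

    snap_latest=False models latest=now, snap_latest=True models latest=@m.
    """
    earliest = snap_to_minute(run_time - timedelta(minutes=minutes))
    latest = snap_to_minute(run_time) if snap_latest else run_time
    return earliest, latest


def overlap(first, second):
    """Length of time covered by both windows (zero if they do not overlap)."""
    start = max(first[0], second[0])
    end = min(first[1], second[1])
    return max(end - start, timedelta(0))


if __name__ == "__main__":
    # Hypothetical run times: the scheduler fires a few seconds past the minute.
    run1 = datetime(2012, 1, 1, 12, 5, 3)
    run2 = datetime(2012, 1, 1, 12, 10, 2)
    print(overlap(window(run1, 5, False), window(run2, 5, False)))
    print(overlap(window(run1, 5, True), window(run2, 5, True)))
```

With latest=now, the two runs both cover the few seconds between 12:05:00 and the moment the first run started, so an event landing there could be summarized twice. With latest=@m the windows meet exactly at 12:05:00 and share no time.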