
summary indexing blocked and binary file warning

yannK
Splunk Employee

I noticed that my summary indexing stopped working.
The summary results files are being generated in the spooler, but are not indexed.

My /opt/splunk/var/spool/splunk/ folder is full of files like *.stash_new that are filling my disk,
and in splunkd.log I found:


12-20-2012 20:23:09.780 +0000 WARN FileClassifierManager - The file '/opt/splunk/var/spool/splunk/aejhsdv_342014304.stash_new' is invalid. Reason: binary

1 Solution

yannK
Splunk Employee

This is because of a known bug in 4.3.* and 5.* (SPL-59578).
The issue is triggered when too many summary files are generated at the same time.
The long-term solution is to change the schedule or wait for a fix.

One of the consequences of this bug is that the summary results files (the ones with the suffix stash_new) are incorrect and contain invalid characters that are interpreted as binary (see the error in splunkd.log).

The good workaround is to reschedule the summary indexing and run a backfill to regenerate the corrupted files.

A bad workaround is to force Splunk to index them anyway (but the source will be incorrect and results will be merged): edit
$SPLUNK_HOME/etc/system/local/props.conf
and add this to the summary sourcetype:

[stash_new]
NO_BINARY_CHECK=1

Once done, check the spooler; it should empty itself.


WedbushITOps
Engager

I had this same problem. If there is another program actively writing to the file(s), then Splunk seems to label them as binary.
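Splunk's FileClassifierManager internals aren't public, but a rough stand-in for the binary check can be sketched with GNU grep, whose -I option treats any file containing NUL bytes as binary. The classify_stash helper below is hypothetical, for illustration only:

```shell
# Hypothetical helper (NOT Splunk's actual classifier): GNU grep -I
# treats a file containing NUL bytes as binary and reports no match,
# so "grep -Iq ." succeeds only on text-like files.
classify_stash() {
  if grep -Iq . "$1"; then
    echo "text"
  else
    echo "binary"
  fi
}
```

A stash_new file truncated mid-write by a second writer can easily contain such bytes, which is consistent with the "Reason: binary" warning in splunkd.log above.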


bnorthway
Path Finder

I had the same problem on 6.3.0. I deleted the stash_new files and re-ran the backfill commands, and now the data is showing up correctly. Thanks.


ben_leung
Builder

"The issue is triggered when too many summary files are generated at the same time."

The issue was fixed for binary files, but what about the issue of too many summary files generating at the same time? In our deployment, shuffling the schedules for summary searches will not help, as there are many...

So I'm wondering if the root issue has been resolved?

Version 6.0.5


yannK
Splunk Employee

Otherwise, once on 5.0.3, the safest approach is to delete the binary files and use the summary backfill method to regenerate them.
See http://docs.splunk.com/Documentation/Splunk/5.0.3/Knowledge/Managesummaryindexgapsandoverlaps
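For reference, the backfill described in that page uses the fill_summary_index.py script shipped in $SPLUNK_HOME/bin. A rough sketch of the delete-and-backfill sequence follows; the app name, saved-search name, time range, and credentials below are placeholders, and the flags should be checked against the linked docs for your version:

```shell
# Stop Splunk, remove the corrupted summary files, and restart.
cd $SPLUNK_HOME/bin
./splunk stop
rm /opt/splunk/var/spool/splunk/*.stash_new
./splunk start

# Backfill the gap left by the deleted files. The app "search",
# the saved search "my_summary_search", and the credentials are
# placeholders for your own values.
./splunk cmd python fill_summary_index.py \
    -app search \
    -name "my_summary_search" \
    -et -30d -lt now \
    -j 8 -dedup true \
    -auth admin:changeme
```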


yannK
Splunk Employee

The fix in version 5.0.3 will not index the incorrect files; it will prevent new ones from being created.
You can still use the NO_BINARY_CHECK trick to index them once.

yoho
Contributor

I had this problem too and upgraded to 5.0.3 today. However, these files are still there. Can I stop Splunk, remove them, and start Splunk again?


pvols1979
Explorer

I think we are actually experiencing this issue now with the Splunk for Palo Alto Networks app. There are some searches that run every 5 minutes (around 6 of them). We forward our summary data back off to the indexers instead of keeping it on the search head. However, in the case of this one app, the stash_new files are being created and continue to build until the partition fills.

jrodman
Splunk Employee

Note that NO_BINARY_CHECK will not eliminate the problem that the data may be incomplete and may not be labelled correctly in terms of the 'source' field.

The only short-term workaround is to reduce the time overlap in searches run at the same time, on the same search head, with similar names.

There's a correction for this defect that is currently planned for release as 5.0.3.
