Solved: Re: WebSphere logs indexing more than they should

Chris_R_ · ‎07-02-2010

On a WebSphere server, the websphere_trlog logs in particular appear to be over-indexed by a huge amount.

Checking http://server:port/en-US/app/search/indexing_volume shows 30 GB of data for the single source /ntfs/kahobtwas39Jlog/PROD/XAG_3_1/SystemOut.log. The directory itself holds only about 30 MB of logs, yet Splunk appears to have collected more than 30 GB.

The logs rotate based on time, but they should still be nowhere near 30 GB, and the rotated logs are not whitelisted.

inputs.conf settings:

[monitor:///ntfs/kahobtwas38Jlog]
disabled = false  
crcSalt = <SOURCE>  
host = kahobtwas38.kah.unitrininc.com  
sourcetype = websphere_trlog_sysout  
whitelist = SystemOut\.log|SystemErr\.log

They are getting a lot of DateParserVerbose warnings, so is it possible that events are being over-indexed because date extraction is failing?

07-01-2010 15:50:03.880 WARN  DateParserVerbose - Time parsed (Sat Dec  1 14:50:15 2007) is too far away from the previous event's time (Thu Jul  1 15:50:15 2010) to be accepted.  If this is a correct time, MAX_DIFF_SECS_AGO (3600) or MAX_DIFF_SECS_HENCE (604800) may be overly restrictive.  Context="source::/ntfs/kahobtwas39Jlog/PROD/XAG_3_1/SystemOut.log|host::kahobtwas39.kah.unitrininc.com|websphere_trlog_sysout|"
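To put that warning in numbers, here is a small Python sketch (standard library only) that computes the gap between the two timestamps quoted in the log line and compares it with the MAX_DIFF_SECS_AGO value of 3600 shown there. The variable names are illustrative and not part of Splunk, and time zones are ignored.

```python
from datetime import datetime

# Timestamps copied from the DateParserVerbose warning.
parsed = datetime(2007, 12, 1, 14, 50, 15)    # time parsed from the event
previous = datetime(2010, 7, 1, 15, 50, 15)   # previous event's time

MAX_DIFF_SECS_AGO = 3600  # limit quoted in the warning

# How far in the past the parsed time is, relative to the previous event.
gap_seconds = int((previous - parsed).total_seconds())
exceeds_limit = gap_seconds > MAX_DIFF_SECS_AGO
print(gap_seconds, exceeds_limit)
```

The gap is roughly 943 days, so whatever was parsed as a date in that event is very unlikely to be its real timestamp.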

Perhaps just turning off date extraction, as in the settings below, would help resolve this? Or any other ideas?

/opt/splunk/etc/system/local/props.conf   
[websphere_trlog_sysout]  
DATETIME_CONFIG = CURRENT

I've tried monitoring with file input logging set to DEBUG, but I'm not seeing anything useful.

Stephen_Sorkin · ‎08-24-2010

The first thing to look for in a case like this is duplicate events. If there are no duplicate events, where is the volume coming from? If there are, look at _indextime to see when the duplicates were indexed.

As an aside, why is the crcSalt set? Also, setting DATETIME_CONFIG here is a bad idea. The root problem is that event breaking isn't working properly, and we need better configurations there.
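A rough illustration of that duplicate check, written as a Python sketch rather than a Splunk search: it assumes the events for the source have already been exported as (_raw, _indextime) pairs. The `find_duplicates` helper and the sample events are hypothetical.

```python
from collections import defaultdict

def find_duplicates(events):
    """Group events by raw text; return those indexed more than once.

    `events` is an iterable of (raw, indextime) pairs, where indextime
    is the epoch second at which the event was indexed.
    """
    seen = defaultdict(list)
    for raw, indextime in events:
        seen[raw].append(indextime)
    return {raw: sorted(times) for raw, times in seen.items() if len(times) > 1}

# Hypothetical sample: one line indexed three times, another only once.
sample = [
    ("[7/1/10 15:50:15:000 CDT] 0000001a SystemOut O request handled", 1278017415),
    ("[7/1/10 15:50:15:000 CDT] 0000001a SystemOut O request handled", 1278021015),
    ("[7/1/10 15:50:16:000 CDT] 0000001a SystemOut O request done", 1278017416),
    ("[7/1/10 15:50:15:000 CDT] 0000001a SystemOut O request handled", 1278024615),
]

duplicates = find_duplicates(sample)
for raw, times in duplicates.items():
    print(len(times), times, raw)
```

If the index times of the copies cluster around the rotation times, that would suggest the file is being read again each time it rotates.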

Chris_R_ · ‎08-24-2010

The crcSalt was set because the WebSphere logs all have a really big header that is identical in all the rotated logs, and Splunk wouldn't index the next SystemOut.log when it rotated.

I'll check on the duplicate events using the _indextime value. Thanks.
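The effect described here can be sketched in Python. This is only an illustration under stated assumptions: the file monitor is assumed to identify a file by a checksum over the start of the file (256 bytes here), `zlib.crc32` stands in for whatever checksum Splunk actually uses, and the `head_crc` helper, file contents, and paths are hypothetical.

```python
import zlib

HEAD_BYTES = 256  # assumed length of the file head that gets checksummed

def head_crc(content: bytes, salt: str = "") -> int:
    """Checksum the first HEAD_BYTES of a file, optionally mixed with a salt."""
    return zlib.crc32(salt.encode() + content[:HEAD_BYTES])

# Hypothetical logs: identical long header, different bodies.
header = b"*" * 300 + b"\n"
old_log = header + b"events from the previous rotation\n"
new_log = header + b"events from the current file\n"

# Without a salt, both files look like the same file.
same_without_salt = head_crc(old_log) == head_crc(new_log)

# Salting with the (hypothetical) source path makes the checksums differ.
differs_with_salt = (
    head_crc(old_log, "/logs/SystemOut_old.log")
    != head_crc(new_log, "/logs/SystemOut.log")
)
print(same_without_salt, differs_with_salt)
```

The flip side of salting with the source path is that a file renamed on rotation gets a new checksum, so it can look like a brand-new file if the monitor still covers it.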

Chris_R_ · ‎07-16-2010

Sorry for the delay; I was trying to recommend that the client use the WebSphere app. It turns out it won't work for them.

The indexer is 4.1.3. It's monitoring network shares (CIFS/NTFS mounts) such as:

[monitor:///ntfs/kahobtwas39Jlog]  
disabled = false  
crcSalt = <SOURCE>  
host = kahobtwas39.kah.unitrininc.com  
sourcetype = websphere_trlog_sysout   
_whitelist = (SystemOut\.log$|SystemErr\.log$)   
blacklist = (SystemOut_\d+.*|SystemErr_\d+.)

I tried adding those whitelist/blacklist entries to filter out the rotated logs, but the behavior is still the same.
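A quick way to sanity-check those two patterns outside Splunk is a small Python sketch like the one below. It assumes the patterns are matched unanchored against the full path and that Python's `re` behaves like Splunk's regex engine for these simple expressions. The `is_monitored` helper and the rotated file name are hypothetical.

```python
import re

# Patterns copied from the stanza above.
whitelist = re.compile(r"(SystemOut\.log$|SystemErr\.log$)")
blacklist = re.compile(r"(SystemOut_\d+.*|SystemErr_\d+.)")

def is_monitored(path: str) -> bool:
    """A path is kept if it matches the whitelist and not the blacklist."""
    return bool(whitelist.search(path)) and not blacklist.search(path)

# Hypothetical paths; the rotated file name format is an assumption.
base = "/ntfs/kahobtwas39Jlog/PROD/XAG_3_1/"
paths = [
    base + "SystemOut.log",
    base + "SystemErr.log",
    base + "SystemOut_10.07.01_15.50.03.log",
]
results = {p: is_monitored(p) for p in paths}
for path, kept in results.items():
    print(kept, path)
```

Under these assumptions the whitelist alone already excludes a rotated name of this form, so adding the blacklist would not be expected to change anything, which fits the unchanged behavior.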

gkanapathy · ‎07-03-2010

Please indicate Splunk version of forwarder and indexer, if applicable, as well as type of indexer. Also indicate if there is a disparity between metrics logging and license volume.
