<?xml version="1.0" encoding="UTF-8"?>
<rss xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#" xmlns:taxo="http://purl.org/rss/1.0/modules/taxonomy/" version="2.0">
  <channel>
    <title>Re: Breaking unusual log files in Splunk Search</title>
    <link>https://community.splunk.com/t5/Splunk-Search/Breaking-unusual-log-files/m-p/12270#M957</link>
    <description>&lt;P&gt;If you're stuck with this kind of cross-threaded indexing situation, you can still index everything as single-line events and use the search language to put things back together at search time.  This often seems like a consolation prize, but in many ways it can actually be a better and more flexible way to go.  For instance, with a long-running transaction the start events get into the index right away, so your system has the power to detect a problem before the transaction even closes.&lt;/P&gt;

&lt;P&gt;One concrete problem is that you end up with some weird artifacts: half of 8656 is indexed at its own start time, while the other half is indexed around the start time of 10152.  So for any particular time range you may need to filter out transactions whose end was clipped off, or else your statistics might get skewed slightly.&lt;/P&gt;

&lt;P&gt;So to be able to account for that, I'm going to assume there's some endEvent=1 field that we extract when we see the final line.  I'm also assuming that you are correctly extracting the 'eventId' field from the single-line events, and that the other fields involved are getting extracted properly.&lt;/P&gt;

&lt;P&gt;So here are two ways to do it.  For both examples, let's say you're interested in the peakPageFileUsage and elapsedTime fields.&lt;/P&gt;

&lt;P&gt;1) The transaction command can put this back together, like so:&lt;/P&gt;

&lt;PRE&gt;&lt;CODE&gt;sourcetype="foo" | transaction eventId maxspan=1m | search endEvent=1 | fields eventId peakPageFileUsage elapsedTime 
&lt;/CODE&gt;&lt;/PRE&gt;

&lt;P&gt;(The maxspan value you should use depends on how frequently the eventIds are reused by the system.)&lt;/P&gt;

&lt;P&gt;2) Stats is often overlooked, but it can do a lot of this just fine too:&lt;/P&gt;

&lt;PRE&gt;&lt;CODE&gt;sourcetype="foo" | stats first(endEvent) first(peakPageFileUsage) first(elapsedTime) by eventId | search endEvent=1
&lt;/CODE&gt;&lt;/PRE&gt;

&lt;P&gt;(And &lt;EM&gt;if&lt;/EM&gt; you ever have a field that appears multiple times within a single eventId, you could use the values() function within stats and then operate on the resulting multivalue field from there.)&lt;/P&gt;</description>
    <pubDate>Mon, 26 Apr 2010 23:55:07 GMT</pubDate>
    <dc:creator>sideview</dc:creator>
    <dc:date>2010-04-26T23:55:07Z</dc:date>
    <item>
      <title>Breaking unusual log files</title>
      <link>https://community.splunk.com/t5/Splunk-Search/Breaking-unusual-log-files/m-p/12268#M955</link>
      <description>&lt;P&gt;Hi All,&lt;/P&gt;

&lt;P&gt;I am having trouble breaking up the log file below:&lt;/P&gt;

&lt;P&gt;Each log entry starts with id:#################### and ends with id:@@@@@@@@@@@@@@@@&lt;/P&gt;

&lt;P&gt;However, as you can see below, the log can be written to by two different processes at the same time.  Is there any way to have the events broken as follows:&lt;/P&gt;

&lt;P&gt;Start an event at id:#################### and end it only when you see the corresponding ending for that id, e.g. id:@@@@@@@@@@@@@@@@&lt;/P&gt;

&lt;P&gt;Also, the log entries have both a start and a completion date.  Since Splunk by default breaks events by date, if we switch this off, is there a way to force Splunk to take just the start date as the event date?&lt;/P&gt;

&lt;P&gt;The start date follows the line id:####################&lt;/P&gt;

&lt;PRE&gt;
8656: ################################################
8656: From xxxxxx@xxxxxxxxx 31 Mar 2010 11:33:24 (pid=2768,TSRW.52.2,ditTradeBookManager)
8656: IN ### PagefileUsage: 46,508 K, PeakPagefileUsage: 52,756 K
8656: Fetch Trades For FIRM
8656: This user is GLOBAL hence executing the current Tradeserver Functionality 
8656:  with condition [BOOK in ('JXH1', 'JXF1') and FIRMfutures.EXPIRY_MONTH &amp;gt;= '01-Mar-2010' and CONTRACT in ('USFF/CBOT', 'AUIBCR/SFE') and FIRMfutures.STATUS = 'WHATIF']
8656: TStradeDB::retrieveTrades returned 0 trades
10152: 
10152: ################################################
8656: OUT ### PagefileUsage: 46,508 K, diff : 0 Bytes
10152: From xxxxxx@vxxxxxxxx 31 Mar 2010 11:33:24 (pid=3636,TSRW.52.2,ditTradeBookManager)
8656: OUT ### PeakPagefileUsage: 52,756 K, diff : 0 Bytes
10152: IN ### PagefileUsage: 46,508 K, PeakPagefileUsage: 52,756 K
10152: Fetch Trades For JSW
8656: Completed 31 Mar 2010 11:33:24 (0.031000s elapsed)
8656: @@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@
10152: This user is GLOBAL hence executing the current Tradeserver Functionality 
10152:  with condition [BOOK in ('JXH1', 'JXF1') and TDtrade.EXPIRY &amp;gt;= '01-Mar-2010' and TDtrade.STATUS = 'WHATIF' and TDtrade.VERSION_DATE = '01-Mar-2010' and CONTRACT in ('NZBL/NZFE', 'AUBL/SFE') and FIRMfutures.STATUS = 'WHATIF']
8656: TStradeDB::retrieveTrades returned 0 trades
8656: OUT ### PagefileUsage: 46,508 K, diff : -68 K
8656: OUT ### PeakPagefileUsage: 52,756 K, diff : 0 Bytes
8656: Completed 31 Mar 2010 11:33:24 (0.032000s elapsed)
8656: @@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@
&lt;/PRE&gt;

&lt;P&gt;So from the above I would get the following extracts:&lt;/P&gt;

&lt;PRE&gt;
8656: ################################################
8656: From xxxxxx@xxxxxxxx31 Mar 2010 11:33:24 (pid=2768,TSRW.52.2,ditTradeBookManager)
8656: IN ### PagefileUsage: 46,508 K, PeakPagefileUsage: 52,756 K
8656: Fetch Trades For FIRM
8656: This user is GLOBAL hence executing the current Tradeserver Functionality 
8656:  with condition [BOOK in ('JXH1', 'JXF1') and FIRMfutures.EXPIRY_MONTH &amp;gt;= '01-Mar-2010' and CONTRACT in ('USFF/CBOT', 'AUIBCR/SFE') and FIRMfutures.STATUS = 'WHATIF']
8656: TStradeDB::retrieveTrades returned 0 trades
8656: OUT ### PagefileUsage: 46,508 K, diff : 0 Bytes
8656: OUT ### PeakPagefileUsage: 52,756 K, diff : 0 Bytes
8656: Completed 31 Mar 2010 11:33:24 (0.031000s elapsed)
8656: @@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@
&lt;/PRE&gt;

&lt;P&gt;AND&lt;/P&gt;

&lt;PRE&gt;
10152: ################################################
10152: From xxxxxxx@xxxxxxxx 31 Mar 2010 11:33:24 (pid=3636,TSRW.52.2,ditTradeBookManager)
10152: IN ### PagefileUsage: 46,508 K, PeakPagefileUsage: 52,756 K
10152: Fetch Trades For JSW
10152: This user is GLOBAL hence executing the current Tradeserver Functionality 
10152: TStradeDB::retrieveTrades returned 0 trades
10152: OUT ### PagefileUsage: 46,508 K, diff : 0 Bytes
10152: OUT ### PeakPagefileUsage: 52,756 K, diff : 0 Bytes
10152: Completed 31 Mar 2010 11:33:24 (0.000000s elapsed)
10152: @@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@
&lt;/PRE&gt;

&lt;P&gt;AND&lt;/P&gt;

&lt;PRE&gt;
8656: ################################################
8656: From xxxxxxx@xxxxxxxxx 31 Mar 2010 11:33:24 (pid=2768,TSRW.52.2,ditTradeBookManager)
8656: IN ### PagefileUsage: 46,576 K, PeakPagefileUsage: 52,756 K
8656: Fetch Trades For FIRM
8656: This user is GLOBAL hence executing the current Tradeserver Functionality 
8656:  with condition [BOOK in ('JXH1', 'JXF1') and FIRMfutures.EXPIRY_MONTH &amp;gt;= '01-Mar-2010' and CONTRACT in ('NZBL/NZFE', 'AUBL/SFE') and FIRMfutures.STATUS = 'WHATIF']
8656: TStradeDB::retrieveTrades returned 0 trades
8656: OUT ### PagefileUsage: 46,508 K, diff : -68 K
8656: OUT ### PeakPagefileUsage: 52,756 K, diff : 0 Bytes
8656: Completed 31 Mar 2010 11:33:24 (0.032000s elapsed)
8656: @@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@
&lt;/PRE&gt;</description>
      <pubDate>Mon, 26 Apr 2010 17:55:45 GMT</pubDate>
      <guid>https://community.splunk.com/t5/Splunk-Search/Breaking-unusual-log-files/m-p/12268#M955</guid>
      <dc:creator>Josh</dc:creator>
      <dc:date>2010-04-26T17:55:45Z</dc:date>
    </item>
    <item>
      <title>Re: Breaking unusual log files</title>
      <link>https://community.splunk.com/t5/Splunk-Search/Breaking-unusual-log-files/m-p/12269#M956</link>
      <description>&lt;P&gt;Ok, so my first answer was way off, so I'm changing it.  Yes, that's allowed.&lt;/P&gt;

&lt;P&gt;I missed the slightly important fact that we're dealing with interleaved events here: event 1 starts and writes a couple of lines, event 2 starts and writes a couple of lines, event 1 writes some more, event 2 writes some more, event 1 completes, and then event 2 completes.&lt;/P&gt;

&lt;P&gt;Can Splunk re-order this so that events are separated and concatenated properly?  No, it cannot: Splunk reads log files as they are written, and it cannot be configured to re-order data.  However, it would be possible to write a scripted input that does this first and then feeds the re-ordered data to Splunk.&lt;/P&gt;
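
&lt;P&gt;As a rough illustration of that scripted-input approach, here is a minimal Python sketch; the numeric id-prefix format and the "@@@@" terminator are assumptions taken from the sample in the question, not anything Splunk-specific.  It buffers lines per thread id and emits each event as one contiguous block when its terminator arrives:&lt;/P&gt;

```python
# Minimal sketch (not production code) of a preprocessor that de-interleaves
# log lines written concurrently by several processes. Assumptions: every
# line is prefixed with a numeric id and a colon, and an event for a given
# id ends on a line whose body starts with "@@@@".
import re
from collections import OrderedDict

LINE_RE = re.compile(r"^(\d+): ?(.*)$")

def deinterleave(lines):
    """Yield each complete event as a list of lines, in completion order."""
    buffers = OrderedDict()  # maps thread id to its accumulated lines
    for line in lines:
        m = LINE_RE.match(line)
        if not m:
            continue  # ignore lines without an id prefix
        tid, body = m.group(1), m.group(2)
        buffers.setdefault(tid, []).append(line)
        if body.startswith("@@@@"):  # terminator for this id: flush it
            yield buffers.pop(tid)

sample = [
    "8656: ################################################",
    "8656: Fetch Trades For FIRM",
    "10152: ################################################",
    "8656: Completed 31 Mar 2010 11:33:24 (0.031000s elapsed)",
    "8656: @@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@",
    "10152: Fetch Trades For JSW",
    "10152: @@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@",
]

if __name__ == "__main__":
    for event in deinterleave(sample):
        print("\n".join(event))
```

&lt;P&gt;The output of such a script could then be fed to Splunk as a scripted input, with each emitted block indexed as one event.&lt;/P&gt;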

&lt;P&gt;Alternatively, you could just index the data as single-line events and then use the 'stats' or 'transaction' commands to stitch the various lines back together in your search output.&lt;/P&gt;

&lt;P&gt;The final option would be to change the way your processes write their log files.  Instead of one line at a time, is it possible to write a whole event at once so the events are not intermingled?  Separate log files per process would also do the trick.&lt;/P&gt;</description>
      <pubDate>Mon, 26 Apr 2010 21:28:44 GMT</pubDate>
      <guid>https://community.splunk.com/t5/Splunk-Search/Breaking-unusual-log-files/m-p/12269#M956</guid>
      <dc:creator>Mick</dc:creator>
      <dc:date>2010-04-26T21:28:44Z</dc:date>
    </item>
    <item>
      <title>Re: Breaking unusual log files</title>
      <link>https://community.splunk.com/t5/Splunk-Search/Breaking-unusual-log-files/m-p/12270#M957</link>
      <description>&lt;P&gt;If you're stuck with this kind of cross-threaded indexing situation, you can still index everything as single-line events and use the search language to put things back together at search time.  This often seems like a consolation prize, but in many ways it can actually be a better and more flexible way to go.  For instance, with a long-running transaction the start events get into the index right away, so your system has the power to detect a problem before the transaction even closes.&lt;/P&gt;

&lt;P&gt;One concrete problem is that you end up with some weird artifacts: half of 8656 is indexed at its own start time, while the other half is indexed around the start time of 10152.  So for any particular time range you may need to filter out transactions whose end was clipped off, or else your statistics might get skewed slightly.&lt;/P&gt;

&lt;P&gt;So to be able to account for that, I'm going to assume there's some endEvent=1 field that we extract when we see the final line.  I'm also assuming that you are correctly extracting the 'eventId' field from the single-line events, and that the other fields involved are getting extracted properly.&lt;/P&gt;
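
&lt;P&gt;(For illustration only, assuming the raw lines look like the sample in the question, those extractions could be sketched along these lines; the eventId and endEvent field names are just the ones I'm assuming here.)&lt;/P&gt;

&lt;PRE&gt;&lt;CODE&gt;sourcetype="foo" | rex "^(?&amp;lt;eventId&amp;gt;\d+):" | eval endEvent=if(match(_raw, "@@@@"), 1, null())
&lt;/CODE&gt;&lt;/PRE&gt;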

&lt;P&gt;So here are two ways to do it.  For both examples, let's say you're interested in the peakPageFileUsage and elapsedTime fields.&lt;/P&gt;

&lt;P&gt;1) The transaction command can put this back together, like so:&lt;/P&gt;

&lt;PRE&gt;&lt;CODE&gt;sourcetype="foo" | transaction eventId maxspan=1m | search endEvent=1 | fields eventId peakPageFileUsage elapsedTime 
&lt;/CODE&gt;&lt;/PRE&gt;

&lt;P&gt;(The maxspan value you should use depends on how frequently the eventIds are reused by the system.)&lt;/P&gt;

&lt;P&gt;2) Stats is often overlooked, but it can do a lot of this just fine too:&lt;/P&gt;

&lt;PRE&gt;&lt;CODE&gt;sourcetype="foo" | stats first(endEvent) first(peakPageFileUsage) first(elapsedTime) by eventId | search endEvent=1
&lt;/CODE&gt;&lt;/PRE&gt;

&lt;P&gt;(And &lt;EM&gt;if&lt;/EM&gt; you ever have a field that appears multiple times within a single eventId, you could use the values() function within stats and then operate on the resulting multivalue field from there.)&lt;/P&gt;</description>
      <pubDate>Mon, 26 Apr 2010 23:55:07 GMT</pubDate>
      <guid>https://community.splunk.com/t5/Splunk-Search/Breaking-unusual-log-files/m-p/12270#M957</guid>
      <dc:creator>sideview</dc:creator>
      <dc:date>2010-04-26T23:55:07Z</dc:date>
    </item>
    <item>
      <title>Re: Breaking unusual log files</title>
      <link>https://community.splunk.com/t5/Splunk-Search/Breaking-unusual-log-files/m-p/12271#M958</link>
      <description>&lt;P&gt;What they said. Basically, you either use &lt;CODE&gt;transaction&lt;/CODE&gt; after the fact, or you fix how you log by writing log entries atomically.&lt;/P&gt;

&lt;P&gt;There is a rather exotic possibility also of using &lt;CODE&gt;transaction&lt;/CODE&gt; and then writing the results to a summary index, but this requires a little extra work and customization.&lt;/P&gt;</description>
      <pubDate>Tue, 27 Apr 2010 00:05:59 GMT</pubDate>
      <guid>https://community.splunk.com/t5/Splunk-Search/Breaking-unusual-log-files/m-p/12271#M958</guid>
      <dc:creator>gkanapathy</dc:creator>
      <dc:date>2010-04-27T00:05:59Z</dc:date>
    </item>
    <item>
      <title>Re: Breaking unusual log files</title>
      <link>https://community.splunk.com/t5/Splunk-Search/Breaking-unusual-log-files/m-p/12272#M959</link>
      <description>&lt;P&gt;I have one alternate suggestion for example 1 that is less time-range sensitive.  Assuming your start and end markers are as clear-cut as stated, you may want to try the transaction command:   &lt;CODE&gt;transaction eventId startswith="################################################" endswith="@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@"&lt;/CODE&gt;  Then you don't need the &lt;CODE&gt;| search endEvent=1&lt;/CODE&gt; part.&lt;/P&gt;</description>
      <pubDate>Tue, 27 Apr 2010 00:22:51 GMT</pubDate>
      <guid>https://community.splunk.com/t5/Splunk-Search/Breaking-unusual-log-files/m-p/12272#M959</guid>
      <dc:creator>Lowell</dc:creator>
      <dc:date>2010-04-27T00:22:51Z</dc:date>
    </item>
    <item>
      <title>Re: Breaking unusual log files</title>
      <link>https://community.splunk.com/t5/Splunk-Search/Breaking-unusual-log-files/m-p/12273#M960</link>
      <description>&lt;P&gt;If you haven't done so already, I would suggest explicitly adding a &lt;CODE&gt;TIME_FORMAT&lt;/CODE&gt; in your &lt;CODE&gt;props.conf&lt;/CODE&gt; file.  (Maybe even a &lt;CODE&gt;TIME_PREFIX&lt;/CODE&gt; too.)  You don't want Splunk guessing timestamp formats and trying to turn your &lt;CODE&gt;eventId&lt;/CODE&gt; field into a date/time.  You also have dates following the term &lt;CODE&gt;EXPIRY_MONTH&lt;/CODE&gt; that could be picked up as well, so you probably want to nail this down and keep Splunk from guessing.&lt;/P&gt;</description>
      <pubDate>Tue, 27 Apr 2010 03:21:43 GMT</pubDate>
      <guid>https://community.splunk.com/t5/Splunk-Search/Breaking-unusual-log-files/m-p/12273#M960</guid>
      <dc:creator>Lowell</dc:creator>
      <dc:date>2010-04-27T03:21:43Z</dc:date>
    </item>
  </channel>
</rss>

