I've got a log file we're monitoring which outputs its events in a strange format that I'm struggling to index correctly.
Here is an example of the events:
BSE:16/02/16 13:55:47 Thread:007528 Completed conversion of PDU to XML.
BSE:16/02/16 13:55:47 Thread:007528 Sending response message( 531 bytes, payload = 410 ) to client...
BSE:16/02/16 13:55:47 Thread:007528 Successfully sent response message
BSE:16/02/16 13:55:47 Thread:007528
BSE:16/02/16 13:55:47 Thread:007528 DAML Response Message
BSE:16/02/16 13:55:47 Thread:007528 -----------------------
BSE:16/02/16 13:55:47 Thread:007528 001. HTTP/1.1 200 OK
BSE:16/02/16 13:55:47 Thread:007528 002. Host: undefined
BSE:16/02/16 13:55:47 Thread:007528 003. Content-type: text/xml; charset=utf-8
BSE:16/02/16 13:55:47 Thread:007528 004. Cache-Control: no-store
BSE:16/02/16 13:55:47 Thread:007528 005. content-length: 410
(It actually continues for several lines after this)
The time stamp plus thread indicate which event it is, and everything to the right of it is actual event text.
Ideally I'd like to index it so that everything to the left of the event text is stripped out and the events are combined by thread number (the timestamp and the 3-letter code in front of it may differ depending on the event and could change during the event, whereas the thread does not change format and is unique).
I've not been able to write a sourcetype stanza in my props.conf that successfully combines the entire text of the events, even before doing anything special such as stripping out the line header.
Has anyone encountered events like my example and managed to index them properly as I've described? I haven't looked at using transforms.conf, as I'm not sure how I would apply it to this scenario, but is that something I should be taking a deeper look at as well?
What type of reporting do you want to do with these logs? Do you want to generate stats, or do you just want to look at all the events for a thread as one consolidated event?
I'm looking at viewing all the events for a thread as one, since technically each thread will count as "one event" (there wouldn't really be much to work with stats-wise, or for reporting or dashboarding later on).
I'm interested in tracking the events (threads) over time and being able to have reports and dashboards which describe the general activity of the application this log is reporting on.
A big event that comprises a lot of smaller events/sub-events is, IMO, a transaction (think of a log showing a money transfer from one account to another: it'll have the same transactionId, but there will be different log events from different sub-systems). Again, based on the type of reporting you want to do, it might be better to keep these events separate in Splunk. Reasons:
1) Loading very big events in Splunk Web will cause your browser to slow down or hang.
2) Splitting the events will let you calculate the duration of the event easily.
3) (Should've mentioned it as first) You'll get better indexing performance with smaller events than big events.
I get what you're saying with that, however in this specific scenario these are actual multi-line single events, recorded with a line header on each new line. From my example in the question above, the event itself should actually look like (starting with info from the bottom of the event):
All as one event. Even with the transaction command this doesn't give us quite the functionality and tracking we're looking for out of this log data.
The interesting problem here is getting the lines to group properly when they have these line headers, and from reading the Splunk documentation there doesn't appear to be an obvious way to do this from within the Splunk sourcetype options itself.
You might be able to club those multiple lines into one if they follow some specific pattern. For example: if a word (alphabetic characters) follows the thread, it's a separate event, but if something else follows (digits or a hyphen), it's part of the previous event. If something like that applies in your case, try this for props.conf (check the time format):
[yoursourcetype]
SHOULD_LINEMERGE=false
LINE_BREAKER=([\r\n]+)\S+\s\S+\s\S+\s[a-zA-Z]+
TIME_PREFIX=^[^:]+:
TIME_FORMAT = %y/%m/%d %H:%M:%S
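To get a feel for where that LINE_BREAKER pattern would break, here is a rough Python sketch that applies the same regex to a few lines from the question. It only approximates Splunk's behaviour (break at the first capture group, discard the captured newlines), and the helper name split_events is made up for illustration, so treat it as a sanity check rather than a substitute for testing in Splunk itself.

```python
import re

# Same pattern as the LINE_BREAKER setting above.
LINE_BREAKER = re.compile(r"([\r\n]+)\S+\s\S+\s\S+\s[a-zA-Z]+")

# A few lines copied from the example in the question.
SAMPLE = "\n".join([
    "BSE:16/02/16 13:55:47 Thread:007528 Successfully sent response message",
    "BSE:16/02/16 13:55:47 Thread:007528 DAML Response Message",
    "BSE:16/02/16 13:55:47 Thread:007528 -----------------------",
    "BSE:16/02/16 13:55:47 Thread:007528 001. HTTP/1.1 200 OK",
    "BSE:16/02/16 13:55:47 Thread:007528 002. Host: undefined",
])


def split_events(raw):
    """Hypothetical helper: split raw text where the pattern matches.

    The text captured by group 1 (the newlines) is dropped; everything
    after it stays at the start of the next event.
    """
    events = []
    start = 0
    for match in LINE_BREAKER.finditer(raw):
        events.append(raw[start:match.start(1)])
        start = match.end(1)
    events.append(raw[start:])
    return events


if __name__ == "__main__":
    for number, event in enumerate(split_events(SAMPLE), 1):
        print("--- event", number, "---")
        print(event)
```

On this sample, the line starting with "DAML" opens a new event, while the dashed and numbered lines stay attached to it. One thing to watch: a line that has nothing after the thread number lets the final \s match the newline, so the pattern can run on into the next line's 3-letter code.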
When Splunk is indexing the data, you can't create a multi-line event based on a field like the thread number. At search time, you can combine events using the transaction command, or summarize them based on the thread number in many different ways.
However, if there is a particular pattern of events, you could use that pattern to create a multi-line event. Following is a very simple example:
BSE:16/02/16 13:55:47 Thread:007528 Start
BSE:16/02/16 13:55:47 Thread:007528 Other stuff
BSE:16/02/16 13:55:47 Thread:007528 more stuff
If the "Start" always appears at the beginning of the events for a thread number, you could "break" using a pattern. However, this will not work if thread events can be interleaved in the file. All the events for thread 007528 must appear in sequence, followed by all the events for the next thread number, etc.
You are probably better off indexing this file as single-line events, and combining the thread data at search time.
Thanks for the response! I've tried to see if there was some kind of pattern we could use, however the event data does seem to switch to a new event without any kind of indication in the text. For now I'll be going with the transaction-style event text from above, but with an eye toward using a pre-processing script in the future to make the events cleaner to index in Splunk.
Thanks for your responses on this. I've ended up going with the transaction command to combine these events due to the odd formatting. In the future we may use a pre-processing script to re-format the log into something more in line with Splunk standards.
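For anyone landing here later, a pre-processing script along those lines could look roughly like the Python sketch below. It strips the line header and collects the event text per thread number. The header regex, the function name group_by_thread, and the second thread number (007529) are assumptions for illustration and are not taken from the real log. It also assumes a thread number is not reused within the input being processed.

```python
import re

# Assumed header layout: 3-letter code, date, time, "Thread:" plus digits,
# then the event text (which may be empty).
HEADER = re.compile(r"^[A-Z]{3}:\S+ \S+ Thread:(\d+) ?(.*)$")


def group_by_thread(lines):
    """Hypothetical helper: map thread number -> list of event text lines.

    Lines that do not match the assumed header layout are skipped.
    Threads come out in order of first appearance (dicts keep
    insertion order in Python 3.7+).
    """
    groups = {}
    for line in lines:
        match = HEADER.match(line.rstrip("\r\n"))
        if match is None:
            continue
        thread, text = match.groups()
        groups.setdefault(thread, []).append(text)
    return groups


if __name__ == "__main__":
    sample = [
        "BSE:16/02/16 13:55:47 Thread:007528 DAML Response Message",
        # Thread 007529 is invented here to show interleaved threads.
        "BSE:16/02/16 13:55:47 Thread:007529 Completed conversion of PDU to XML.",
        "BSE:16/02/16 13:55:47 Thread:007528 001. HTTP/1.1 200 OK",
    ]
    for thread, texts in group_by_thread(sample).items():
        print("Thread:" + thread)
        for text in texts:
            print("    " + text)
```

Unlike an index-time line breaker, this handles interleaved threads, at the cost of dropping the per-line timestamps. If those matter, the first timestamp seen for each thread could be kept alongside the text.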