I am a Splunk newbie, so even explanations that seem obvious may need further clarification.
What I have:
An advanced medical imaging system of systems that produces a global output log in a specific format (example given later).
I apply a repetitive task to this system. Example: start up and wait until all statuses are reported, issue a shutdown, and repeat. This goes on for days without operator intervention. (I run many other tests, but this is the one I am using to try out the Splunk concept.)
What I am trying to do (big picture):
Index/chop up log files based on testing time period. [testing time period = time operator turns on the script to perform tests until the script is turned off]
Index/chop up log files based on cycle. [system startup to shutdown would be one cycle]
Index all output messages. [I get about 5 cycles per hour with 200-400 timestamped events reported per cycle, unless something unexpected occurs]
Goal: find out which events are not supposed to happen and investigate them so they can be fixed.
Types of outputs: count how often each event_identifier occurs in each cycle to create a baseline/statistical prediction based on event_identifier and its content. Find errors that indicate something needs to be fixed.
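To make the baseline idea concrete, here is a minimal Python sketch of the counting I have in mind. It runs outside Splunk and is only meant to pin down the logic. The function names and the sample identifiers are made up for illustration, and using the median across cycles as the baseline is an assumption, not a settled choice.

```python
from collections import Counter
from statistics import median


def count_per_cycle(cycles):
    """Return one Counter of event_identifier occurrences per cycle."""
    return [Counter(cycle) for cycle in cycles]


def deviations_from_baseline(cycles):
    """List (cycle_index, event_identifier, count, baseline) for every cycle
    whose count differs from the median count across all cycles."""
    counts = count_per_cycle(cycles)
    identifiers = set().union(*counts) if counts else set()
    found = []
    for ident in sorted(identifiers):
        per_cycle = [c[ident] for c in counts]  # a missing identifier counts as 0
        baseline = median(per_cycle)            # assumed baseline: median over cycles
        for index, n in enumerate(per_cycle):
            if n != baseline:
                found.append((index, ident, n, baseline))
    return found


if __name__ == "__main__":
    # Made-up identifiers, for illustration only (not taken from a real log).
    cycles = [
        ["100", "200", "200"],
        ["100", "200", "200"],
        ["100", "200", "200", "999"],
    ]
    for row in deviations_from_baseline(cycles):
        print(row)
```

In this toy input, only the third cycle stands out, because it contains an identifier that the other cycles do not.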
I am not expecting someone to do my job for me; I am hoping to be led in the right direction. I am still learning the Splunk data mining lingo.
What I am currently doing:
I am using the source log file for the cycle period.
[This is what I cannot figure out] I want a cycle to start every time the log outputs an event with a specific message and to run until that same message appears again.
Each event is delimited as in the example message below: an event runs from its start message (SR) to its end message (EN).
My props.conf (users\admin\search\local\props.conf) is as follows:
[Power Cycle]
EXTRACT-event_source = (?im)^\t(?P<event_source>[^\t]+)
EXTRACT-event_identifier = (?im)^(?:[^\t\n]*\t){4}(?P<event_identifier>[^\t]+)
EXTRACT-event_location = (?im)^\t\w+\t(?P<event_location>.+)
EXTRACT-event_start_ID = (?im)^(?P<event_start_ID>.+)
My other props.conf (\etc\system\local\props.conf):
[Power Cycle]
BREAK_ONLY_BEFORE = SR \d\d\d
MUST_BREAK_AFTER = EN \d\d\d
NO_BINARY_CHECK = 1
SHOULD_LINEMERGE = true
pulldown_type = 1
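As a cross-check of what I expect these line-breaking settings to produce, here is a small Python sketch that splits raw log text the same way: an event starts at an SR line and ends at the matching EN line. It is not how Splunk works internally. The function name is hypothetical, and the patterns are anchored and use \d+ rather than exactly three digits, which is an assumption on my part.

```python
import re

START = re.compile(r"^SR \d+")  # plays the role of BREAK_ONLY_BEFORE
END = re.compile(r"^EN \d+")    # plays the role of MUST_BREAK_AFTER


def split_events(text):
    """Split raw log text into events, each running from an SR line to an EN line."""
    events, current = [], None
    for line in text.splitlines():
        if START.match(line):
            current = [line]      # a new event begins (an unterminated one is dropped)
        elif current is not None:
            current.append(line)
            if END.match(line):
                events.append("\n".join(current))
                current = None    # wait for the next SR line
    return events


if __name__ == "__main__":
    sample = "\n".join([
        "SR 145",
        "1371027603 1 1 Wed Jun 12 09:00:03 2013 200002348 4",
        "bay90ct cupMonitor",
        "ssProcStop.c 1509",
        "The System Software has terminated.",
        "EN 145",
        "SR 415",
        "1372052120 0 1 Mon Jun 24 05:35:20 2013 0 7",
        "bay92ct Svc_Notepad",
        "Notepad.c 44",
        "This message was added by the OPERATOR to report on a problem:",
        "EN 415",
    ])
    for event in split_events(sample):
        print(event.splitlines()[0], "->", len(event.splitlines()), "lines")
```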
I am testing this on my own time and hope to eventually present it to my supervisor to try and implement it as a common tool within our engineering department, especially when trying to prove system reliability.
Example Message:
SR 145
1371027603 1 1 Wed Jun 12 09:00:03 2013 200002348 4
bay90ct cupMonitor
ssProcStop.c 1509
The System Software has terminated.
EN 145
Message layout:
SR ### (event_start_ID) <-- start message
1371027603(unique ID for specific time) 1(ignore) 1(ignore) Wed Jun 12 09:00:03 2013(tstamp) 200002348(event_identifier)
bay90ct(event_source) cupMonitor(Process)
ssProcStop.c(event_location) 1509(line in source)
The System Software has terminated.(message, can be multi-lined)
EN ### <-- end message
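The layout above can be sketched as a small Python parser, which may help when checking field extractions against a known-good result. This is only an illustration under several assumptions: fields are separated by whitespace (the real log may use tabs), weekday and month names are in English, event_start_ID is taken to be the number after SR, and the dictionary keys other than the field names listed above are my own.

```python
import re
from datetime import datetime

# Header line: unique ID, two ignored fields, timestamp, event_identifier.
HEADER = re.compile(
    r"^(?P<unique_id>\d+)\s+\d+\s+\d+\s+"
    r"(?P<tstamp>\w{3} \w{3} +\d{1,2} \d{2}:\d{2}:\d{2} \d{4})\s+"
    r"(?P<event_identifier>\d+)"
)


def parse_event(block):
    """Parse one SR...EN block into a dict of the fields described in the post."""
    lines = block.strip().splitlines()
    header = HEADER.match(lines[1])
    if header is None:
        raise ValueError("unrecognised header line: %r" % lines[1])
    event_source, process = lines[2].split()[:2]
    event_location, source_line = lines[3].split()[:2]
    return {
        "event_start_ID": lines[0].split()[1],
        "unique_id": int(header.group("unique_id")),
        "tstamp": datetime.strptime(header.group("tstamp"), "%a %b %d %H:%M:%S %Y"),
        "event_identifier": header.group("event_identifier"),
        "event_source": event_source,
        "process": process,
        "event_location": event_location,
        "source_line": int(source_line),
        "message": "\n".join(lines[4:-1]),  # everything between line 4 and the EN line
    }


if __name__ == "__main__":
    sample = "\n".join([
        "SR 145",
        "1371027603 1 1 Wed Jun 12 09:00:03 2013 200002348 4",
        "bay90ct cupMonitor",
        "ssProcStop.c 1509",
        "The System Software has terminated.",
        "EN 145",
    ])
    for key, value in parse_event(sample).items():
        print(key, "=", value)
```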
Each cycle is differentiated by a specific event message that marks the beginning of the next cycle. This is the first message logged when the system is first turned on.
SR 415
1372052120 0 1 Mon Jun 24 05:35:20 2013 0 7
bay92ct Svc_Notepad
Notepad.c 44
This message was added by the OPERATOR to report on a problem:
PRODUCT CONFIGURATION|-- insert unique product information here--
EN 415
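The cycle grouping I am after could be sketched in Python roughly as follows. It assumes the events have already been reduced to their message text, and that the string "PRODUCT CONFIGURATION" is enough to recognise the power-on message. The function and constant names are hypothetical.

```python
CYCLE_MARKER = "PRODUCT CONFIGURATION"  # assumed to appear only in the power-on message


def group_cycles(messages, marker=CYCLE_MARKER):
    """Group event messages into cycles; each marker message opens a new cycle."""
    cycles = []
    for message in messages:
        if marker in message:
            cycles.append([message])      # the marker message starts the next cycle
        elif cycles:
            cycles[-1].append(message)    # any other message joins the open cycle
        # messages seen before the first marker belong to no known cycle and are skipped
    return cycles


if __name__ == "__main__":
    power_on = ("This message was added by the OPERATOR to report on a problem:\n"
                "PRODUCT CONFIGURATION|-- insert unique product information here--")
    shutdown = "The System Software has terminated."
    messages = [shutdown, power_on, shutdown, power_on, shutdown, shutdown]
    for number, cycle in enumerate(group_cycles(messages), start=1):
        print("cycle", number, "has", len(cycle), "events")
```

Each cycle here includes its opening marker message, so counts per cycle always include that one event.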
Example cycle test period
SR 261
1370995620 0 1 Wed Jun 12 00:07:00 2013 0 7
bay90ct Svc_Notepad
Notepad.c 44
This message was added by the OPERATOR to report on a problem:
RstHast Enabled - start command: startrsthast -shutdown . Type stoprsthast in unix shell to disable
EN 261
/////PLACE A BUNCH OF Cycles with messages HERE
SR 179
1371027942 0 1 Wed Jun 12 09:05:42 2013 0 7
bay90ct Svc_Notepad
Notepad.c 44
This message was added by the OPERATOR to report on a problem:
Rsthast Disabled
EN 179
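The test period boundaries could be found the same way. Here is a minimal Python sketch, assuming the operator notes always contain "RstHast Enabled" at the start and "Rsthast Disabled" at the end. Matching is case-insensitive because the two samples above spell the name differently. The function name is hypothetical.

```python
def split_test_periods(messages):
    """Return one list of messages per test period, i.e. the messages logged
    between an 'RstHast Enabled' note and the next 'Rsthast Disabled' note."""
    periods, current = [], None
    for message in messages:
        text = message.lower()
        if "rsthast enabled" in text:
            current = []                  # the script was switched on
        elif "rsthast disabled" in text:
            if current is not None:
                periods.append(current)   # the script was switched off
            current = None
        elif current is not None:
            current.append(message)       # a message inside the test period
    return periods


if __name__ == "__main__":
    messages = [
        "RstHast Enabled - start command: startrsthast -shutdown . "
        "Type stoprsthast in unix shell to disable",
        "The System Software has terminated.",
        "Rsthast Disabled",
    ]
    print(split_test_periods(messages))
```

A period that is still open when the log ends is not returned, which seems like the safer default for baseline statistics.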