Hi,
I'm a Splunk newbie and I'm trying to write some queries for our logs using 'transaction'.
Our logs have multiple events for the same timestamp as follows (I have simplified the logs, removing the unrelated fields w.r.to this query):
Timestamp : (thread_name) : message
2010-05-21 09:25:02 : (2702) : Completed calling function init
2010-05-21 09:25:02 : (2702) : Calling function init
2010-05-21 09:25:02 : (4386) : Completed calling function fetchWorkflowRunBreaks
2010-05-21 09:25:02 : (3624) : Completed calling function fetchWorkflowRunBreaks
2010-05-21 09:25:02 : (4386) : Calling function fetchWorkflowRunBreaks
2010-05-21 09:25:02 : (3624) : Calling function fetchWorkflowRunBreaks
2010-05-21 09:25:02 : (5677) : Completed calling function fetchWorkflowRunBreaks
2010-05-21 09:25:02 : (5677) : Calling function fetchWorkflowRunBreaks
2010-05-21 09:25:02 : (4127) : Completed calling function fetchWorkflowRunBreaks
2010-05-21 09:25:02 : (4127) : Calling function fetchWorkflowRunBreaks
2010-05-21 09:25:01 : (4644) : Completed calling function fetchWorkflowRunBreaks
2010-05-21 09:25:01 : (4644) : Calling function fetchWorkflowRunBreaks
2010-05-21 09:25:01 : (2314) : Completed calling function startTaskInstLog
2010-05-21 09:25:01 : (2314) : Calling function startTaskInstLog
2010-05-21 09:25:01 : (2314) : Completed calling function endTaskInstLog
2010-05-21 09:25:01 : (2314) : Calling function endTaskInstLog
2010-05-21 09:25:01 : (2314) : Completed calling function startTaskInstLog
2010-05-21 09:25:01 : (2314) : Calling function startTaskInstLog
2010-05-21 09:25:01 : (2314) : Completed calling function startWflowLog
2010-05-21 09:25:01 : (2314) : Calling function startWflowLog
2010-05-21 09:25:01 : (1800) : Completed calling function lockObject
2010-05-21 09:25:01 : (1800) : Calling function lockObject
2010-05-21 09:25:01 : (2314) : Completed calling function nextSeq
2010-05-21 09:25:01 : (2314) : Calling function nextSeq
2010-05-21 09:25:01 : (2314) : Completed calling function fetchWkflowVarPersistValues
2010-05-21 09:25:01 : (2314) : Calling function fetchWkflowVarPersistValues
2010-05-21 09:25:01 : (2314) : Completed calling function fetchTask
2010-05-21 09:25:01 : (2314) : Calling function fetchTask
2010-05-21 09:25:01 : (2314) : Completed calling function fetchTask
2010-05-21 09:25:01 : (2314) : Calling function fetchTask
I am trying to write a search query to identify the functions that run for more than 'x' seconds, using transaction command as follows (i.e. extract the 'function' and use thread_name and function as unique IDs)
source="..." | rex field=message "Calling function (?<rep_function>.[a-zA-Z]+)" | rex field=message "Completed calling function (?<rep_function>.[a-zA-Z]+)" | transaction thread_name,rep_function startsWith=(message="Calling function*") endsWith=(message="Completed calling function*") | search duration>100
I get unexpected results -- many transactions that complete in the same second are also resulted with a high duration quoting different start/end timestamps. (I see from verifying the log that the transaction ended in the same second)
I suspect Splunk is relying on the timestamp of the events for ordering (and not necessarily in the same order as appearing in the source file) and those events that are with the same timestamp might contribute to these unexpected results.
My questions:
Thanks, Krishna R
I'm using Splunk 4.1.2
The query looks fine, except that it wouldn't work if the data in the file is really in the order presented above.
The startswith
condition requires that the event have an earlier timestamp OR arrive first if they have the same timestamp. It cannot have a later timestamp.
You cannot really tell Splunk to use the sequence of events in the file. The timestamp is most precedent and for ties, the arrival order is used.
Hi Stephen,
Thanks for your response.
I would like to understand what exactly is the arrival order. (is it the the order of the lines in the file)
Our logs are mostly in reverse chronological order - among lines of the same timestamp, the last one in the file is the oldest one. Does splunk think the first line is the oldest one, so I should reverse the lines in the file and index in splunk?
(btw, the logs are not generated that way :D)
I dint know 'tac', found it in our linux boxes not in solaris or AIX machines. I think I will stick with the perl one-liner, nevertheless thanks for the tip!
May I also suggest the unix command tac
.
Thanks Stephen.
Reversing the file helped (perl -e "print reverse <>" logfile).
Arrival order means the order of lines in the file, or order that network events land at Splunk. Reversing the order would probably help.
The query looks fine, except that it wouldn't work if the data in the file is really in the order presented above.
The startswith
condition requires that the event have an earlier timestamp OR arrive first if they have the same timestamp. It cannot have a later timestamp.
You cannot really tell Splunk to use the sequence of events in the file. The timestamp is most precedent and for ties, the arrival order is used.
The events in our logs are in reverse chronological order - I reversed the lines in the log and the query worked as expected. Thanks for the help!
I am not sure why, but the search command I pasted is missing the field place holders in the rex command - pasting again ========== | rex field=message "Calling function (?