Splunk Search

Splunk query to retrieve full stack traces

habiba29
New Member
HI,
me. Am trying to do analysis of stacktraces in splunk for our RDMS. Essentially we can extract the spid for each stacktrace with the below, however we need to extract the full stack dump for the spid also around the same time frame. We have tried multiple avenues however it looks like the limiter is the line breaks in splunk which limited the ability to look back to retrieve the full trace. The only way to identify a stack trace is "end of stack trace". The events can be corelated by the spid which is identified on each line between :0000:<SPID>:Date stamp. It is worth noting that the stack dumps have no consistent header, the only way to correlate is by the spid number.
 
Is there a way of doing a sub query search to look back in the same time frame when a "end of stack trace event" happens and look back on the same time frame for the spid to get back the full stack dump ? Example of trace below.
 
Query to retrieve the spids:

index=<myindex> source="<loglocation>/errorlog_*" AND NOT source="<loglocation>/*RSM*" AND NOT source="<loglocation>/*DEV*" AND NOT source="<loglocation>/*_BACKUP" AND "end of stack trace" | rex "0000:+(?<spid>[^:]+):" | chart count by spid

Stack Dump:
00:0025:00000:06378:2020/12/01 14:07:28.26 server Buffer Information:
Buf addr = MASKEd Mass addr = MASKED
Buf pageno = MASKED, Mass pageno = MASKED, dbid = MASKED
00:0025:00000:06378:2020/12/01 14:07:28.26 server
Buf virtpg = MASKED, Mass virtpg = MASKED
Buf stat = MASKED, Mass stat = MASKED
Mass keep = 1, Mass awaited = 0

00:0025:00000:06378:2020/12/01 14:07:28.26 server Page Information from first read attempt: Page read from disk ppageno = MASKEDpptnid = MASKEDpindid = 2 pnextpg = 178748, pprevpg = 178746 plevel = 0, pstat = 0x82 pts_hi = 0, pts_lo = MASKED
00:0025:00000:06378:2020/12/01 14:07:28.26 server Page Information from second read attempt: Page read from disk ppageno = MASKED, SKED8747, pptnid = MASKED pindid = 2 pnextpg = 178748, pprevpg = 178746 plevel = 0, pstat = 0x82 pts_hi = 0, pts_lo = MASKED
00:0025:00000:06378:2020/12/01 14:07:28.26 server SDES Information:
dbid = MASKED, objid = MASKED , sptnid = MASKED scur.pageid = MASKED
sstat = MASKED, sstat2 = MASKED
suid = MASKED, cacheid = MASKED
00:0025:00000:06378:2020/12/01 14:07:28.26 server SDES Information: scur = MASKED, 0) physical RID
sstat3 = MASKED, sstat4 = MASKED, sstat5 = 0x0, sstat6 = 0x0, sstat7 = 0x0
00:0025:00000:06378:2020/12/01 14:07:28.26 server DES Information: name = MASKED type = 'U '
objuserstat = MASKED, objsysstat = MASKED, objsysstat2 = MASKED, objsysstat3 = 0x0, objsysstat4 = 0x0, objsysstat5 = 0x0
00:0025:00000:06378:2020/12/01 14:07:28.26 server DES properties: ( MASKED, MASKED)
00:0025:00000:06378:2020/12/01 14:07:28.26 server PSS Information:
pstat = MASKED, pcurdb = MASKED, pspid = 6378
p2stat = MASKED, p3stat = MASKED,
plasterror = 0, preverror = MASKED, pattention = 0
00:0025:00000:06378:2020/12/01 14:07:28.26 server PSS Information:
p4stat = MASKED, p5stat = MASKED, p6stat = MASKED, p7stat = MASKED, p8stat = MASKED,
pcurcmd = MASKED('FETCH CURSOR'), pcmderrs = 0x0
00:0025:00000:06378:2020/12/01 14:07:28.26 server End diagnostics for read failure:
00:0025:00000:06378:2020/12/01 14:07:28.26 kernel pc: MASKED pcstkwalk+0x482()
00:0025:00000:06378:2020/12/01 14:07:28.26 kernel pc: MASKED ucstkgentrace+0x20f()
00:0025:00000:06378:2020/12/01 14:07:28.26 kernel pc: MASKED ucbacktrace+0x54()
00:0025:00000:06378:2020/12/01 14:07:28.26 kernel pc: MASKED wrongpage__print_diagnostic+0x7d0()
00:0025:00000:06378:2020/12/01 14:07:28.26 kernel pc: MASKED wrongpage+0xb88()
00:0025:00000:06378:2020/12/01 14:07:28.26 kernel pc: MASKED getpage_with_validation+0x2175()
00:0025:00000:06378:2020/12/01 14:07:28.26 kernel pc: MASKED apl_getnext+0x8e0()
00:0025:00000:06378:2020/12/01 14:07:28.26 kernel pc: MASKED getnext+0x228()
00:0025:00000:06378:2020/12/01 14:07:28.26 kernel pc: MASKED LeScanOp::_LeOpNext(ExeCtxt&)+0x106()
00:0025:00000:06378:2020/12/01 14:07:28.26 kernel pc: MASKED LeEmitSndOp::_LeOpNext(ExeCtxt&)+0x1be()
00:0025:00000:06378:2020/12/01 14:07:28.26 kernel pc: MASKED LePlanNext+0xea()
00:0025:00000:06378:2020/12/01 14:07:28.26 kernel [Handler pc: MASKED le_execerr installed by the following function:-]
00:0025:00000:06378:2020/12/01 14:07:28.26 kernel pc: MASKED exec_lava+0x688()
00:0025:00000:06378:2020/12/01 14:07:28.27 kernel pc: MASKED curs_fetch+0xf2()
00:0025:00000:06378:2020/12/01 14:07:28.27 kernel pc: MASKED s_execute+0x32f2()
00:0025:00000:06378:2020/12/01 14:07:28.27 kernel [Handler pc: MASKED hdl_stack installed by the following function:-]
00:0025:00000:06378:2020/12/01 14:07:28.27 kernel [Handler pc: MASKED s_handle installed by the following function:-]
00:0025:00000:06378:2020/12/01 14:07:28.27 kernel pc: MASKED sequencer+0xcb1()
00:0025:00000:06378:2020/12/01 14:07:28.27 kernel pc: MASKED tdsrecv_language+0x1df()
00:0025:00000:06378:2020/12/01 14:07:28.27 kernel [Handler pc: MASKED ut_handle installed by the following function:-]
00:0025:00000:06378:2020/12/01 14:07:28.27 kernel pc: MASKED conn_hdlr MASKED ()
00:0025:00000:06378:2020/12/01 14:07:28.27 kernel end of stack trace, spid 6378, kpid MASKED, suid MASKED
00:0025:00000:06378:2020/12/01 14:07:28.27 server Error: MASKED, Severity: MASKED, State: MASKED

Labels (4)
0 Karma

richgalloway
SplunkTrust
SplunkTrust

Please share the props.conf settings for that sourcetype and we'll try to help you fix the line breaking.

 

---
If this reply helps you, Karma would be appreciated.
0 Karma

habiba29
New Member

Is there anything inparticular that we would be looking for ? The current settings are as per default. Would you suggest linemerge ? Or is there a way of grouping events by time as the time stamp in the logs are not default

0 Karma

richgalloway
SplunkTrust
SplunkTrust

Obviously, the default is not working for you so a custom props should be used.  Try these to start with.

[mysorucetype]
SHOULD_LINEMERGE = false
LINE_BREAKER = ([\r\n])\d\d:
TIME_PREFIX = :\d{5}:\d{5}
TIME_FORMAT = %Y/%m/%d %H:%M:%S.%2N
MAX_TIMESTAMP_LOOKAHEAD = 23
TRUNCATE = 10000
---
If this reply helps you, Karma would be appreciated.
0 Karma
Get Updates on the Splunk Community!

Aligning Observability Costs with Business Value: Practical Strategies

 Join us for an engaging Tech Talk on Aligning Observability Costs with Business Value: Practical ...

Mastering Data Pipelines: Unlocking Value with Splunk

 In today's AI-driven world, organizations must balance the challenges of managing the explosion of data with ...

Splunk Up Your Game: Why It's Time to Embrace Python 3.9+ and OpenSSL 3.0

Did you know that for Splunk Enterprise 9.4, Python 3.9 is the default interpreter? This shift is not just a ...