Splunk Search

Get the context of an error

pfs
Engager

Hi Splunk Community!

I'm trying to get the context of an error.

Here is a snippet of the logs:

 

 

2021-03-21 11:36:43,045 [thread-1] blablabla orderid 12345
2021-03-21 11:36:43,045 [thread-2] blablabla orderid 23456
2021-03-21 11:36:43,045 [thread-3] blablabla orderid 34567
2021-03-21 11:36:43,046 [thread-1] blablabla
...
2021-03-21 11:36:43,047 [thread-1] WARN  [org.hibernate.engine.jdbc.spi.SqlExceptionHelper] - SQL Error: 1366, SQLState: HY000
2021-03-21 11:36:43,048 [thread-1] ERROR [org.hibernate.engine.jdbc.spi.SqlExceptionHelper] - Incorrect string value: '\xE2\x80\xAFfro...' for column 'request' at row 1
2021-03-21 11:36:43,050 [thread-1] ERROR [class-1] - org.hibernate.exception.GenericJDBCException: could not execute statement
javax.persistence.PersistenceException: org.hibernate.exception.GenericJDBCException: could not execute statement
<multi-line stack trace>
...

 

 

The "context" I'm trying to get is:

>> For orderid 12345, error "SQL Error: 1366, SQLState: HY000" while trying to write '\xE2\x80\xAFfro...' for column 'request' in "class-1".

As you can see, the order and the error messages are all on different lines.  I know, it's not ideal, but that's what I have to deal with right now.

Is there a way to get this summary?

My idea is:

  1. find the error
  2. get the thread name
  3. find logs with the same thread name in the past few seconds to get the orderid
  4. find logs with the same thread name in the next few seconds to get the character(s), the column name and the class name

Something to keep in mind on our setup:

  • the "_time" is the indexed time, not the real log time.  Because of that, sometimes there are logs from the previous day having the same "_time" as logs from the current day. So, I've used extracted fields to get the date/time from the log entry.
  • I have been able to extract fields for the date/time (rm_datetime), the thread name (rm_threadname), the log message (rm_logmessage)

I've tried using:

  • transaction to get the next line grouped by the rm_threadname, but transaction can only go in 1 direction, i.e. either up or down. In this case, I'm looking for "SQL Error: 1366". So, I have to walk up and down to get the full context.

 

 

index=main | transaction rm_threadname startswith="[org.hibernate.engine.jdbc.spi.SqlExceptionHelper] - Incorrect string value:" maxevents=5 | rex field=_raw "(?<FirstFewLines>(.*[\n]){2})" | table FirstFewLines​

 

 

  • map based on the rm_threadname of the previous search, but I'm failing to get log entries around the rm_datetime of the error (a few seconds before and a few seconds after)

 

 

index=main "SQL Error: 1366"
| eval errordateserial=strptime(rm_datetime, "%Y-%m-%d %H:%M:%S,%Q"), fromdateserial=strptime(rm_datetime, "%Y-%m-%d %H:%M:%S,%Q") - 2, todateserial=strptime(rm_datetime, "%Y-%m-%d %H:%M:%S,%Q") + 1
| table rm_datetime, rm_threadname, rm_logmessage, errordateserial, fromdateserial, todateserial
| map [ search index=main rm_threadname=$rm_threadname$
  | eval datetime=strptime(rm_datetime, "%Y-%m-%d %H:%M:%S,%Q")
  | eval datetime >= $fromdateserial$ | eval datetime <= $todateserial$
  | eval errordateserial=$errordateserial$, fromdateserial=$fromdateserial$, todateserial=$todateserial$
]​

 

 

  • subsearches using the rm_threadname, but I'm failing to get log entries around the rm_datetime of the error (a few seconds before and a few seconds after)

 

 

index=main [ search index=main "SQL Error: 1366" | fields rm_datetime, rm_threadname | format ] ​

 

 

What am I missing?

Thanks in advance.

Labels (3)
0 Karma
1 Solution

ITWhisperer
Ultra Champion

Yes, so make sure the events are ordered correctly, then use transaction to pick up the start i.e. when the order id is logged, and all events for the thread id until it get restarted, then search for the error

| transaction rm_threadname startswith="blablabla"
| search "SQL Error: 1366"

View solution in original post

0 Karma

pfs
Engager

Thx @ITWhisperer !

I have tried some subsearches but it was getting too complex and was even slower than the transaction.

The grouping with transaction with a limited duration (maxspan) and limited number of events is the best compromise.  It currently does take ~10min per search per day.

My final query is something like this:

index=main
| transaction rm_threadname maxevents=40 maxspan=2s
| search "SQL Error: 1366" "Incorrect string value" "exception" ("currentOrderNumber" OR "orderid" OR "ordernumber")
| ...do some field extraction
| table ...all the relevant fields...
0 Karma

ITWhisperer
Ultra Champion

Have you tried 

| transaction rm_threadname startswith="blablabla orderid" endswith="Incorrect string value"
0 Karma

pfs
Engager

Thanks for replying @ITWhisperer .

As I mentioned, transaction can only go in 1 direction (as far as I know).

In this case, it will return me the orderid (which, by the way, can have many variations, like order id, order.id, order number, etc) with a block of logs, up to the "Incorrect string value", but it will not return me the class name "class-1" in the next line after the "Incorrect string value".

Am I asking for too much?

0 Karma

ITWhisperer
Ultra Champion

Yes, so make sure the events are ordered correctly, then use transaction to pick up the start i.e. when the order id is logged, and all events for the thread id until it get restarted, then search for the error

| transaction rm_threadname startswith="blablabla"
| search "SQL Error: 1366"

View solution in original post

0 Karma

pfs
Engager

Interesting.  I was very skeptical about that query, but it kinda worked.  It took more than 3 minutes for 1 day of logs though.

It returned

2021-03-21 11:36:43,045 [thread-1] blablabla orderid 12345
2021-03-21 11:36:43,046 [thread-1] blablabla
...
2021-03-21 11:36:43,047 [thread-1] WARN  [org.hibernate.engine.jdbc.spi.SqlExceptionHelper] - SQL Error: 1366, SQLState: HY000
2021-03-21 11:36:43,048 [thread-1] ERROR [org.hibernate.engine.jdbc.spi.SqlExceptionHelper] - Incorrect string value: '\xE2\x80\xAFfro...' for column 'request' at row 1
2021-03-21 11:36:43,050 [thread-1] ERROR [class-1] - org.hibernate.exception.GenericJDBCException: could not execute statement
javax.persistence.PersistenceException: org.hibernate.exception.GenericJDBCException: could not execute statement
<multi-line stack trace>

 

The threadname is recycled in the web server. So, just grouping by the threadname is going to be a very expensive search. My idea here was to try to limit the search to around the error only. But maybe I'm trying too hard.

Any suggestion to speed it up?

Also, I don't need all the details by threadname. Is there a way to summarize the data into a tabular format with the few data points I need? For instance, orderid, the "Incorrect string value" line, the classname after the "Incorrect string value" line?

datetimethreadnameorderidcolumn nameclassname
2021-03-21 11:36:43,047thread-112345requestclass-1
0 Karma

ITWhisperer
Ultra Champion

If it were me, I would try to extract the information you need using rex. The transaction command will have gathered all the _raw fields together (unless you have specifically chosen to create multi-value fields) so you might even be able to do it in one long rex expression, but it doesn't matter if you use more than one.

0 Karma

pfs
Engager

Thanks for your input @ITWhisperer .

I used rex to find the few fields I wanted.  It's a bit clumsy, but it works.

The transaction command is somewhat slow.  Any suggestions on how to speed it up?

0 Karma

ITWhisperer
Ultra Champion

You could set up some scheduled searches (say once an hour) to populate a summary index with the results you are after, then use that index.

There might be something you could do with streamstats to copy the latest fields e.g. order id down to other events with the same thread id, but there is no guarantee that this would be any quicker if you have a large amount of data in the first place.

0 Karma