Hi Splunk Community!
I'm trying to get the context of an error.
Here is a snippet of the logs:
2021-03-21 11:36:43,045 [thread-1] blablabla orderid 12345
2021-03-21 11:36:43,045 [thread-2] blablabla orderid 23456
2021-03-21 11:36:43,045 [thread-3] blablabla orderid 34567
2021-03-21 11:36:43,046 [thread-1] blablabla
...
2021-03-21 11:36:43,047 [thread-1] WARN [org.hibernate.engine.jdbc.spi.SqlExceptionHelper] - SQL Error: 1366, SQLState: HY000
2021-03-21 11:36:43,048 [thread-1] ERROR [org.hibernate.engine.jdbc.spi.SqlExceptionHelper] - Incorrect string value: '\xE2\x80\xAFfro...' for column 'request' at row 1
2021-03-21 11:36:43,050 [thread-1] ERROR [class-1] - org.hibernate.exception.GenericJDBCException: could not execute statement
javax.persistence.PersistenceException: org.hibernate.exception.GenericJDBCException: could not execute statement
<multi-line stack trace>
...
The "context" I'm trying to get is:
>> For orderid 12345, error "SQL Error: 1366, SQLState: HY000" while trying to write '\xE2\x80\xAFfro...' for column 'request' in "class-1".
As you can see, the order and the error messages are all on different lines. I know, it's not ideal, but that's what I have to deal with right now.
Is there a way to get this summary?
My idea is to limit the search to the events around the error only. Something to keep in mind on our setup: thread names are recycled by the web server.
I've tried using:
index=main | transaction rm_threadname startswith="[org.hibernate.engine.jdbc.spi.SqlExceptionHelper] - Incorrect string value:" maxevents=5 | rex field=_raw "(?<FirstFewLines>(.*[\n]){2})" | table FirstFewLines
index=main "SQL Error: 1366"
| eval errordateserial=strptime(rm_datetime, "%Y-%m-%d %H:%M:%S,%Q"), fromdateserial=strptime(rm_datetime, "%Y-%m-%d %H:%M:%S,%Q") - 2, todateserial=strptime(rm_datetime, "%Y-%m-%d %H:%M:%S,%Q") + 1
| table rm_datetime, rm_threadname, rm_logmessage, errordateserial, fromdateserial, todateserial
| map [ search index=main rm_threadname=$rm_threadname$
| eval datetime=strptime(rm_datetime, "%Y-%m-%d %H:%M:%S,%Q")
| where datetime >= $fromdateserial$ AND datetime <= $todateserial$
| eval errordateserial=$errordateserial$, fromdateserial=$fromdateserial$, todateserial=$todateserial$
]
index=main [ search index=main "SQL Error: 1366" | fields rm_datetime, rm_threadname | format ]
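A variant of the subsearch idea that might be worth trying (assuming the rm_* field extractions are already in place): have the subsearch return earliest and latest fields, which Splunk interprets as time bounds for each matching thread, avoiding map altogether. This is a sketch, not tested against your data:
index=main [ search index=main "SQL Error: 1366" | eval earliest=_time-2, latest=_time+1 | fields rm_threadname, earliest, latest ]
| search "Incorrect string value" OR "orderid"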
What am I missing?
Thanks in advance.
Yes, so make sure the events are ordered correctly, then use transaction to pick up the start, i.e. when the order id is logged, and all events for the thread id until it gets restarted, then search for the error:
| transaction rm_threadname startswith="blablabla"
| search "SQL Error: 1366"
Thx @ITWhisperer !
I have tried some subsearches, but they got too complex and were even slower than the transaction.
Grouping with transaction with a limited duration (maxspan) and a limited number of events (maxevents) is the best compromise. It currently takes ~10 minutes per search for one day of logs.
My final query is something like this:
index=main
| transaction rm_threadname maxevents=40 maxspan=2s
| search "SQL Error: 1366" "Incorrect string value" "exception" ("currentOrderNumber" OR "orderid" OR "ordernumber")
| ...do some field extraction
| table ...all the relevant fields...
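For reference, the field extraction step could be sketched with a few rex calls like the ones below. The field names (orderid, columnname, classname) are illustrative, and note that after transaction the _raw spans multiple lines, so [\S\s] is used instead of . to match across newlines:
| rex "orderid\s+(?<orderid>\d+)"
| rex "for column '(?<columnname>[^']+)'"
| rex "Incorrect string value[\S\s]*?ERROR \[(?<classname>[^\]]+)\]"
The last pattern anchors on "Incorrect string value" first so it captures the class name from the following ERROR line rather than the SqlExceptionHelper one.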
Have you tried
| transaction rm_threadname startswith="blablabla orderid" endswith="Incorrect string value"
Thanks for replying @ITWhisperer .
As I mentioned, transaction can only go in one direction (as far as I know).
In this case, it will return the orderid (which, by the way, can have many variations, like order id, order.id, order number, etc.) together with a block of logs up to the "Incorrect string value", but it will not return the class name "class-1" on the line after the "Incorrect string value".
Am I asking for too much?
Interesting. I was very skeptical about that query, but it kinda worked. It took more than 3 minutes for 1 day of logs though.
It returned
2021-03-21 11:36:43,045 [thread-1] blablabla orderid 12345
2021-03-21 11:36:43,046 [thread-1] blablabla
...
2021-03-21 11:36:43,047 [thread-1] WARN [org.hibernate.engine.jdbc.spi.SqlExceptionHelper] - SQL Error: 1366, SQLState: HY000
2021-03-21 11:36:43,048 [thread-1] ERROR [org.hibernate.engine.jdbc.spi.SqlExceptionHelper] - Incorrect string value: '\xE2\x80\xAFfro...' for column 'request' at row 1
2021-03-21 11:36:43,050 [thread-1] ERROR [class-1] - org.hibernate.exception.GenericJDBCException: could not execute statement
javax.persistence.PersistenceException: org.hibernate.exception.GenericJDBCException: could not execute statement
<multi-line stack trace>
The threadname is recycled in the web server. So, just grouping by the threadname is going to be a very expensive search. My idea here was to try to limit the search to around the error only. But maybe I'm trying too hard.
Any suggestion to speed it up?
Also, I don't need all the details by threadname. Is there a way to summarize the data into a tabular format with the few data points I need? For instance, orderid, the "Incorrect string value" line, the classname after the "Incorrect string value" line?
datetime | threadname | orderid | column name | classname |
2021-03-21 11:36:43,047 | thread-1 | 12345 | request | class-1 |
If it were me, I would try to extract the information you need using rex. The transaction command will have gathered all the _raw fields together (unless you have specifically chosen to create multi-value fields) so you might even be able to do it in one long rex expression, but it doesn't matter if you use more than one.
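Building on that, a single long rex over the combined _raw might look like this sketch (field names are made up, and [\S\s]*? is used because rex's . does not cross newlines by default):
| rex "orderid\s+(?<orderid>\d+)[\S\s]*?for column '(?<columnname>[^']+)'[\S\s]*?ERROR \[(?<classname>[^\]]+)\]"
| table _time, rm_threadname, orderid, columnname, classname
This would produce the tabular summary in one pass over the transaction output, assuming the events look like the sample above.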
Thanks for your input @ITWhisperer .
I used rex to find the few fields I wanted. It's a bit clumsy, but it works.
The transaction command is somewhat slow. Any suggestions on how to speed it up?
You could set up some scheduled searches (say once an hour) to populate a summary index with the results you are after, then use that index.
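A scheduled search along these lines could populate the summary index with collect (the index name error_summary is illustrative; the index must be created beforehand):
index=main
| transaction rm_threadname maxevents=40 maxspan=2s
| search "SQL Error: 1366"
| collect index=error_summary
Reports would then query index=error_summary instead of re-running the transaction over raw data.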
There might be something you could do with streamstats to copy the latest fields e.g. order id down to other events with the same thread id, but there is no guarantee that this would be any quicker if you have a large amount of data in the first place.
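A rough sketch of the streamstats approach (untested against your data; sort ensures events stream oldest-first so the last seen orderid is carried forward within each thread):
index=main
| sort 0 _time
| rex "orderid\s+(?<orderid>\d+)"
| streamstats last(orderid) as orderid by rm_threadname
| search "Incorrect string value"
Because stats functions ignore null values, last(orderid) effectively fills the most recent order id down to the later error events for the same thread.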