All Apps and Add-ons

Splunk DB Connect: How to troubleshoot why a database column is incorrectly parsed and indexed in Splunk?

sinash
Explorer

I'm using a single-instance Splunk Enterprise 6.4.3 deployment on a Windows Server 2012 R2 machine with 16 cores and 12GB RAM.

I want to read from a Microsoft SQL Server database of error logs, which has a "Description" column [of type nvarchar(MAX)] that contains the description and stack trace of errors from our servers. I use Splunk DB Connect 2 to fetch data into a Splunk index.

There seems to be a problem in indexing the contents of this column.
For example, Description of many events starts with:
Object reference not set to an instance of an object.
However, some of these events are indexed with a Description of "Object (Just this one word, and mind the quotation marks!) while the others are indexed with Object reference not set to an instance of an object. (and without any quotation marks). At the meantime, when you view the actual events of both these types in the search app, they have the full contents of Description column from the source database, but differ in the contents of the Description field of the Splunk index (as explained above - for example, the first group of events have this: Description="\"Object")

This issue applies to all events, regardless of their starting line. For example, for events that their Description starts with Operation could not be completed due to state transfer..., some of them get indexed with "Operation in the Description field of the Splunk index, and the others get indexed with Operation could not be completed due to state transfer....

0 Karma
1 Solution

sinash
Explorer

I think I found out what the problem was. If the length of the Description field in the database exceeds certain amount (I think more than around 10000), the indexer will not work correctly. To test this, I changed the input query in DB Connect to only retrieve the first 500 characters of the Description field from the database (on the same set of data), and everything works fine. The length of the Description field of erroneous indexed events was way more than 10000, for example one had a length of around 17000 characters (the correctly indexed event with maximum length had a length of 10136).
I think this is a bug and relates to the maximum size of the variable type of the fields.

View solution in original post

0 Karma

sinash
Explorer

I think I found out what the problem was. If the length of the Description field in the database exceeds certain amount (I think more than around 10000), the indexer will not work correctly. To test this, I changed the input query in DB Connect to only retrieve the first 500 characters of the Description field from the database (on the same set of data), and everything works fine. The length of the Description field of erroneous indexed events was way more than 10000, for example one had a length of around 17000 characters (the correctly indexed event with maximum length had a length of 10136).
I think this is a bug and relates to the maximum size of the variable type of the fields.

View solution in original post

0 Karma

somesoni2
Revered Legend

Seems like the field extraction is not correct. Could you post your DB Connect inputs.conf entry for this? Also, On the database, is the data content same (no additional double quotes or anything) for two events with different Description values?

0 Karma

sinash
Explorer

I don't know what happened to my last comment (https://answers.splunk.com/answers/475461/incorrect-indexing-of-a-database-column.html#comment-47551...).
Yes the content is the same.
I think I found out the problem. If the length of the Description field in the database exceeds certain amount (I think more than around 10000), the indexer will not work correctly. To test this, I changed the input query in DB Connect to only retrieve the first 500 characters of the Description field from the database (on the same set of data), and everything works fine. The length of the Description field of erroneous indexed events was way more than 10000, for example one had a length of around 17000 characters (the correctly indexed event with maximum length had a length of 10136).
I think this is a bug and relates to the maximum size of the variable type of the fields.
Thank you for your response 🙂

0 Karma

sinash
Explorer

@somesoni2 On the database the contents are the same. But I think I've found out the problem. It seems that when the length of the Description of an event exceeds a certain amount (around 10000), the indexer will make a mistake. For example, one of the events of the first type ("\"Object") had a Description field length of 17000, and the correctly indexed ones had lengths no more than 10136.

To be sure of this, I read the same data into another index, and this time I changed the input query in DB Connect so that only the first 500 characters of the Description field would be selected, and there was no problem. I think increasing this number to 10000 would be safe too. Finding out the exact limit for the Description field needs either accessing the source code of Splunk (maybe the maximum size of the fields variable types are exceeded in this case) or writing a program to produce events with increasing length in the Description field.
To sum it up, I think it's a bug or at least a lack of information and documentation (on the maximum size of fields) from Splunk - or maybe I haven't seen such information yet.
Thank you for your response! 🙂

0 Karma
Register for .conf21 Now! Go Vegas or Go Virtual!

How will you .conf21? You decide! Go in-person in Las Vegas, 10/18-10/21, or go online with .conf21 Virtual, 10/19-10/20.