Hi everyone,
My client has indexes where events are sometimes really large. The problem is that field extraction in such cases doesn't work properly. For example, opening an event shows the whole raw event, but the fields below it are trimmed. If a field is a few thousand characters long, only roughly its first thousand characters are shown in the fields view below the event. Moreover, attempts to manipulate such fields produce unexpected results, e.g.,
| eval len_x = len(field_x)
returns 71, although the field is several thousand characters long. Searches targeting such events sometimes fail as well: specifying an event by its ID, e.g. event_uid=unique_id (a field-value combination present in the event), returns nothing, although a less specific search over the same time frame does return that event.
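To illustrate, a quick comparison against _raw (just a sketch; field_x stands in for any of the affected fields) makes the mismatch visible:
| eval raw_len = len(_raw), field_len = len(field_x) | table raw_len field_len
Here raw_len reflects the full event size, while field_len stops at the truncation point.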
We also tried to tackle the problem at the source, i.e., to shorten excessively long fields before indexing:
| eval field_x = if(len(field_x) > 1000, substr(field_x, 1, 1000) . "(oversized field trimmed)", field_x)
but this only trimmed the fields, without appending the text in parentheses.
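In case whatever truncates the field runs after this eval, a variant that leaves headroom for the marker within the same length budget might behave differently (just a sketch; the 970 is arbitrary):
| eval field_x = if(len(field_x) > 1000, substr(field_x, 1, 970) . " (oversized field trimmed)", field_x)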
Since I haven't managed to find this in the documentation, I would like to ask the following: is there a limit on field length, and does it depend on the overall event size? How should such long fields be handled?
Thanks and kind regards,
Krunoslav Ivesic
Hi
Is this happening when you are using rex in SPL, or at index time with props.conf and transforms.conf? If the latter, you should use the LOOKAHEAD setting in transforms.conf.
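Roughly like this (just a sketch; the stanza name and regex are placeholders for your own index-time extraction):
# transforms.conf
[my_long_field_extraction]
REGEX = field_x=(\S+)
FORMAT = field_x::$1
WRITE_META = true
# default is 4096 characters; raise it so the regex can see further into the event
LOOKAHEAD = 65536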
r. Ismo
This happens at search time; the data is indexed with TRUNCATE=0 and whole _raw events are kept. The problem is visible in the GUI when an event is opened and the fields are shown below it, or when a command like "table *" is invoked at the end of a search: oversized fields just get trimmed, which becomes apparent after comparison with _raw.
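For reference, the relevant props.conf on the indexers looks roughly like this (the sourcetype name is a placeholder):
# props.conf
[my_sourcetype]
# 0 disables truncation of the raw event
TRUNCATE = 0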
Does the same issue occur if you run the Splunk search from the CLI, or only in the GUI?
Which Splunk version and which OS is it running on?
Can you show an example of the data, how it is extracted, and what it should look like? Could there be a minor and/or major breaker in those places? See e.g.
I tried to replicate the setup by copying the apps to my local Splunk instance (where I have access to the _internal index and the CLI, which is not the case at the client), and I had the same issue in the CLI as in the GUI. In fact, since most (if not all) fields are indexed, I found this in the log on my local instance:
08-19-2022 16:31:23.210 +0200 WARN MetaData::string [407284 indexerPipe] - Received metadata string exceeding maxLength length=25029 maxLength=1000 -- it will be truncated as an indexed extraction. Only searches scanning _raw will be able to find it as a whole word.\n 2 similar messages suppressed. First occurred at: Fri Aug 19 16:07:38 2022
So, it seems that we are going way above some internal limit. It would be great if there were a way to override this value.
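For anyone hitting the same thing, the warning can be surfaced with a search along these lines (assuming access to the _internal index):
index=_internal sourcetype=splunkd "Received metadata string exceeding maxLength"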
Splunk version is 8.2.6. I haven't yet figured out which OS (I think it's something Debian-based, though). Unfortunately, I'm not allowed to paste the sample data, but it's not about segmenters; I checked that. In fact, a field is sometimes cut in the middle of a word.
Have you looked at this: https://community.splunk.com/t5/Monitoring-Splunk/Received-metadata-string-exceeding-maxLength-warni... ?
I agree with @somesoni2 that using such a long field as an indexed field is probably not the best idea.
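If you control the extraction, one option is to stop declaring the field as indexed and extract it at search time instead. A sketch, assuming the field is currently declared in fields.conf (the sourcetype name and regex are placeholders):
# fields.conf
[field_x]
INDEXED = false

# props.conf -- search-time extraction instead of index-time
[my_sourcetype]
EXTRACT-field_x = field_x=(?<field_x>\S+)
Search-time extraction reads the full _raw event, so it avoids the indexed-field metadata limit from the warning above (search-time extraction has its own configurable limits in limits.conf, though).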