TIMESTAMP_FIELDS vs INDEXED_EXTRACTIONS vs KV_MODE

yuanliu
SplunkTrust

The context is structured sourcetypes such as JSON. First, does the use of TIMESTAMP_FIELDS require INDEXED_EXTRACTIONS? (The Web UI suggests so.)

In "Bug: Duplicate values with INDEXED_EXTRACTION?", @badrinath_itrs referred to an intensive case study, "The Indexed Extractions vs. Search-Time Extractions Splunk Case Study", regarding INDEXED_EXTRACTIONS:

To summarize, Indexed Extractions should be used with caution. Splunk gives a pretty fair warning against using them in almost any doc that references Indexed Extractions, including their definition on Splexicon.

Then I realized that for JSON documents whose timestamp field falls beyond the first 128 characters (the default MAX_TIMESTAMP_LOOKAHEAD), it is better to set INDEXED_EXTRACTIONS=json in conjunction with TIMESTAMP_FIELDS. (Setting MAX_TIMESTAMP_LOOKAHEAD too large carries an index-time penalty.)
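For illustration, the combination described above might look like the following props.conf stanza, placed wherever the structured data is parsed. This is only a sketch: the sourcetype name my_json, the field name timestamp, and the time format are assumptions, not taken from any real deployment.

```ini
# props.conf (index-time side); the sourcetype name is hypothetical
[my_json]
INDEXED_EXTRACTIONS = json
# Name of the JSON field holding the event time (assumed to be "timestamp")
TIMESTAMP_FIELDS = timestamp
# strptime-style format; assumes values such as 2024-03-01T12:34:56.789+0000
TIME_FORMAT = %Y-%m-%dT%H:%M:%S.%3N%z
```

With this approach the timestamp is located by field name, so its character offset within the event no longer matters.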

INDEXED_EXTRACTIONS=json then causes duplicate values at search time unless KV_MODE is set to none on the search head. Given Splunk's extraordinary search-time capabilities, if I could use TIMESTAMP_FIELDS in conjunction with INDEXED_EXTRACTIONS=none, the problem would be solved without touching KV_MODE. Is this possible?
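The search-head workaround mentioned above can be sketched as follows, again assuming a hypothetical sourcetype named my_json:

```ini
# props.conf on the search head; the sourcetype name is hypothetical
[my_json]
# Turn off automatic search-time key-value extraction so that fields
# already extracted at index time are not extracted a second time
KV_MODE = none
```

The drawback is that this stanza lives on a different tier from the INDEXED_EXTRACTIONS setting, so the two have to be kept in sync by hand.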

Secondly, because INDEXED_EXTRACTIONS=json nearly demands the use of KV_MODE=none, wouldn't it be useful for the Web GUI to set KV_MODE=none automatically when the "Indexed Extractions" selector points to a structured sourcetype? The user could still override it in the Advanced view, but the presence of this default could save lots of headaches for people like me.


richgalloway
SplunkTrust

I think you've made the case for not using TIMESTAMP_FIELDS when using INDEXED_EXTRACTIONS. That leaves you with TIME_PREFIX as the way to tell Splunk where the timestamp is.
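As a rough sketch of what that could look like (the sourcetype name, the field name timestamp, the lookahead value, and the time format are all assumptions for illustration):

```ini
# props.conf (index-time side); sourcetype and field names are hypothetical
[my_json]
# Regex matching the text immediately before the timestamp value,
# tolerating optional whitespace around the colon
TIME_PREFIX = "timestamp"\s*:\s*"
# Characters to scan for the timestamp, counted from the end of the TIME_PREFIX match
MAX_TIMESTAMP_LOOKAHEAD = 30
# strptime-style format; assumes values such as 2024-03-01T12:34:56.789+0000
TIME_FORMAT = %Y-%m-%dT%H:%M:%S.%3N%z
```

Because the lookahead is counted from the end of the prefix match, it can stay small even when the timestamp sits deep inside the event.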

---
If this reply helps you, Karma would be appreciated.

yuanliu
SplunkTrust

Thanks for the suggestion, @richgalloway. I did briefly look into TIME_PREFIX, but reasoned against it because matching prefix text (even with a regex) in structured data feels awkward. It is less elegant, not so much aesthetically as in the sense of "let the server do what it does best", namely extract structured data. It is also more difficult to document, and the regex has to anticipate possible JSON formatting variants, which is again a job the indexer does best.

Maybe I need to take a second look at this assessment.
