Splunk Search

search optimization: getting values with tstats for fields that are sometimes NULL

wryanthomas
Contributor

Search optimization question for y’all: We have an accelerated data model to try to drive improved performance for some dashboards. It is working, but... for the one *really* large class we have, it seems to take way longer than it should for an accelerated, tstats-based search. Here is one of the tstats commands. I’m eager to see if there’s something obvious I’m missing. The main issue seems to be the values(<field>) statements immediately after the FROM clause. If I remove those, the search drops from ~80 seconds to ~8 seconds.  I.e., those values(*) items are costly. But I don’t see another way to get them in to the results… because some events don’t have those fields. (I.e., putting them in by clause will make those events disappear, which is not what we need.)  How do I get a fast return of all records and their field values?  (NOTE: In other contexts, I’ve gotten around this by adding a custom field in the data model that concatenates the fields I want, then I re-extract (rex) them after tstats line. But that seems like a ridiculous workaround … and in this case, because of the number of fields I need to return, I might as well just throw _raw in to the data model… which seems ridiculous.)

 

| tstats summariesonly=t
    values(All_Activity.sessionID)
    values(All_Activity.personID)
    values(All_Activity.chapterTitle)
    values(All_Activity.sectionName)
    values(All_Activity.behaviorDetail)
    values(All_Activity.action)
    values(All_Activity.year)
    values(All_Activity.searchTerm)
    values(All_Activity.termName)
    values(All_Activity.videoname)
    FROM datamodel="etext_behavior" 
    WHERE 
        All_Activity.bookTitle="Design Your First Year Experience 2020 edition" 
        All_Activity.course="LAS 101 Fall 2020" 
        earliest=1597714329 latest=1603071129 
    BY _time All_Activity.risingID All_Activity.course All_Activity.bookTitle span=1s

 

 

Labels (1)
0 Karma

bowesmana
SplunkTrust
SplunkTrust

@wryanthomas 

No idea if this would be useful, but for those potentially null fields can you add calculated fields to the data model to force non null data and use those calculated fields in the tstats query. Have no idea what that would do to the DM performance.

Also, out of interest, is the performance degradation linear as you add back in each of the values() clauses or does it degrade faster as you add more of the values?

 

0 Karma
Got questions? Get answers!

Join the Splunk Community Slack to learn, troubleshoot, and make connections with fellow Splunk practitioners in real time!

Meet up IRL or virtually!

Join Splunk User Groups to connect and learn in-person by region or remotely by topic or industry.

Get Updates on the Splunk Community!

Agent Mode Engaged! Enchaining Agentic Operations with Splunk AI Assistant 2.0

    Are you ready to transform how your team handles complex data requests? We invite you to our upcoming ...

Announcing Modern Navigation: A New Era of Splunk User Experience

We are excited to introduce the Modern Navigation feature in the Splunk Platform, available to both cloud and ...

Modernize your Splunk Apps – Introducing Python 3.13 in Splunk

We are excited to announce that the upcoming releases of Splunk Enterprise 10.2.x and Splunk Cloud Platform ...