Splunk Search

search optimization: getting values with tstats for fields that are sometimes NULL

wryanthomas
Communicator

Search optimization question for y’all: We have an accelerated data model to try to drive improved performance for some dashboards. It is working, but... for the one *really* large class we have, it seems to take way longer than it should for an accelerated, tstats-based search. Here is one of the tstats commands. I’m eager to see if there’s something obvious I’m missing. The main issue seems to be the values(<field>) statements immediately after the FROM clause. If I remove those, the search drops from ~80 seconds to ~8 seconds.  I.e., those values(*) items are costly. But I don’t see another way to get them in to the results… because some events don’t have those fields. (I.e., putting them in by clause will make those events disappear, which is not what we need.)  How do I get a fast return of all records and their field values?  (NOTE: In other contexts, I’ve gotten around this by adding a custom field in the data model that concatenates the fields I want, then I re-extract (rex) them after tstats line. But that seems like a ridiculous workaround … and in this case, because of the number of fields I need to return, I might as well just throw _raw in to the data model… which seems ridiculous.)

 

| tstats summariesonly=t
    values(All_Activity.sessionID)
    values(All_Activity.personID)
    values(All_Activity.chapterTitle)
    values(All_Activity.sectionName)
    values(All_Activity.behaviorDetail)
    values(All_Activity.action)
    values(All_Activity.year)
    values(All_Activity.searchTerm)
    values(All_Activity.termName)
    values(All_Activity.videoname)
    FROM datamodel="etext_behavior" 
    WHERE 
        All_Activity.bookTitle="Design Your First Year Experience 2020 edition" 
        All_Activity.course="LAS 101 Fall 2020" 
        earliest=1597714329 latest=1603071129 
    BY _time All_Activity.risingID All_Activity.course All_Activity.bookTitle span=1s

 

 

Labels (1)
0 Karma

bowesmana
Champion

@wryanthomas 

No idea if this would be useful, but for those potentially null fields can you add calculated fields to the data model to force non null data and use those calculated fields in the tstats query. Have no idea what that would do to the DM performance.

Also, out of interest, is the performance degradation linear as you add back in each of the values() clauses or does it degrade faster as you add more of the values?

 

0 Karma
Take the 2021 Splunk Career Survey

Help us learn about how Splunk has
impacted your career by taking the 2021 Splunk Career Survey.

Earn $50 in Amazon cash!