Splunk Search

How to optimize this timechart query

klim
Path Finder

I have this query index=some_index | timechart limit=15 useOther=false count by acct_id and it needs to run up to a time period of one month. The current time it takes to run is very long and the amount of events it looks at is around 70 million a day. I could accelerate the report but even then it takes awhile to complete the chart even when it says it's scanned 100% of the time period. 

Labels (1)
0 Karma

ITWhisperer
SplunkTrust
SplunkTrust

Have you considered creating a summary index based on the counts for each day, then make your query a sum of the daily counts over the month?

0 Karma

klim
Path Finder

I've considered that but the time range that could be selected could be the last 15 minutes or 15 minute range on a certain date.

0 Karma

tscroggins
Influencer

It's still an excellent suggestion, but you can summarize over smaller time ranges. If your summary populating search runs every ~15 minutes, you're counting by acct_id over ~730,000 events. When a user needs to aggregate over larger time ranges, they can search the summary index. The available precision will just be limited to your aggregation window.

If you're running Splunk 8+ and your raw events contain acct_id=x, you can also take advantage of PREFIX in tstats:

| tstats count where index=some_index TERM(acct_id=*) by PREFIX(acct_id=) _time span=1d

Check out Richard Morgan's excellent "TSTATS and PREFIX" presentation from .conf20 at https://conf.splunk.com/files/2020/slides/PLA1089C.pdf.

0 Karma

klim
Path Finder

@tscroggins thanks for the reply. I'm pretty sure I'm using splunk 8 right now but the tstats command you provided returns three columns acct_id, _time, and count. acct_id is empty though so maybe the PREFIX(acct_id) isn't working correctly. The acct_id contains only dashes so it is a minor breaker.

0 Karma

tscroggins
Influencer

If the events look something like this:

Sat Feb 20 20:13:00 EST 2021 acct_id=123

your tstats search could look like this:

| tstats count where index=foo TERM(acct_id=*) by PREFIX(acct_id=)

The PREFIX() function must contain the entire string prefix, including the "=" character.

If the events look something like this, and you're using search time field exaction:

Sat Feb 20 20:13:00 EST 2021 123

then you need to fall back to search time aggregations. If you plan to summarize data, compare the "cost" of summarizing with the cost of simply indexing the acct_id field in the primary index.

0 Karma

klim
Path Finder

@tscroggins 

I attempted using the tstats command you mentioned. Does it matter where acct_id field is located in the event or does it have to be the first field after the time like in your example?

This is what my event currently looks like:

20210221 01:19:04.554 UTC INFO core field=some_value field1=some_value1 field2=some_value2  acct_id="123-123-123"

0 Karma

tscroggins
Influencer

@klim 

The double quote is a major breaker, so PREFIX won't work.

You might consider extracting and indexing the acct_id field, but it won't help with already indexed events.

0 Karma
Get Updates on the Splunk Community!

AppDynamics Summer Webinars

This summer, our mighty AppDynamics team is cooking up some delicious content on YouTube Live to satiate your ...

SOCin’ it to you at Splunk University

Splunk University is expanding its instructor-led learning portfolio with dedicated Security tracks at .conf25 ...

Credit Card Data Protection & PCI Compliance with Splunk Edge Processor

Organizations handling credit card transactions know that PCI DSS compliance is both critical and complex. The ...