Splunk Search

How do I deal with performance issues using the transaction command with a large amount of data?

jlo
New Member

I have a new application where I need to extract one field, taskname, from the main task table (5+ million records) and merge it into the subtask table (fewer than 1 million records). When I ran the search with the 'transaction' command, I got incomplete results.

Here are my file structures. One main task has multiple subtasks, and the subtasks may run serially or in parallel.

Main  taskA (mID, jID, mCPU, taskname, main task beginning/ending time)
  subtask1: mID, jID, sCPU, subtask beginning/ending time  
  subtask2: mID, jID, sCPU, subtask beginning/ending time 
  . . . . .

I need to search each subtask's detailed attributes by task name across different time frames. A second search summarizes by time distribution. I know the latter can be achieved with | timechart sum(sCPU) by taskname span=1h, but I ran into a performance issue with the first search. I looked into summary indexing and post-processing. Both seem to suggest aggregating on the fields mID and jID, but then I would lose the detailed subtask information. Any suggestions?

0 Karma

jplumsdaine22
Influencer

I find it helps to visualise Splunk as a single giant database table with unlimited columns. So in fact your events are already "joined" and all you need to do is group statistics together in a meaningful way.

Assuming each mID,jID pair is unique, you can group all your fields together like so:

index=performance (source=subtask OR source=maintask ) | stats values(*) AS * by mID,jID

This should give you a table of all the fields that belong to each mID,jID pair. The fields will be multivalued, so you may need to split them with mvexpand.
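As a rough sketch of the splitting step, assuming you only need taskname and sCPU (field names taken from the question), you could name the fields explicitly instead of using the wildcard and then expand one multivalued field. Keep in mind that values() keeps only distinct values, so two subtasks with the same sCPU under one mID,jID pair would collapse into a single row, and mvexpand expands one field at a time.

```spl
index=performance (source=subtask OR source=maintask)
| stats values(taskname) AS taskname values(sCPU) AS sCPU by mID, jID
| mvexpand sCPU
| table mID jID taskname sCPU
```

If you need several subtask attributes to stay aligned per subtask, this simple form is probably not enough on its own, since each multivalued field is collected independently.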

There is a great flowchart on when you use each aggregation command here: http://docs.splunk.com/Documentation/Splunk/6.3.2/Search/Abouteventcorrelation

See Stats: http://docs.splunk.com/Documentation/Splunk/latest/SearchReference/Stats
See Mvexpand: http://docs.splunk.com/Documentation/Splunk/latest/SearchReference/Mvexpand

0 Karma

Murali2888
Communicator

Can you post the query with transaction command that you are using?
Are your main task table and subtask table different files?

0 Karma

jlo
New Member

The base search:

index=performance source=subtask OR source=maintask
| transaction mID jID startswith=(source="subtask") endswith=(source="maintask")

....

Query 1, using the base search:

| timechart sum(sCPU) by taskname span=1h

....

Query 2, using the base search:

| table mID jID sCPU taskname sATT1 sATT2 sATT3
| sort 0 -sCPU | head 100

where sATTx are minor test attributes from source=subtask, and taskname is from source=maintask.
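One possible way to get the same table without transaction, offered as an untested sketch rather than a confirmed fix, is to copy taskname onto every event that shares the same mID and jID with eventstats, and then keep only the subtask events. This assumes each mID,jID pair identifies exactly one main task, so taskname stays single-valued. Each subtask row keeps its own sCPU and sATTx values.

```spl
index=performance (source=subtask OR source=maintask)
| eventstats values(taskname) AS taskname by mID, jID
| search source=subtask
| table mID jID sCPU taskname sATT1 sATT2 sATT3
| sort 0 -sCPU
| head 100
```

For query 1, the same first three lines could instead be followed by | timechart sum(sCPU) by taskname span=1h. eventstats still has to track every mID,jID group in memory, so it is worth trying this on a narrow time range first.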

0 Karma