I have a new application that I need to extract one field, taskname, from the main task table (5+ million records) and merge into the subtask table (less 1 million records). When I ran with the 'transaction' statement, I got incomplete results.
Here are my file structures. One main task will have multiple subtasks and each subtask might run in serialized or parallel.
Main taskA (mID, jID, mCPU, taskname, main task beginning/ending time)
subtask1: mID, jID, sCPU, subtask beginning/ending time
subtask2: mID, jID, sCPU, subtask beginning/ending time
. . . . .
I need to search each detailed subtask attributes by task name across different time frame. And, another search will be summarized by time distribution. I know the later one can be achieved with | timechart sum(sCPU) by taskname span=1h
. But, I got the performance issue for the first search. I did search to use summary indexing and post processing. It seems they suggest to aggregate fields mID and jID. However, I will lose the subtask detailed information. Any suggestion?
I find it helps to visualise Splunk as a single giant database table with unlimited columns. So in fact your events are already "joined" and all you need to do is group statistics together in a meaningful way.
Assuming each mID,jID pair is unique, you can group all you fields together like so
index=performance (source=subtask OR source=maintask ) | stats values(*) AS * by mID,jID
This should give you a table of all the fields that belong to each mID jID pair. The fields will be multivalued, so you may need to split them with mvexpand
There is a great flowchart on when you use each aggregation command here: http://docs.splunk.com/Documentation/Splunk/6.3.2/Search/Abouteventcorrelation
See Stats: http://docs.splunk.com/Documentation/Splunk/latest/SearchReference/Stats
See Mvexpand: http://docs.splunk.com/Documentation/Splunk/latest/SearchReference/Mvexpand
Can you post the query with transaction
command that you are using?
Is your main task table and subtask table are different files?
the base search,
index=performance source=subtask OR source=maintask
| transaction mID jID startswith=(source="subtask") endswith=(source="maintask")
....
use the search base for the query1
| timechart sum(sCPU) by taskname span=1h
....
use the search base for the query2
| table mID jID sCPU taskname sATT1 sATT2 sATT3
| sort 0 -sCPU | head 100
where sATTx are minor test attributes from the source=subtask; and taskname is from source=maintask.