I'm trying to optimize a query joining multiple sourcetypes based on a key. I would appreciate any suggestions on how to speed it up.
I am trying to obtain a summary of the beacon-spans for an individually requestSpan.traceId filtered to a specific user. There are several beacon-spans and several alm-spans for every unique requestSpan.traceId. Only the alm-spans have the userId and subscriptionId.
The base query looks like this:
sourcetype="beacon-spans" requestSpan.stack="prod"
| join type="inner" requestSpan.traceId [search sourcetype="alm-spans" userId=marc | dedup requestSpan.traceId | table requestSpan.traceId, subscriptionId, userId]
The subsearch returns in 1-2 seconds for single hour date ranges. It usually returns less than 100 results.
The outer query has to sift through 1-2 million records and finally returns 600-800 records total, but it is taking 3-4 minutes.
I have attempted rewriting this query as a transaction and hasn't been any faster. I have read that subsearches are generally faster anyway. The key problem I am dealing with is that the key parameter that I am attempting to filter by is not available in the outer search against beacon-spans. Perhaps my problem is simply I/O related
Execution Costs
Duration (seconds) Component Invocations Input Output
0.141 command.fields 144 1,596,742 1,596,742
24.691 command.join 145 1,596,742 106
228.654 command.search 144 - 1,596,742
8.85 command.search.index 160 - -
3.159 command.search.fieldalias 136 1,596,742 1,596,742
1.888 command.search.filter 136 - -
0.329 command.search.calcfields 136 1,596,742 1,596,742
0 command.search.index.usec_1_8 8,807 - -
0 command.search.index.usec_262144_INF 5 - -
0 command.search.index.usec_32768_262144 6 - -
0 command.search.index.usec_4096_32768 16 - -
0 command.search.index.usec_512_4096 7 - -
0 command.search.index.usec_64_512 37 - -
0 command.search.index.usec_8_64 1,454 - -
100.19 command.search.rawdata 136 - -
60.46 command.search.kv 136 - -
49.806 command.search.typer 144 1,596,742 1,596,742
1.136 command.search.tags 144 1,596,742 1,596,742
0.646 command.search.lookups 136 1,596,742 1,596,742
0.08 command.search.summary 144 - -
0.025 dispatch.check_disk_usage 24 - -
0.501 dispatch.createProviderQueue 1 - -
6.378 dispatch.evaluate 1 - -
6.253 dispatch.evaluate.join 1 - -
0.114 dispatch.evaluate.search 1 - -
230.477 dispatch.fetch 145 - -
0.038 dispatch.preview 38 - -
0.005 dispatch.readEventsInResults 1 - -
4.534 dispatch.results_combiner 145 - -
153.311 dispatch.stream.local 68 - -
75.353 dispatch.stream.remote 76 - 2,027,392,278
75.342 dispatch.stream.remote.qd-splunk-02 70 - 2,027,380,923
0.005 dispatch.stream.remote.qs-splunk-02 2 - 3,785
0.003 dispatch.stream.remote.qd-splunk-01 2 - 3,785
0.002 dispatch.stream.remote.qs-splunk-01 2 - 3,785
0.789 dispatch.timeline 145 - -
3.351 dispatch.writeStatus 168 - -
0.651 startup.handoff 1 - -
Here are examples of each of the JSON events
Example alm-spans
{[-]
almBuild:11414
javaRequestSpan:{[-]
httpMethod:GET
httpStatus:200
uri:/slm/webservice/v2.x/Revision
userAgent:Mozilla/5.0 (Macintosh; Intel Mac OS X 10_8_5) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/30.0.1599.101 Safari/537.36
}
requestSpan:{[-]
parentSpanId:e596e966-cf27-4966-9bfd-f5206562c760
serviceName:alm
spanId:701e1f8a-851c-489f-a922-24c418e8941e
spanResponseTimeMs:13
stack:prod
timestamp:2013-10-29T15:36:07.649Z
traceId:ff1ad4f1-0e8f-462a-9a38-716b20751607
}
subscriptionId:100
subscriptionUuid:7b7b6dd2e5344ff09bdb073bb54b08b4
userId:marc
userUuid:16e6da3989164a8784541fcd94e9fd4f
versionSha:1dad34e0cd3b66978ad1311d27a29aea9c9f1d1e
}
Example Beacon-span
{[-]
browserTimestamp:1383067227939
componentType:ShowFlairOnWorkspaceChangeFilter
componentUuid:4ef19064-13e7-49d2-bed3-7b8241a968f2
endTime:18384
eventDescription:nav filter
eventType:load
requestSpan:{[-]
parentSpanId:23a5f5aa-7337-42a9-a89b-a0fa56508d8e
serviceName:beacon
spanId:e596e966-cf27-4966-9bfd-f5206562c760
spanResponseTimeMs:18
stack:prod
timestamp:2013-10-29T15:36:07.631Z
traceId:ff1ad4f1-0e8f-462a-9a38-716b20751607
}
startTime:18366
status:Ready
tabUuid:78dfb9bc-6bfc-428a-ab5d-92b768d6354e
userAgent:Mozilla/5.0 (Macintosh; Intel Mac OS X 10_8_5) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/30.0.1599.101 Safari/537.36
}
Converting the search to an actual sub-search made all the difference in performance for this query
sourcetype="beacon-spans" requestSpan.stack="prod" [search sourcetype="alm-spans" userId=marc | dedup requestSpan.traceId | table requestSpan.traceId]
The only negative to this query is that it does not return the values for subscriptionId and userId like the original join query did. I was able to work around this with a lookup after several other summary steps.
I really appreciate the help from sideview, cramasta and ayn
Converting the search to an actual sub-search made all the difference in performance for this query
sourcetype="beacon-spans" requestSpan.stack="prod" [search sourcetype="alm-spans" userId=marc | dedup requestSpan.traceId | table requestSpan.traceId]
The only negative to this query is that it does not return the values for subscriptionId and userId like the original join query did. I was able to work around this with a lookup after several other summary steps.
I really appreciate the help from sideview, cramasta and ayn
Whoa this is exactly what I am looking for. I rewrote the query as an actual subsearch
sourcetype="beacon-spans" requestSpan.stack="prod" [search sourcetype="alm-spans" userId=marc | dedup requestSpan.traceId | table requestSpan.traceId]
Returned in <10 seconds for a 4hour time frame. Peforms much more quickly. Let me keep experimenting to see it it responds in larger contexts
Using stats will be much faster than using join, and you wont be subject to limits on the number of rows or on execution time of the search inside the square brackets. It is confusing that subsearches and search commands like join and append use square bracket syntax, and that they also have similar limits on number of rows and limits on execution time of the inner search. However square bracket syntax in join and append is not strictly speaking a "subsearch". The comment you made about "subsearches being generally faster" applies only to subsearches in the strict sense.
Try this:
(sourcetype="beacon-spans" requestSpan.stack="prod") OR (sourcetype="alm-spans" userId=marc)
| stats last(*) as * by requestSpan.traceId
I would sneak in a fields command though before the stats, so you can narrow down to just the fields you really care about. last(*) as * is going to do a lot of work.
Also you might actually want first(*) as *
, but one of the two will duplicate what you've got here.
"subsearches are generally faster anyway" - subsearches can offer a speed improvement in certain kinds of use cases, but that means actual subsearches, meaning square bracket syntax appearing alone in a search command, or before the first pipe in the search expression. Although join and append use the square bracket syntax, it isn't really "subsearches". And join and append are almost always slower than some equivalent solution that just uses a search disjunction, some optional trickery usually with eval, and a stats command.
Yes samples would help.
You could try and disable the event-typer which may speed it up a tiny bit.
http://answers.splunk.com/answers/95595/disabling-eventtypes-on-a-per-query-basis
I would try seeing if you could get what you need by using stats. Here is a ruff query that you could use as a starting point. Again would need to see more data to know if this would even come close to what you need.
(sourcetype="beacon-spans" requestSpan.stack="prod")OR (search sourcetype="alm-spans" userId=marc) | stats values(subscriptionId) values(userId) values(_raw) by
requestSpan.traceId
Sample events and a description of what your desired goal is would be helpful. It's VERY often the case that you shouldn't be using join unless you really really need to.