Splunk Search

Optimizing subsearch

mchipouras
Explorer

I'm trying to optimize a query joining multiple sourcetypes based on a key. I would appreciate any suggestions on how to speed it up.

I am trying to obtain a summary of the beacon-spans for an individually requestSpan.traceId filtered to a specific user. There are several beacon-spans and several alm-spans for every unique requestSpan.traceId. Only the alm-spans have the userId and subscriptionId.

The base query looks like this:

sourcetype="beacon-spans" requestSpan.stack="prod"
| join type="inner" requestSpan.traceId [search sourcetype="alm-spans" userId=marc | dedup requestSpan.traceId | table requestSpan.traceId, subscriptionId, userId]

The subsearch returns in 1-2 seconds for single hour date ranges. It usually returns less than 100 results.

The outer query has to sift through 1-2 million records and finally returns 600-800 records total, but it is taking 3-4 minutes.

I have attempted rewriting this query as a transaction and hasn't been any faster. I have read that subsearches are generally faster anyway. The key problem I am dealing with is that the key parameter that I am attempting to filter by is not available in the outer search against beacon-spans. Perhaps my problem is simply I/O related

Execution Costs

Duration (seconds)  Component                       Invocations Input   Output
    0.141   command.fields                      144     1,596,742       1,596,742
    24.691  command.join                            145     1,596,742       106
    228.654 command.search                      144     -           1,596,742
    8.85        command.search.index                    160     -           -
    3.159   command.search.fieldalias               136     1,596,742       1,596,742
    1.888   command.search.filter                   136     -           -
    0.329   command.search.calcfields               136     1,596,742       1,596,742
    0       command.search.index.usec_1_8           8,807   -           -
    0       command.search.index.usec_262144_INF        5       -           -
    0       command.search.index.usec_32768_262144  6       -           -
    0       command.search.index.usec_4096_32768    16      -           -
    0       command.search.index.usec_512_4096      7       -           -
    0       command.search.index.usec_64_512        37      -           -
    0       command.search.index.usec_8_64          1,454   -           -
    100.19  command.search.rawdata                  136     -           -
    60.46   command.search.kv                       136     -           -
    49.806  command.search.typer                    144     1,596,742       1,596,742
    1.136   command.search.tags                 144     1,596,742       1,596,742
    0.646   command.search.lookups                  136     1,596,742       1,596,742
    0.08        command.search.summary              144     -           -
    0.025   dispatch.check_disk_usage               24      -           -
    0.501   dispatch.createProviderQueue                1       -           -
    6.378   dispatch.evaluate                       1       -           -
    6.253   dispatch.evaluate.join                  1       -           -
    0.114   dispatch.evaluate.search                    1       -           -
    230.477 dispatch.fetch                          145     -           -
    0.038   dispatch.preview                        38      -           -
    0.005   dispatch.readEventsInResults                1       -           -
    4.534   dispatch.results_combiner               145     -           -
    153.311 dispatch.stream.local                   68      -           -
    75.353  dispatch.stream.remote                  76      -           2,027,392,278
    75.342  dispatch.stream.remote.qd-splunk-02     70      -           2,027,380,923
    0.005   dispatch.stream.remote.qs-splunk-02     2       -           3,785
    0.003   dispatch.stream.remote.qd-splunk-01     2       -           3,785
    0.002   dispatch.stream.remote.qs-splunk-01     2       -           3,785
    0.789   dispatch.timeline                       145     -           -
    3.351   dispatch.writeStatus                        168     -           -
    0.651   startup.handoff                         1       -           -

Here are examples of each of the JSON events

Example alm-spans

{[-]
  almBuild:11414
  javaRequestSpan:{[-]
    httpMethod:GET
    httpStatus:200
    uri:/slm/webservice/v2.x/Revision
    userAgent:Mozilla/5.0 (Macintosh; Intel Mac OS X 10_8_5) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/30.0.1599.101 Safari/537.36
  }
  requestSpan:{[-]
    parentSpanId:e596e966-cf27-4966-9bfd-f5206562c760
    serviceName:alm
    spanId:701e1f8a-851c-489f-a922-24c418e8941e
    spanResponseTimeMs:13
    stack:prod
    timestamp:2013-10-29T15:36:07.649Z
    traceId:ff1ad4f1-0e8f-462a-9a38-716b20751607
  }
  subscriptionId:100
  subscriptionUuid:7b7b6dd2e5344ff09bdb073bb54b08b4
  userId:marc
  userUuid:16e6da3989164a8784541fcd94e9fd4f
  versionSha:1dad34e0cd3b66978ad1311d27a29aea9c9f1d1e
}

Example Beacon-span

 {[-]
      browserTimestamp:1383067227939
      componentType:ShowFlairOnWorkspaceChangeFilter
      componentUuid:4ef19064-13e7-49d2-bed3-7b8241a968f2
      endTime:18384
      eventDescription:nav filter
      eventType:load
      requestSpan:{[-]
        parentSpanId:23a5f5aa-7337-42a9-a89b-a0fa56508d8e
        serviceName:beacon
        spanId:e596e966-cf27-4966-9bfd-f5206562c760
        spanResponseTimeMs:18
        stack:prod
        timestamp:2013-10-29T15:36:07.631Z
        traceId:ff1ad4f1-0e8f-462a-9a38-716b20751607
      }
      startTime:18366
      status:Ready
      tabUuid:78dfb9bc-6bfc-428a-ab5d-92b768d6354e
      userAgent:Mozilla/5.0 (Macintosh; Intel Mac OS X 10_8_5) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/30.0.1599.101 Safari/537.36
    }
1 Solution

mchipouras
Explorer

Converting the search to an actual sub-search made all the difference in performance for this query

sourcetype="beacon-spans" requestSpan.stack="prod" [search sourcetype="alm-spans" userId=marc | dedup requestSpan.traceId | table requestSpan.traceId]

The only negative to this query is that it does not return the values for subscriptionId and userId like the original join query did. I was able to work around this with a lookup after several other summary steps.

I really appreciate the help from sideview, cramasta and ayn

View solution in original post

mchipouras
Explorer

Converting the search to an actual sub-search made all the difference in performance for this query

sourcetype="beacon-spans" requestSpan.stack="prod" [search sourcetype="alm-spans" userId=marc | dedup requestSpan.traceId | table requestSpan.traceId]

The only negative to this query is that it does not return the values for subscriptionId and userId like the original join query did. I was able to work around this with a lookup after several other summary steps.

I really appreciate the help from sideview, cramasta and ayn

mchipouras
Explorer

Whoa this is exactly what I am looking for. I rewrote the query as an actual subsearch

sourcetype="beacon-spans" requestSpan.stack="prod" [search sourcetype="alm-spans" userId=marc | dedup requestSpan.traceId | table requestSpan.traceId]

Returned in <10 seconds for a 4hour time frame. Peforms much more quickly. Let me keep experimenting to see it it responds in larger contexts

0 Karma

sideview
SplunkTrust
SplunkTrust

Using stats will be much faster than using join, and you wont be subject to limits on the number of rows or on execution time of the search inside the square brackets. It is confusing that subsearches and search commands like join and append use square bracket syntax, and that they also have similar limits on number of rows and limits on execution time of the inner search. However square bracket syntax in join and append is not strictly speaking a "subsearch". The comment you made about "subsearches being generally faster" applies only to subsearches in the strict sense.

Try this:

(sourcetype="beacon-spans" requestSpan.stack="prod") OR (sourcetype="alm-spans" userId=marc)
| stats last(*) as * by requestSpan.traceId

I would sneak in a fields command though before the stats, so you can narrow down to just the fields you really care about. last(*) as * is going to do a lot of work.

Also you might actually want first(*) as *, but one of the two will duplicate what you've got here.

sideview
SplunkTrust
SplunkTrust

"subsearches are generally faster anyway" - subsearches can offer a speed improvement in certain kinds of use cases, but that means actual subsearches, meaning square bracket syntax appearing alone in a search command, or before the first pipe in the search expression. Although join and append use the square bracket syntax, it isn't really "subsearches". And join and append are almost always slower than some equivalent solution that just uses a search disjunction, some optional trickery usually with eval, and a stats command.

cramasta
Builder

Yes samples would help.

You could try and disable the event-typer which may speed it up a tiny bit.
http://answers.splunk.com/answers/95595/disabling-eventtypes-on-a-per-query-basis

I would try seeing if you could get what you need by using stats. Here is a ruff query that you could use as a starting point. Again would need to see more data to know if this would even come close to what you need.

(sourcetype="beacon-spans" requestSpan.stack="prod")OR (search sourcetype="alm-spans" userId=marc) | stats values(subscriptionId) values(userId) values(_raw) by
requestSpan.traceId

0 Karma

Ayn
Legend

Sample events and a description of what your desired goal is would be helpful. It's VERY often the case that you shouldn't be using join unless you really really need to.

0 Karma
Get Updates on the Splunk Community!

Welcome to the Splunk Community!

(view in My Videos) We're so glad you're here! The Spunk Community is place to connect, learn, give back, and ...

Tech Talk | Elevating Digital Service Excellence: The Synergy of Splunk RUM & APM

Elevating Digital Service Excellence: The Synergy of Real User Monitoring and Application Performance ...

Adoption of RUM and APM at Splunk

    Unleash the power of Splunk Observability   Watch Now In this can't miss Tech Talk! The Splunk Growth ...