Solved: How To Accelerate a Search?

skoelpin · ‎07-21-2015

I have search inside a dashboard which shows a table of IP Address | JSession Count | Browser | Web Request.. I have well over a million rows within my table and I included a search box which the user will search for an IP and it will return the other data I listed above from the table. Now I need to accelerate this process so the historical data is cached and can be loaded up quickly.

I tried turning my search into a report and accelerating it but got the " Your report doesn't qualify for acceleration" message.. I removed the timerange picker and tried again and got the same message. The only thing I can think of is, that it's due to not having a transform command, but I am confused as I have a STATS command in my search. If all else fails can I accelerate the data model?

Here's my search

index=access OR index=main | transaction RTG_JSession | search RTG_IPmain=* | table RTG_IPmain RTG_WebRequest RTG_Browser | stats count values(RTG_Browser) values(RTG_WebRequest) BY RTG_IPmain | sort -count | rename RTG_IPmain AS "IP Address" | rename  count AS "JSession Count" | rename "values(RTG_Browser)" AS "Browser" | rename "values(RTG_WebRequest)" AS "Web Request"

martin_mueller · ‎07-22-2015

Two comments combined for @lguinn 😉

This sounds like the pattern "subsearch in one sourcetype for IP, return session ID to search in all sourcetypes, build transaction" to me:

(index=access OR index=main) [search index=access IP="whatever the user types into the form" | dedup RTG_JSession | fields RTG_JSession] | transaction RTG_JSession | ...

The subsearch pulls all session IDs matching the IP you're looking for, then you use that to search all events for those sessions... giving you much much smaller result sets. Then you apply the transaction on small data, running quickly.

View solution in original post

martin_mueller · ‎07-22-2015

Two comments combined for @lguinn 😉

This sounds like the pattern "subsearch in one sourcetype for IP, return session ID to search in all sourcetypes, build transaction" to me:

(index=access OR index=main) [search index=access IP="whatever the user types into the form" | dedup RTG_JSession | fields RTG_JSession] | transaction RTG_JSession | ...

The subsearch pulls all session IDs matching the IP you're looking for, then you use that to search all events for those sessions... giving you much much smaller result sets. Then you apply the transaction on small data, running quickly.

martin_mueller · ‎07-22-2015

You're again filtering after the transaction!!

  (index=access OR index=main) [search index=access $ip$ | dedup RTG_JSession | fields RTG_JSession]
| transaction RTG_JSession | stats ...

skoelpin · ‎07-22-2015

I see what you're saying, it would be much faster to take a chunk from the large pool then do a transaction which is much slower across that smaller pool to make the search faster.

*The token I used in my form which is also in my search is the $ip$ and your search example says to include this in the index=access subsearch.. When I defined the token in my form input, it was taken from the RTG_IPmain which is from the main index.

Can you help with making my search work correctly? This is what I have so far ( I stripped off the sort and renames to make it easier to read)

(index=access OR index=main) [search index=access | stats count values(RTG_Browser) values(RTG_WebRequest)]  | transaction RTG_JSession | search $ip$ | stats count values(RTG_Browser) values(RTG_WebRequest) BY RTG_IPmain | table RTG_IPmain RTG_WebRequest RTG_Browser

lguinn2 · ‎07-21-2015

A search that contains a transaction command cannot be accelerated. Perhaps something like the following would work...

index=access OR index=main 
| stats values(RTG_IPmain) values(RTG_WebRequest) as RTG_WebRequest values(RTG_Browser) as RTG_Browser by RTG_JSession RTG_IPmain RTJ
| sort -count 
| rename RTG_IPmain AS "IP Address" 
| rename  count AS "JSession Count" 
| rename "values(RTG_Browser)" AS "Browser" 
| rename "values(RTG_WebRequest)" AS "Web Request"

I probably over-simplified somewhere, but this search can be accelerated. If this isn't enough to get you started, could you show us a (scrubbed) sample of your data? Also, if you could describe, in general, what you are trying to do, that will help the community figure out alternate solutions.

martin_mueller · ‎07-22-2015

This sounds like the pattern "subsearch in one sourcetype for IP, return session ID to search in all sourcetypes, build transaction" to me.

skoelpin · ‎07-22-2015

Yes very close to what I'm doing. First it will search both indexes and group events that have the same JSession ID. It will then return the events. The events returned will have the true client IP from index=main with the Web Request and Browser from index=access within 1 event. So basically combining events from multiple indexes. The Jsession ID is only used to match events together which may not have the same IP Address

lguinn2 · ‎07-22-2015

Hey @martin_mueller - can you post an answer that contains your pattern (or convert the comment to an answer) - I'd love to make it more visible and vote it up.

@skoelpin - I think Martin's solution will be much faster, even though it has a subsearch and a transaction! First, the subsearch is actually quite small: it only returns the JSession ID associated with the user's choice of IP address. It will run quickly and then constrain the base search. So it filters before the transaction, so Splunk only has to make transactions from a limited set of data.

You won't be able to accelerate this search, but you may not need to.

martin_mueller · ‎07-22-2015

Done......

martin_mueller · ‎07-22-2015

If you first apply a transaction and then filter it'll be a million times slower than using the pattern I described.

(index=access OR index=main) [search index=access IP="whatever the user types into the form" | dedup RTG_JSession | fields RTG_JSession] | transaction RTG_JSession | ...

skoelpin · ‎07-22-2015

I like your thinking on this and I agree. The transaction command is killing my performance, but it looks like you have a subsearch then pipe that output into the transaction command. Wouldn't a subsearch kill the performance just like having the Transaction command at the beginning?

skoelpin · ‎07-22-2015

Hello Iguinn, thanks for the reply.

Let me explain what I'm trying to do and maybe there's a better way to go about it. I'm working on a click fraud project for the marketing group. We need to identify what web requests and browser come from a true clients IP and if they are committing click fraud.

I have 2 indexes (main and access). The main index has the true client IP while the access index has the IP located in the Browser header which may not be the true client IP. The access index has the web request info, Browser and the JSession ID. So I had to create a table which would show the true clients IP address from index=main and the other info which came from index=access. So since both indexes has one thing in common (JSession ID) I tied them together with a transaction command and showed results from both indexes.

So now since the marketing people who will need to detect click fraud have zero Splunk knowledge, I made a dashboard where they can enter the IP Address or Web Request into a search box and the table will filter all results which include whatever the user entered.

martin_mueller · ‎07-21-2015

transaction can be a pain to make fast over large sets of data... where in that search are you putting the filters into that you get from the form?

skoelpin · ‎07-22-2015

Thanks for the input martin_mueller. I've done some reading and I saw that it's not possible to accelerate a search that has a non-streaming command followed by a streaming command which in my case is the Transaction command followed by the Stats command.

Can you please clarify what you meant in the second part of you sentence? Did you mean where did I put the token from my form input into my search? If so then it's that search RTG_IPmain=*right after the Transaction command.

How To Accelerate a Search?

Observe and Secure All Apps with Splunk

Splunk Decoded: Business Transactions vs Business IQ

Fastest way to demo Observability

Are you a member of the Splunk Community?

How To Accelerate a Search?

Observe and Secure All Apps with Splunk

Splunk Decoded: Business Transactions vs Business IQ

Fastest way to demo Observability