Reporting

How To Accelerate a Search?

skoelpin
SplunkTrust
SplunkTrust

I have search inside a dashboard which shows a table of IP Address | JSession Count | Browser | Web Request.. I have well over a million rows within my table and I included a search box which the user will search for an IP and it will return the other data I listed above from the table. Now I need to accelerate this process so the historical data is cached and can be loaded up quickly.

I tried turning my search into a report and accelerating it but got the " Your report doesn't qualify for acceleration" message.. I removed the timerange picker and tried again and got the same message. The only thing I can think of is, that it's due to not having a transform command, but I am confused as I have a STATS command in my search. If all else fails can I accelerate the data model?

Here's my search

index=access OR index=main | transaction RTG_JSession | search RTG_IPmain=* | table RTG_IPmain RTG_WebRequest RTG_Browser | stats count values(RTG_Browser) values(RTG_WebRequest) BY RTG_IPmain | sort -count | rename RTG_IPmain AS "IP Address" | rename  count AS "JSession Count" | rename "values(RTG_Browser)" AS "Browser" | rename "values(RTG_WebRequest)" AS "Web Request"
0 Karma
1 Solution

martin_mueller
SplunkTrust
SplunkTrust

Two comments combined for @lguinn 😉

This sounds like the pattern "subsearch in one sourcetype for IP, return session ID to search in all sourcetypes, build transaction" to me:

(index=access OR index=main) [search index=access IP="whatever the user types into the form" | dedup RTG_JSession | fields RTG_JSession] | transaction RTG_JSession | ...

The subsearch pulls all session IDs matching the IP you're looking for, then you use that to search all events for those sessions... giving you much much smaller result sets. Then you apply the transaction on small data, running quickly.

View solution in original post

martin_mueller
SplunkTrust
SplunkTrust

Two comments combined for @lguinn 😉

This sounds like the pattern "subsearch in one sourcetype for IP, return session ID to search in all sourcetypes, build transaction" to me:

(index=access OR index=main) [search index=access IP="whatever the user types into the form" | dedup RTG_JSession | fields RTG_JSession] | transaction RTG_JSession | ...

The subsearch pulls all session IDs matching the IP you're looking for, then you use that to search all events for those sessions... giving you much much smaller result sets. Then you apply the transaction on small data, running quickly.

martin_mueller
SplunkTrust
SplunkTrust

You're again filtering after the transaction!!

  (index=access OR index=main) [search index=access $ip$ | dedup RTG_JSession | fields RTG_JSession]
| transaction RTG_JSession | stats ...
0 Karma

skoelpin
SplunkTrust
SplunkTrust

I see what you're saying, it would be much faster to take a chunk from the large pool then do a transaction which is much slower across that smaller pool to make the search faster.

*The token I used in my form which is also in my search is the $ip$ and your search example says to include this in the index=access subsearch.. When I defined the token in my form input, it was taken from the RTG_IPmain which is from the main index.

Can you help with making my search work correctly? This is what I have so far ( I stripped off the sort and renames to make it easier to read)

(index=access OR index=main) [search index=access | stats count values(RTG_Browser) values(RTG_WebRequest)]  | transaction RTG_JSession | search $ip$ | stats count values(RTG_Browser) values(RTG_WebRequest) BY RTG_IPmain | table RTG_IPmain RTG_WebRequest RTG_Browser
0 Karma

lguinn2
Legend

A search that contains a transaction command cannot be accelerated. Perhaps something like the following would work...

index=access OR index=main 
| stats values(RTG_IPmain) values(RTG_WebRequest) as RTG_WebRequest values(RTG_Browser) as RTG_Browser by RTG_JSession RTG_IPmain RTJ
| sort -count 
| rename RTG_IPmain AS "IP Address" 
| rename  count AS "JSession Count" 
| rename "values(RTG_Browser)" AS "Browser" 
| rename "values(RTG_WebRequest)" AS "Web Request"

I probably over-simplified somewhere, but this search can be accelerated. If this isn't enough to get you started, could you show us a (scrubbed) sample of your data? Also, if you could describe, in general, what you are trying to do, that will help the community figure out alternate solutions.

martin_mueller
SplunkTrust
SplunkTrust

This sounds like the pattern "subsearch in one sourcetype for IP, return session ID to search in all sourcetypes, build transaction" to me.

skoelpin
SplunkTrust
SplunkTrust

Yes very close to what I'm doing. First it will search both indexes and group events that have the same JSession ID. It will then return the events. The events returned will have the true client IP from index=main with the Web Request and Browser from index=access within 1 event. So basically combining events from multiple indexes. The Jsession ID is only used to match events together which may not have the same IP Address

0 Karma

lguinn2
Legend

Hey @martin_mueller - can you post an answer that contains your pattern (or convert the comment to an answer) - I'd love to make it more visible and vote it up.

@skoelpin - I think Martin's solution will be much faster, even though it has a subsearch and a transaction! First, the subsearch is actually quite small: it only returns the JSession ID associated with the user's choice of IP address. It will run quickly and then constrain the base search. So it filters before the transaction, so Splunk only has to make transactions from a limited set of data.

You won't be able to accelerate this search, but you may not need to.

martin_mueller
SplunkTrust
SplunkTrust

Done......

0 Karma

martin_mueller
SplunkTrust
SplunkTrust

If you first apply a transaction and then filter it'll be a million times slower than using the pattern I described.

(index=access OR index=main) [search index=access IP="whatever the user types into the form" | dedup RTG_JSession | fields RTG_JSession] | transaction RTG_JSession | ...

skoelpin
SplunkTrust
SplunkTrust

I like your thinking on this and I agree. The transaction command is killing my performance, but it looks like you have a subsearch then pipe that output into the transaction command. Wouldn't a subsearch kill the performance just like having the Transaction command at the beginning?

0 Karma

skoelpin
SplunkTrust
SplunkTrust

Hello Iguinn, thanks for the reply.

Let me explain what I'm trying to do and maybe there's a better way to go about it. I'm working on a click fraud project for the marketing group. We need to identify what web requests and browser come from a true clients IP and if they are committing click fraud.

I have 2 indexes (main and access). The main index has the true client IP while the access index has the IP located in the Browser header which may not be the true client IP. The access index has the web request info, Browser and the JSession ID. So I had to create a table which would show the true clients IP address from index=main and the other info which came from index=access. So since both indexes has one thing in common (JSession ID) I tied them together with a transaction command and showed results from both indexes.

So now since the marketing people who will need to detect click fraud have zero Splunk knowledge, I made a dashboard where they can enter the IP Address or Web Request into a search box and the table will filter all results which include whatever the user entered.

0 Karma

martin_mueller
SplunkTrust
SplunkTrust

transaction can be a pain to make fast over large sets of data... where in that search are you putting the filters into that you get from the form?

skoelpin
SplunkTrust
SplunkTrust

Thanks for the input martin_mueller. I've done some reading and I saw that it's not possible to accelerate a search that has a non-streaming command followed by a streaming command which in my case is the Transaction command followed by the Stats command.

Can you please clarify what you meant in the second part of you sentence? Did you mean where did I put the token from my form input into my search? If so then it's that search RTG_IPmain=*right after the Transaction command.

0 Karma
Get Updates on the Splunk Community!

Developer Spotlight with Paul Stout

Welcome to our very first developer spotlight release series where we'll feature some awesome Splunk ...

State of Splunk Careers 2024: Maximizing Career Outcomes and the Continued Value of ...

For the past four years, Splunk has partnered with Enterprise Strategy Group to conduct a survey that gauges ...

Data-Driven Success: Splunk & Financial Services

Splunk streamlines the process of extracting insights from large volumes of data. In this fast-paced world, ...