I have search inside a dashboard which shows a table of
IP Address | JSession Count | Browser | Web Request.. I have well over a million rows within my table and I included a search box which the user will search for an IP and it will return the other data I listed above from the table. Now I need to accelerate this process so the historical data is cached and can be loaded up quickly.
I tried turning my search into a report and accelerating it but got the " Your report doesn't qualify for acceleration" message.. I removed the timerange picker and tried again and got the same message. The only thing I can think of is, that it's due to not having a transform command, but I am confused as I have a STATS command in my search. If all else fails can I accelerate the data model?
Here's my search
index=access OR index=main | transaction RTG_JSession | search RTG_IPmain=* | table RTG_IPmain RTG_WebRequest RTG_Browser | stats count values(RTG_Browser) values(RTG_WebRequest) BY RTG_IPmain | sort -count | rename RTG_IPmain AS "IP Address" | rename count AS "JSession Count" | rename "values(RTG_Browser)" AS "Browser" | rename "values(RTG_WebRequest)" AS "Web Request"
transaction can be a pain to make fast over large sets of data... where in that search are you putting the filters into that you get from the form?
Thanks for the input martin_mueller. I've done some reading and I saw that it's not possible to accelerate a search that has a non-streaming command followed by a streaming command which in my case is the Transaction command followed by the Stats command.
Can you please clarify what you meant in the second part of you sentence? Did you mean where did I put the token from my form input into my search? If so then it's that
search RTG_IPmain=*right after the Transaction command.
A search that contains a transaction command cannot be accelerated. Perhaps something like the following would work...
index=access OR index=main | stats values(RTG_IPmain) values(RTG_WebRequest) as RTG_WebRequest values(RTG_Browser) as RTG_Browser by RTG_JSession RTG_IPmain RTJ | sort -count | rename RTG_IPmain AS "IP Address" | rename count AS "JSession Count" | rename "values(RTG_Browser)" AS "Browser" | rename "values(RTG_WebRequest)" AS "Web Request"
I probably over-simplified somewhere, but this search can be accelerated. If this isn't enough to get you started, could you show us a (scrubbed) sample of your data? Also, if you could describe, in general, what you are trying to do, that will help the community figure out alternate solutions.
Hello Iguinn, thanks for the reply.
Let me explain what I'm trying to do and maybe there's a better way to go about it. I'm working on a click fraud project for the marketing group. We need to identify what web requests and browser come from a true clients IP and if they are committing click fraud.
I have 2 indexes (main and access). The main index has the true client IP while the access index has the IP located in the Browser header which may not be the true client IP. The access index has the web request info, Browser and the JSession ID. So I had to create a table which would show the true clients IP address from index=main and the other info which came from index=access. So since both indexes has one thing in common (JSession ID) I tied them together with a transaction command and showed results from both indexes.
So now since the marketing people who will need to detect click fraud have zero Splunk knowledge, I made a dashboard where they can enter the IP Address or Web Request into a search box and the table will filter all results which include whatever the user entered.
This sounds like the pattern "subsearch in one sourcetype for IP, return session ID to search in all sourcetypes, build transaction" to me.
Yes very close to what I'm doing. First it will search both indexes and group events that have the same JSession ID. It will then return the events. The events returned will have the true client IP from index=main with the Web Request and Browser from index=access within 1 event. So basically combining events from multiple indexes. The Jsession ID is only used to match events together which may not have the same IP Address
If you first apply a transaction and then filter it'll be a million times slower than using the pattern I described.
(index=access OR index=main) [search index=access IP="whatever the user types into the form" | dedup RTG_JSession | fields RTG_JSession] | transaction RTG_JSession | ...
I like your thinking on this and I agree. The transaction command is killing my performance, but it looks like you have a subsearch then pipe that output into the transaction command. Wouldn't a subsearch kill the performance just like having the Transaction command at the beginning?
Hey @martin_mueller - can you post an answer that contains your pattern (or convert the comment to an answer) - I'd love to make it more visible and vote it up.
@skoelpin - I think Martin's solution will be much faster, even though it has a subsearch and a transaction! First, the subsearch is actually quite small: it only returns the JSession ID associated with the user's choice of IP address. It will run quickly and then constrain the base search. So it filters before the transaction, so Splunk only has to make transactions from a limited set of data.
You won't be able to accelerate this search, but you may not need to.