When I search with the query below,
the search result appears within one second, amazingly fast 🙂
This log information is older than one month.
But when I search with this query:
sourcetype=my_log | transaction startswith=log_begin endswith=log_end | where UUID="3fc5e6c2-57b4-4e59-a3c0-8115f5ec74a1"
it takes 8 to 10 minutes to display the result 😞 extremely slow.
Now I have two questions.
The first query is fast because Splunk can use index data to narrow down the events that need to be loaded.
The second query is slow because Splunk has to push everything into the transaction command, which then is slow because it can't handle large (in Splunk terms) amounts of data.
One way to speed things up is to narrow down the time range that needs to be searched.
Other ways depend on your data and what you do with it.
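As a rough sketch of the time-range narrowing mentioned above: the `earliest` and `latest` time modifiers can be added to the base search so that fewer events reach `transaction`. The window used here (between 40 and 30 days ago) is only an assumption based on the log being "older than one month"; adjust it to wherever the event actually lives.

```
sourcetype=my_log earliest=-40d@d latest=-30d@d | transaction startswith=log_begin endswith=log_end | where UUID="3fc5e6c2-57b4-4e59-a3c0-8115f5ec74a1"
```

Keep in mind that a transaction straddling either edge of the window may come back incomplete, so leave some margin around the time you expect.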
Does the UUID field exist in all events you are interested in?

Like martinmueller said, the first search is fast because index data is used to narrow down your search results. But the second search is very slow because it is handling so much data. If I understand the search pipeline correctly, your second search is taking the entire contents of `my_log` and trying to apply the `transaction` function to it before narrowing it down again with the `where` command.

`transaction` is an intensive operation, and you'll want to narrow down your search results as much as possible before piping to it. Additionally, if there is a field that uniquely identifies log entries as part of a transaction, you should include it in the optional field list of the `transaction` command. This makes it easier for `transaction` to group events together.

Would a search like the following accomplish what you need?
sourcetype=my_log UUID="3fc5e6c2-57b4-4e59-a3c0-8115f5ec74a1" | transaction UUID startswith=log_begin endswith=log_end
No, the UUID appears only once in a transaction. I understand the reason, but 8 minutes is not acceptable for searching the log. Is there any alternative, e.g. to display x lines before the UUID field and y lines after it?
8 minutes is understandable since you're telling Splunk to retrieve all events from disk before really doing anything.
You might want to look into the localize command: http://docs.splunk.com/Documentation/Splunk/5.0.1/SearchReference/Localize
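An untested sketch of how `localize` might be applied here, modeled on the `localize | map` pattern shown in that documentation page: the fast indexed search finds the single UUID event, `localize` turns it into a time window, and `map` reruns the expensive part only inside that window. The 30-second values for `timebefore` and `timeafter` are assumptions; they need to be wide enough to cover the whole transaction.

```
sourcetype=my_log UUID="3fc5e6c2-57b4-4e59-a3c0-8115f5ec74a1"
| localize timebefore=30s timeafter=30s
| map search="search sourcetype=my_log starttimeu=$starttime$ endtimeu=$endtime$ | transaction startswith=log_begin endswith=log_end"
```

Note that this selects by time rather than by a number of lines before and after the event, and other transactions that fall inside the same window will also be returned, so a final filter on UUID may still be needed.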
So I realize I'm way late to the party here, but what about using a subsearch? Assuming that there is a field in your log data (let's call it myTransactionID) that can be used to uniquely identify a transaction, you could do something like:
sourcetype=my_log [search sourcetype=my_log UUID="3fc5e6c2-57b4-4e59-a3c0-8115f5ec74a1" | dedup myTransactionID | fields myTransactionID] | transaction startswith=log_begin endswith=log_end
Essentially, what the subsearch does is find the initial log with the specified UUID value, obtain the value of myTransactionID, and then pass that as an argument to the main search so that it only returns events with the matching transaction ID. Normally subsearches aren't particularly fast, so as a general rule I wouldn't be suggesting them for optimization, but it will be far better than letting transaction operate on every single event with the my_log sourcetype.