Splunk Search

Automatically group events into transactions, or: reassemble lines (before indexing) based on a shared field

fw42
New Member

Hey folks,

I have a web application that logs several log lines per request. Each line is tagged with the request id of that request, so lines that were logged during the same request are tagged with the same request id.

Example:

request_id=1 things are happening
request_id=1 bla bla
request_id=2 things are happening over here too
request_id=1 request is done
request_id=2 request is done

Is it possible to "aggregate" those lines into two events (one for request_id=1 and one for request_id=2)?

Please note that I want to do the aggregation before indexing, not at search time (I don't want my staff to use things like the "transaction" command for every single query they make, just to see all lines of a request). I basically want this "transaction" behaviour by default.

In my example, the logs are interleaved. If it makes my Splunk problem simpler, I can work around that by buffering all lines before logging them, but I would still like to aggregate them based on request_id, not based on ugly regular expressions for splitting things. I would like to keep this as context-free as possible and not depend on knowledge of the actual content (like known keywords or other similar hacks).

Thanks in advance

Flo


martin_mueller
SplunkTrust

At index time you're not going to be too successful. However, you can hide the transaction from your search end users.

One way would be to index your data as-is into an index that's invisible to them. You set up a summary search that calculates the transaction every now and then and writes the whole shebang into a summary index that's visible to the search end users. To them, it looks as if the events never were split up.
To avoid missing anything, you will need to know some boundaries for your requests, most importantly how long a request can take between its first and last event, so you can get the summary search timing right.
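Such a summary search might look like this (index names, schedule, and time bounds are assumptions you would tune; the lagged window leaves room for in-flight requests to finish before they are summarized):

```
index=raw_requests earliest=-30m@m latest=-10m@m
| transaction request_id maxspan=10m
| collect index=web_requests_summary
```

Scheduled every 20 minutes, this covers each event exactly once; `maxspan` must be at least as long as your longest request, and the lag (here 10 minutes) at least as long as well.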

In principle you could build a datamodel on top of transaction-style objects, but that's more statistics-oriented than debugging... would hide lots of stuff from end users nicely though.

As for search-time transactions, these can indeed be slow in some situations. In other situations there is plenty of room for optimization, but that optimization belongs in your summary search; you can't expect most end users to do it themselves.
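For example, giving transaction explicit time bounds (values assumed, tune to your workload) lets Splunk close open transactions early instead of holding everything in memory:

```
index=web
| transaction request_id maxspan=5m maxpause=30s
```

Here `maxspan` caps the total duration of one transaction and `maxpause` caps the gap between consecutive events of the same request.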


aweitzman
Motivator

Most statistical searches ending in by request_id will do what you want, without having to do anything before indexing.
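A minimal sketch of that (index name and search keyword are assumptions):

```
index=web "things are happening"
| stats count earliest(_time) as first_seen latest(_time) as last_seen by request_id
```

This groups everything per request without any transaction command at all.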

Also, you should consider writing macros so that you can help your staff write simplified queries, and creating some lines in props.conf that will extract fields you care about so you don't have to use regexes in the search language to get them.
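A sketch of both pieces (sourcetype, index, and macro names are assumptions):

```
# props.conf -- extract request_id at search time, no regex needed in queries
[app_logs]
EXTRACT-request_id = request_id=(?<request_id>\d+)

# macros.conf -- a macro so staff can pull full request context with one call
[request_context(1)]
args = keyword
definition = index=web $keyword$ | transaction request_id
```

Staff would then run `` `request_context("things are happening")` `` instead of writing the transaction logic themselves.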

You can also just add | sort request_id to get your events lined up. (You can make this part of a macro if you want.)


fw42
New Member

Thanks for the quick reply.

We use Splunk mostly for debugging, not for generating statistics, and we found that we usually search for certain keywords and almost always want the "full context" (all lines that were logged during that request, not just one). Also we found the "transaction" command (to group events together after they were already indexed) to be very slow (compared to sending multi-line events to the indexer).

So unfortunately, I don't think that setting up a macro to do this is the right solution for us, both from a usability and a performance perspective.


aweitzman
Motivator

So then just sorting your results using | sort request_id ought to do what you want.

You definitely will get no benefit from Splunk by trying to arrange things prior to indexing. That way lies madness. The Splunk engine is plenty fast in rearranging your results using the sort command.


fw42
New Member

I don't see how sorting helps here or has anything to do with it.

I want, by default, all lines that were logged during a request, to show up as one event in Splunk. I almost never want to search for only a single line, we always need search results to include the full context of a request. Also I want this behaviour by default (not adding additional sort or transaction commands or anything alike).

Right now we solve this by using regular expressions to determine where an event starts and ends, but that solution is incredibly hacky on so many levels and I feel like there must be a cleaner, more "context free", solution.
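For reference, the regex-based approach described here typically lives in props.conf on the indexer or heavy forwarder and looks something like this (sourcetype and marker pattern are assumptions); it only works when all lines of a request arrive contiguously and each request begins with a recognizable line, which is exactly the content-dependence being objected to:

```
# props.conf -- merge lines into one event, breaking before a known start marker
[app_logs]
SHOULD_LINEMERGE = true
BREAK_ONLY_BEFORE = ^request_id=\d+\s+Started
MAX_EVENTS = 1000
```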


aweitzman
Motivator

Sorting will put all of the events related to a single request adjacent to each other. It's not "one event" necessarily, but you'll see all the events in a similar way.

If that really bothers you for some reason, and you really want everything to appear as one "event", then one possible alternative to transaction is, if you have parsed the remainder of the line into its own field (call it event_text), you can do something like ... | stats list(event_text) as all_event_texts by request_id. That will give you a table where one column is the request_id and the other is all of the event_text values, ordered by time.
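Put together, assuming the raw line format from the example above, that could look like:

```
index=web
| rex field=_raw "^request_id=(?<request_id>\d+)\s+(?<event_text>.+)$"
| stats list(event_text) as all_event_texts by request_id
```

Each row of the result then holds one request with all of its log lines in one cell.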

If you don't want people to bother with the search language, then write the searches for them and save them as saved searches. Then they just have to call up that saved search, and not bother with the language at all. You can even create a dashboard that points to these saved searches and just give your staff access to that dashboard.

One thing you might also look into is restricting searches for your staff. Each "role" you create in Splunk can have its own "Restrict search terms" prefix, and every search from users with those roles will be prefixed by that search term. So that's one way of forcing those users into certain "default" search choices. Look under any role in Settings > Access controls > Roles and you can modify this string for that role.
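Behind the UI, that restriction is the srchFilter setting in authorize.conf; a sketch (role name and filter are assumptions):

```
# authorize.conf -- every search by this role is implicitly ANDed with srchFilter
[role_app_debug]
srchFilter = index=web_requests
```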
