Splunk Search

Combine events based on a shared session field value efficiently in a verbose log source

JSkier
Communicator

I have an index with an excessive volume of logs from one application, divided into several event types within that single index. The event type I'm interested in reporting on has a low volume of events, so I've started with that.

Typically I'll have about 15 events an hour for the event type I'm most interested in. I'd like to take the session value (common across every event type) from that search and use it to search another event type. I've tried transaction, which works well but is incredibly slow, because it doesn't take into account that I only care about retrieving the session values from those 15 or so events in the first search.

An example: a user logs in, and this is noted in the initialize_event log type, which has fields like username, src_ip, useragent, and a unique session value. Five minutes later, they buy shoes (shoe_event). The shoe_event logs have data about the type of shoe and share the session value, but don't carry the username, src_ip, or useragent. I'd like the shoe information merged with the initial login information based on the matching session value, so it appears as one event and I can report across event types. Ideally, Splunk would take the results of the initial search and look only for those session values (I also plan to exclude the shoe_event type from the secondary search).


DalJeanis
SplunkTrust

Start with this...

index=foo (eventType=raretype OR eventType=commontype)
    [search index=foo eventType=raretype 
    | ....  whatever else you need to limit it down ...
    | table session]
| fields _time index eventType session ... all the other fields you want

That subsearch in brackets produces output that looks like this:

( ( session="value1" ) OR ( session="value2" ) OR ( session="value3" ) OR ... )

And you can see exactly what it results in by running it standalone, without the brackets, and with a | format command on the end...

index=foo eventType=raretype 
| ....  whatever else you need to limit it down ...
| table session
| format 

Once you have the above working the way you like it, and pulling the subset of records that you want, then you can use stats or transaction or a number of other techniques to roll together the sales.

I would suggest this one:

| rename COMMENT as "copy the user data from the rare event to the common one and then drop the rare one" 
| eventstats values(username) as username, values(src_ip) as src_ip, values(useragent) as useragent by session  
| where eventType!="raretype"
| rename COMMENT as "now you have only your common events, but they all have that information that you wanted."
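
An equivalent roll-up with stats instead of eventstats (a sketch only; shoe_type is a placeholder field name taken from the question's example) would collapse everything down to one row per session:

| stats values(username) as username, values(src_ip) as src_ip, 
    values(useragent) as useragent, values(shoe_type) as shoe_type by session

The trade-off: eventstats keeps the individual common events while copying the user fields onto them, whereas stats returns a single summary row per session.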

JSkier
Communicator

This isn't exactly what I was going for, but it did encourage me to make the searches a little more efficient, thank you. The problem with this is that it starts by pulling in all of the common events for the initial search, which makes it really slow. I want to take the session values from the rare events and use them to search the common events, minimizing the amount of index parsing needed (and thus speeding things up).

The format command is helpful, thank you for mentioning that also.


DalJeanis
SplunkTrust

@jskier - no, it doesn't. The subsearch runs first, producing a set of session values that are then passed to the outer query. That subsearch output then limits the results via the bloom filters: only records with those values are pulled, so if the values are as unique as session IDs normally are, then almost nothing beyond the records you want should be parsed.

You could go one step further and alter the formatted output, but in this case I don't think it will make it any more efficient...

[search index=foo eventType=raretype 
 | ....  whatever else you need to limit it down ...
 | format "(" "" "" "" "OR" ")"
 | rex mode=sed field=search "s/session=//g" 
 | table search]

You could also take the output and the time for the event and feed it into MAP...
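
A sketch of that map approach (maxsearches and the field names here are assumptions, not tested against your data):

index=foo eventType=raretype
| eval earliest=_time - 2, latest=_time + 2
| table session earliest latest
| map maxsearches=20 search="search index=foo eventType=commontype session=$session$ earliest=$earliest$ latest=$latest$"

map runs the quoted search once per input row, substituting the $field$ tokens, so keep maxsearches aligned with the roughly 15 rare events you expect per hour.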

Or, using the above technique you could calculate from the event some time range, and set it up that way. Let's assume +/- 2 seconds from the time of the session.

[| makeresults 
| eval session="value1 value2 value3" | makemv session | mvexpand session 
| streamstats count as recno | eval _time = _time -3600 + 5* recno
| rename COMMENT as "the above simulates your search that brings in session and _time"

| rename COMMENT as "the below reformats it into ( _time>= x-2 AND _time<= x+2 AND session=YYYY )"
| eval earliest=_time - 2, latest=_time + 2
| table earliest latest session
| format
| rex field=search mode=sed "s/earliest=/_time>=/g"
| rex field=search mode=sed "s/latest=/_time<=/g"
]

That should be about as efficient as you can get with this underlying data.
