Splunk Search

Why is the join command joining more than it should in my search results?

_dave_b
Communicator

Hi. I have this data:

Row cTime        pTime       uName     connectionId
 1  23:10:54     22:34:08    user1   user1Connection1
 2  22:34:58     21:02:53    user2     user2Connection2
 3  21:02:53     20:34:34    user1     user1Connection1
 4  20:34:34     19:23:34    user2     user2Connection2

The cTime column represents a time of connection, with the previous connection time in the pTime column, the name of the user who made the connection in the uName column, and the Id of the connection in the connectionId column.

In order to get some parts of the table populated, I have to do a subsearch, because not all the data I need is in one entry, but can be found in another entry by means of linking them together with connectionId

I do a search like

index=* "...."  | join connectionId [ search "connection success" | reverse | streamstats current=f last(connectionTime) as pTime | reverse | fields pTime, connectionId ]

The problem is, the pTime cells in row 1 are being populated by the pTime cells in row two, when there should not be any data from row 2 showing up at all by virtue of the join connectionId command. It should only be joining data with a connectionId value of "user1Connection1"... but this is not happening.

Am I doing the join wrong? I can't find anything in the docs that says I am. Is there a better way to link the two entries together?

Thanks for your help

Tags (2)
0 Karma
1 Solution

lguinn2
Legend
 ... [ search "connection success" 
       | dedup 2 connectionId
       | eval pTimeEpoch=strptime(pTime,"%H:%M:%S") 
       | stats max(pTimeEpoch) as latest_pTime min(pTimeEpoch) as prior_pTime min by connectionId
       | eval latest_pTime=tostring(latest_pTime,"duration")  | eval prior_pTime=tostring(prior_pTime,"duration")
       | fields connectionId latest_pTime prior_pTime ]

This version of the subsearch uses dedup to select the 2 most recent successful connections, for each connectionId. Then it summarizes the results so that latest_pTime represents the last (most recent) successful connection and prior_pTime is the next most recent connection.

View solution in original post

lguinn2
Legend
 ... [ search "connection success" 
       | dedup 2 connectionId
       | eval pTimeEpoch=strptime(pTime,"%H:%M:%S") 
       | stats max(pTimeEpoch) as latest_pTime min(pTimeEpoch) as prior_pTime min by connectionId
       | eval latest_pTime=tostring(latest_pTime,"duration")  | eval prior_pTime=tostring(prior_pTime,"duration")
       | fields connectionId latest_pTime prior_pTime ]

This version of the subsearch uses dedup to select the 2 most recent successful connections, for each connectionId. Then it summarizes the results so that latest_pTime represents the last (most recent) successful connection and prior_pTime is the next most recent connection.

_dave_b
Communicator

This is picking up some "connection success" events, but not returning any results. I had to change the line
eval pTimeEpoch=strptime(pTime,"%H:%M:%S")

to eval pTimeEpoch = _time, since strptime returns a string and the stats functions expect an Epoch time.

Here is the line I used:
"Key Exchange" | dedup 2 connectionId | eval pTimeEpoch=_time | stats max(pTimeEpoch) as latest_pTime min(pTimeEpoch) as prior_pTime by connectionId | eval latest_pTime=tostring(latest_pTime,"duration") | eval prior_pTime=tostring(prior_pTime,"duration") | fields connectionId latest_pTime prior_pTime | table _time, connectionId, latest_pTime, prior_pTime

0 Karma

lguinn2
Legend

strptime returns a string! It should return an epoch time... but you can use _time of course. BTW, _time will not exist in your final table command, as it is removed in the stats command earlier in the pipeline.

What do you mean by "not returning any results"? Are there events that match "key exchange" but that don't appear in the final table when you run this as a stand-alone search?

0 Karma

_dave_b
Communicator

By "not returning any results", I meant that there were no results for it to report on in the statistics tab when I tried to run a table on it, but now I realize that those events it's caught are the ones that I'm interested in. Additionally, I think I muffed a few field names in translation when I typed in the search. So, you got it! I just double checked the original logs to verify and everything checks out.

I'm going to study how those commands work. Also, I did not know the stats command would remove _time, so thanks for informing me about that as well!

0 Karma

lguinn2
Legend

What is returned from the base search (main search) if you run it without the subsearch?

0 Karma

lguinn2
Legend

Depending on what you want, you could simplify the subsearch quite a lot. If you want the most recent successful connection, you could use

... [ search "connection success" 
      | eval pTimeEpoch=strptime(pTime,"%H:%M:%S") 
      | stats max(pTimeEpoch) by connectionId
      | eval pTime_joined=tostring(pTimeEpoch,"duration")
      | fields pTime_joined connectionId ]

Replace "max" with "min" to get the oldest successful pTime. I renamed the pTime field in the subsearch, otherwise the field from the subsearch will overwrite any field with the same name in the main search. There is little way for me to understand what is happening with the join, because you don't show the main search at all. It would be helpful to see (1) results from the main search without join
(2) results from running the subsearch as a standalone search and (3) an explanation of the final results that you want.

0 Karma

_dave_b
Communicator

(3 - final result) Not only do I want the most recent connection, but the time of connection renewal just prior to that that, i.e., if I were to make a numbered list of 10 connection sequences, i would want list items #1 and #2 (or #9 and #10, depending on your perspective). I thought I had found this with streamstats, but things seems to get mixed up when I add in another requirement (correspondance to a unique connection ID) to the mix.

I wil research Stats to see if there's a command that will narrow down the results to the latest and next-to-latest. ( I am still learning Splunk)

(1 - Results from the Main Search without join):
"" <-- search for stuff
|
join connectionID

(2 - Results from Running subsearch as standalone search)
[ search "connection success"
| streamstats current=f last(_time) as previousRenewalTime
| fields previousRenewalTime,renewalTime,connectionID ]

0 Karma

lguinn2
Legend

Is it connectionId or connectionID - that makes a difference!

0 Karma

aholzer
Motivator

you may want to look into the transaction command.

0 Karma

lguinn2
Legend

This seems like a complicated search string, and I can't figure out what you are trying to do. What results do you want?

0 Karma
Get Updates on the Splunk Community!

Index This | Forward, I’m heavy; backward, I’m not. What am I?

April 2024 Edition Hayyy Splunk Education Enthusiasts and the Eternally Curious!  We’re back with another ...

A Guide To Cloud Migration Success

As enterprises’ rapid expansion to the cloud continues, IT leaders are continuously looking for ways to focus ...

Join Us for Splunk University and Get Your Bootcamp Game On!

If you know, you know! Splunk University is the vibe this summer so register today for bootcamps galore ...