Splunk Search

How to get only latest record for each specific column?

shivaguthi
Explorer

sample data

_time source name appId state
10/8/20
7:53:27.090 AM
xyz Transform-x-2020-10-08 1001 success
10/8/20
7:53:16.890 AM
xyz Transform-x-2020-10-08 1001 running
10/8/20
7:53:06.490 AM
xyz Transform-x-2020-10-08 1001 started
10/8/20
7:53:27.090 AM
xyz copy-y-2020-10-08 203 success
10/8/20
7:53:16.890 AM
xyz copy-y-2020-10-08 203 running
10/8/20
7:53:06.490 AM
xyz copy-y-2020-10-08 203 started

 

there are 3 rows with same name(Transform-x-),appId (1001) and other 3 rows with same name(copy-y-) appId(203) , need to fetch the latest for each appId

expected output:

_time source name appId state
10/8/20
7:53:27.090 AM
xyz Transform-x-2020-10-08 1001 success
10/8/20
7:53:27.090 AM
xyz copy-y-2020-10-08 203 success



can someone please help with as im new to splunk

Labels (3)
0 Karma
1 Solution

gcusello
SplunkTrust
SplunkTrust

Hi @shivaguthi,

in stats command it's easy to identify the latest _time using the function "latest", but the problem is that I don't know how to find the relative state.

To do this I created a temp field merging _time and state, to be sure to have the latest,

| eval my_state=_time."|".state

then I use this new field in stats taking the max (_time in the first part assures this condition):

| stats latest(_time) AS _time values(source) AS source values(name) AS name max(my_state) AS my_state BY appId

then I extract only the second part of this temp file containing state using a regex:

| rex field=my_state "[^\|]+(?<state>.*)"

So I have all the values I want.

Ciao.

Giuseppe

P.S.: if this answer solves your need, please, accept it for the other people of Community and Karma Points are appreciated 😉

View solution in original post

Tartif
Loves-to-Learn

Hello,
Maybe a little late for the main post, but for the others who launch here.
The best solution is to use the timestamp for sorting :

 

# only if your _time is not native and format is not timestamp unix or in ISO date (YYYY-mm-dd HH:MM:SS)
|eval time=strptime(_time,"my_format_date") 

 

and dedup the event with the column to be unique. For the exemple :

 

|dedup appId sortby -_time

 

You will have the latest event/row for the appId
If you want to deduplicate with the "name" in addition, do this :

 

|dedup appId name sortby -_time

 


For more documentation :
https://docs.splunk.com/Documentation/Splunk/latest/SearchReference/Dedup


0 Karma

shivaguthi
Explorer

@gcusello Thanks for the reply.
I'm getting one record per appId but I'm not getting the record with latest timestamp

I'm trying to fetch one record per appId which has the latest timestamp
please suggest on this

0 Karma

gcusello
SplunkTrust
SplunkTrust

Hi @shivaguthi,

sorry but I don't understand: with my search you take the latest _time and the last state for each appId, isn't correct?

What's the results your have with my search?

Ciao.

Giuseppe

0 Karma

shivaguthi
Explorer

Hi @gcusello 

shivaguthi_0-1602147708424.png

take a look at the yellow highlighted row it's  state is "starting", i see the latest timestamp. Actually state should be "success" in the above result
but I wonder if I look at that event it's  state is "success" as shown below.

shivaguthi_1-1602147932241.png

 

0 Karma

gcusello
SplunkTrust
SplunkTrust

hi @shivaguthi,

only one question:

in the screenshot is appId application_xxxxxxxx_000x?

could you share a sample with the same appId?

Ciao.

Giuseppe

0 Karma

shivaguthi
Explorer

hi @gcusello 

below is the sample data,  same appId but

different time stamp, state(success, running,running)

below reocrd has latest time stamp which has state as success - this is expected to come in search

what actually coming already in above

10/8/20
9:22:57.860 AM
INFO: got response from the livy LivyResponse [id=1, appId=application_1602141094791_0003, state=success, status=null, name=Copy-x-2020-10-08,
appId = application_1602141094791_0003log = [state = success


10/8/20
9:22:47.560 AM
INFO: got response from the livy LivyResponse [id=1, appId=application_1602141094791_0003, state=running, status=null, name=Copy-x-2020-10-08,
appId = application_1602141094791_0003log = [state = running


10/8/20
9:22:37.220 AM
INFO: got response from the livy LivyResponse [id=1, appId=application_1602141094791_0003, state=running, status=null, name=Copy-x-2020-10-08, appId = application_1602141094791_0003log = [state = running

0 Karma

gcusello
SplunkTrust
SplunkTrust

Hi @shivaguthi,

let me understand: the problem is that the order of the events isn't a time order, but, (in you example), this event 

10/8/20
9:22:37.220 AM
INFO: got response from the livy LivyResponse [id=1, appId=application_1602141094791_0003, state=running, status=null, name=Copy-x-2020-10-08, appId = application_1602141094791_0003log = [state = running

could be the last in order of arriving but not the latest in time,

Is this correct?

If this is you problem, you could run something like this:

your_search
| eval my_state=_time."|".state
| stats latest(_time) AS _time values(source) AS source values(name) AS name max(my_state) AS my_state BY appId
| eval _time=strftime(_time,"%m/%d/%y %H:%M:%S.%3N")
| rex field=my_state "[^\|]+(?<state>.*)"
| table _time source name appId state

Ciao.

Giuseppe

0 Karma

shivaguthi
Explorer

Hi @gcusello 

This looks good.
could you pls explain this search query

0 Karma

gcusello
SplunkTrust
SplunkTrust

Hi @shivaguthi,

in stats command it's easy to identify the latest _time using the function "latest", but the problem is that I don't know how to find the relative state.

To do this I created a temp field merging _time and state, to be sure to have the latest,

| eval my_state=_time."|".state

then I use this new field in stats taking the max (_time in the first part assures this condition):

| stats latest(_time) AS _time values(source) AS source values(name) AS name max(my_state) AS my_state BY appId

then I extract only the second part of this temp file containing state using a regex:

| rex field=my_state "[^\|]+(?<state>.*)"

So I have all the values I want.

Ciao.

Giuseppe

P.S.: if this answer solves your need, please, accept it for the other people of Community and Karma Points are appreciated 😉

gcusello
SplunkTrust
SplunkTrust

Hi @shivaguthi,

try something like this:

 

your_search
| stats latest(_time) AS _time values(source) AS source values(name) AS name last(state) AS state BY appId
| eval _time=strftime(_time,"%m/%d/%y %H:%M:%S.%3N")
| table _time source name appId state

 

Ciao.

Giuseppe

0 Karma
Get Updates on the Splunk Community!

Fall Into Learning with New Splunk Education Courses

Every month, Splunk Education releases new courses to help you branch out, strengthen your data science roots, ...

Super Optimize your Splunk Stats Searches: Unlocking the Power of tstats, TERM, and ...

By Martin Hettervik, Senior Consultant and Team Leader at Accelerate at Iver, Splunk MVPThe stats command is ...

How Splunk Observability Cloud Prevented a Major Payment Crisis in Minutes

Your bank's payment processing system is humming along during a busy afternoon, handling millions in hourly ...