sample data
_time | source | name | appId | state |
10/8/20 7:53:27.090 AM |
xyz | Transform-x-2020-10-08 | 1001 | success |
10/8/20 7:53:16.890 AM |
xyz | Transform-x-2020-10-08 | 1001 | running |
10/8/20 7:53:06.490 AM |
xyz | Transform-x-2020-10-08 | 1001 | started |
10/8/20 7:53:27.090 AM |
xyz | copy-y-2020-10-08 | 203 | success |
10/8/20 7:53:16.890 AM |
xyz | copy-y-2020-10-08 | 203 | running |
10/8/20 7:53:06.490 AM |
xyz | copy-y-2020-10-08 | 203 | started |
there are 3 rows with same name(Transform-x-),appId (1001) and other 3 rows with same name(copy-y-) appId(203) , need to fetch the latest for each appId
expected output:
_time | source | name | appId | state |
10/8/20 7:53:27.090 AM |
xyz | Transform-x-2020-10-08 | 1001 | success |
10/8/20 7:53:27.090 AM |
xyz | copy-y-2020-10-08 | 203 | success |
can someone please help with as im new to splunk
Hi @shivaguthi,
in stats command it's easy to identify the latest _time using the function "latest", but the problem is that I don't know how to find the relative state.
To do this I created a temp field merging _time and state, to be sure to have the latest,
| eval my_state=_time."|".state
then I use this new field in stats taking the max (_time in the first part assures this condition):
| stats latest(_time) AS _time values(source) AS source values(name) AS name max(my_state) AS my_state BY appId
then I extract only the second part of this temp file containing state using a regex:
| rex field=my_state "[^\|]+(?<state>.*)"
So I have all the values I want.
Ciao.
Giuseppe
P.S.: if this answer solves your need, please, accept it for the other people of Community and Karma Points are appreciated 😉
Hello,
Maybe a little late for the main post, but for the others who launch here.
The best solution is to use the timestamp for sorting :
# only if your _time is not native and format is not timestamp unix or in ISO date (YYYY-mm-dd HH:MM:SS)
|eval time=strptime(_time,"my_format_date")
and dedup the event with the column to be unique. For the exemple :
|dedup appId sortby -_time
You will have the latest event/row for the appId
If you want to deduplicate with the "name" in addition, do this :
|dedup appId name sortby -_time
For more documentation :
https://docs.splunk.com/Documentation/Splunk/latest/SearchReference/Dedup
@gcusello Thanks for the reply.
I'm getting one record per appId but I'm not getting the record with latest timestamp
I'm trying to fetch one record per appId which has the latest timestamp
please suggest on this
Hi @shivaguthi,
sorry but I don't understand: with my search you take the latest _time and the last state for each appId, isn't correct?
What's the results your have with my search?
Ciao.
Giuseppe
Hi @gcusello
take a look at the yellow highlighted row it's state is "starting", i see the latest timestamp. Actually state should be "success" in the above result
but I wonder if I look at that event it's state is "success" as shown below.
hi @shivaguthi,
only one question:
in the screenshot is appId application_xxxxxxxx_000x?
could you share a sample with the same appId?
Ciao.
Giuseppe
hi @gcusello
below is the sample data, same appId but
different time stamp, state(success, running,running)
below reocrd has latest time stamp which has state as success - this is expected to come in search
what actually coming already in above
10/8/20
9:22:57.860 AM
INFO: got response from the livy LivyResponse [id=1, appId=application_1602141094791_0003, state=success, status=null, name=Copy-x-2020-10-08,
appId = application_1602141094791_0003log = [state = success
10/8/20
9:22:47.560 AM
INFO: got response from the livy LivyResponse [id=1, appId=application_1602141094791_0003, state=running, status=null, name=Copy-x-2020-10-08,
appId = application_1602141094791_0003log = [state = running
10/8/20
9:22:37.220 AM
INFO: got response from the livy LivyResponse [id=1, appId=application_1602141094791_0003, state=running, status=null, name=Copy-x-2020-10-08, appId = application_1602141094791_0003log = [state = running
Hi @shivaguthi,
let me understand: the problem is that the order of the events isn't a time order, but, (in you example), this event
10/8/20
9:22:37.220 AM
INFO: got response from the livy LivyResponse [id=1, appId=application_1602141094791_0003, state=running, status=null, name=Copy-x-2020-10-08, appId = application_1602141094791_0003log = [state = running
could be the last in order of arriving but not the latest in time,
Is this correct?
If this is you problem, you could run something like this:
your_search
| eval my_state=_time."|".state
| stats latest(_time) AS _time values(source) AS source values(name) AS name max(my_state) AS my_state BY appId
| eval _time=strftime(_time,"%m/%d/%y %H:%M:%S.%3N")
| rex field=my_state "[^\|]+(?<state>.*)"
| table _time source name appId state
Ciao.
Giuseppe
Hi @gcusello
This looks good.
could you pls explain this search query
Hi @shivaguthi,
in stats command it's easy to identify the latest _time using the function "latest", but the problem is that I don't know how to find the relative state.
To do this I created a temp field merging _time and state, to be sure to have the latest,
| eval my_state=_time."|".state
then I use this new field in stats taking the max (_time in the first part assures this condition):
| stats latest(_time) AS _time values(source) AS source values(name) AS name max(my_state) AS my_state BY appId
then I extract only the second part of this temp file containing state using a regex:
| rex field=my_state "[^\|]+(?<state>.*)"
So I have all the values I want.
Ciao.
Giuseppe
P.S.: if this answer solves your need, please, accept it for the other people of Community and Karma Points are appreciated 😉
Hi @shivaguthi,
try something like this:
your_search
| stats latest(_time) AS _time values(source) AS source values(name) AS name last(state) AS state BY appId
| eval _time=strftime(_time,"%m/%d/%y %H:%M:%S.%3N")
| table _time source name appId state
Ciao.
Giuseppe