Splunk Search

How to omit consecutive same results and remove it?

vrmandadi
Builder

Hello Splunkers,

 

I have a query where I did a  |stats values(abc) as abc command over time .I got the below results . I want the abc column to omit the consecutive same results .I basically want to compare the results with the above one and show only which are different and count of it is different.

You can see below 2022-04-07 12:41:17  and 2022-04-04 10:16:34 have the same "abc " value .I want to omit the second one and then compare with the 3rd result which is different so keep that and then compare the 3rd value with 4th and continue that way..Also show the one which have less count like 2022-03-07 11:48:46 .I tried dedup but it did not work..Any suggestions?

 

2022-04-07 12:41:17
1334821020002
1334821020007
1334821020011
1334821020024
1334821020027
1334821020043
1334821020053
1334821020075
2022-04-04 10:16:34
1334821020002
1334821020007
1334821020011
1334821020024
1334821020027
1334821020043
1334821020053
1334821020075
2022-03-22 07:52:24
1335221020082
1335221020268
1335221020282
1335221020591
1335221020597
1335221020619
1335221020721
1335221020848
2022-03-22 07:36:36
1335221020082
1335221020268
1335221020282
1335221020591
1335221020597
1335221020619
1335221020721
1335221020848
2022-03-18 06:31:18
1335221020082
1335221020268
1335221020282
1335221020591
1335221020597
1335221020619
1335221020721
1335221020848
2022-03-14 13:11:15
1335221020082
1335221020268
1335221020282
1335221020591
1335221020597
1335221020619
1335221020721
1335221020848
2022-03-09 06:42:36
1335221020082
1335221020268
1335221020282
1335221020591
1335221020597
1335221020619
1335221020721
1335221020848
2022-03-07 11:48:46
1335221020591
1335221020597
1335221020619
1335221020721
1335221020848

 

Thanks in Advance

Labels (2)
0 Karma

bowesmana
SplunkTrust
SplunkTrust

When you say dedup does not work, in what way?

See this example, which sets up the data you have below then uses dedup to remove duplicate values of abc

| makeresults
| eval _raw="2022-04-07 12:41:17	1334821020002|1334821020007|1334821020011|1334821020024|1334821020027|1334821020043|1334821020053|1334821020075
2022-04-04 10:16:34	1334821020002|1334821020007|1334821020011|1334821020024|1334821020027|1334821020043|1334821020053|1334821020075
2022-03-22 07:52:24	1335221020082|1335221020268|1335221020282|1335221020591|1335221020597|1335221020619|1335221020721|1335221020848
2022-03-22 07:36:36	1335221020082|1335221020268|1335221020282|1335221020591|1335221020597|1335221020619|1335221020721|1335221020848
2022-03-18 06:31:18	1335221020082|1335221020268|1335221020282|1335221020591|1335221020597|1335221020619|1335221020721|1335221020848
2022-03-14 13:11:15	1335221020082|1335221020268|1335221020282|1335221020591|1335221020597|1335221020619|1335221020721|1335221020848
2022-03-09 06:42:36	1335221020082|1335221020268|1335221020282|1335221020591|1335221020597|1335221020619|1335221020721|1335221020848
2022-03-07 11:48:46	1335221020591|1335221020597|1335221020619|1335221020721|1335221020848"
| multikv noheader=t
| eval _time=strptime(Column_1, "%F %T")
| eval abc=split(Column_2, "|")
| table _time, abc
| dedup abc

vrmandadi
Builder

Sorry if I was not clear on what I am expecting ..The tabular format that I showed is output of .

| stats values(SN) as abc by _time . This search gives the tabular format output I showed .From that result I want to compare each cell of field "abc" with the next event and if there is a no difference then it should omit it and again it should compare with third one if its different it stays and now the third one should match with the next one and the list goes on

0 Karma

bowesmana
SplunkTrust
SplunkTrust

My question was asking you why dedup does not work - the example I showed you was your data using dedup to remove the duplicate abc values you refer to.

What is wrong with that example. If my example doesn't give you what you want, can you clarify with a table, what from your first table (from the stats values command) should appear as your final expected result.

 

0 Karma

vrmandadi
Builder

My search 
index=* 
| stats values(SN) as SN by _time gives the following output 

2022-04-07 12:41:17
1334821020002
1334821020007
1334821020011
1334821020024
1334821020027
1334821020043
1334821020053
1334821020075
2022-04-04 10:16:34
1334821020002
1334821020007
1334821020011
1334821020024
1334821020027
1334821020043
1334821020053
1334821020075
2022-03-22 07:52:24
1335221020082
1335221020268
1335221020282
1335221020591
1335221020597
1335221020619
1335221020721
1335221020848
2022-03-22 07:36:36
1335221020082
1335221020268
1335221020282
1335221020591
1335221020597
1335221020619
1335221020721
1335221020848
2022-03-18 06:31:18
1334821020002
1334821020007
1334821020011
1334821020024
1334821020027
1334821020043
1334821020053
1334821020075
2022-03-14 13:11:15
1335221020082
1335221020268
1335221020282
1335221020591
1335221020597
1335221020619
1335221020721
1335221020848
2022-03-09 06:42:36
1335221020082
1335221020268
1335221020282
1335221020591
1335221020597
1335221020619
2022-03-02 09:38:12
1335221020082
1335221020268
1335221020282
1335221020591
1335221020597
1335221020619
1335221020721
1335221020848
2022-02-28 21:40:30
1334821000038
1334821000073
1334821000330
1334821000337
1334821000343
1334821000346
1334821000419
1334821000466

 

The output that I am looking at is search that compares results and output;s non-consecutive results like below.The date 2022-03-09 06:42:36 has less number of values so that is also considered different... 

Since its comparing and not removing duplicates dedup is not worki

2022-04-07 12:41:17
1334821020002
1334821020007
1334821020011
1334821020024
1334821020027
1334821020043
1334821020053
1334821020075
2022-03-22 07:52:24
1335221020082
1335221020268
1335221020282
1335221020591
1335221020597
1335221020619
1335221020721
1335221020848
2022-03-22 07:36:36
1335221020082
1335221020268
1335221020282
1335221020591
1335221020597
1335221020619
1335221020721
1335221020848
2022-03-18 06:31:18
1334821020002
1334821020007
1334821020011
1334821020024
1334821020027
1334821020043
1334821020053
1334821020075
2022-03-14 13:11:15
1335221020082
1335221020268
1335221020282
1335221020591
1335221020597
1335221020619
1335221020721
1335221020848
2022-03-09 06:42:36
1335221020082
1335221020268
1335221020282
1335221020591
1335221020597
1335221020619
2022-03-02 09:38:12
1335221020082
1335221020268
1335221020282
1335221020591
1335221020597
1335221020619
1335221020721
1335221020848
2022-02-28 21:40:30
1334821000038
1334821000073
1334821000330
1334821000337
1334821000343
1334821000346
1334821000419
1334821000466
0 Karma

bowesmana
SplunkTrust
SplunkTrust

I am sorry - I still don't understand - in your latest example, you have 9 rows in your input and your expected output shows 8 rows with ONLY the second row removed (2022-04-04 10:16:34) as it is a duplicate of row 1

However, comparing rows 3 and 4, the abc values are identical, so why are these not removed?

2022-03-22 07:52:24
1335221020082
1335221020268
1335221020282
1335221020591
1335221020597
1335221020619
1335221020721
1335221020848
2022-03-22 07:36:36
1335221020082
1335221020268
1335221020282
1335221020591
1335221020597
1335221020619
1335221020721
1335221020848

 

Are you also saying you ONLY want CONSECUTIVE duplicates removed, not ANY duplicate?

0 Karma

vrmandadi
Builder

Sorry for that .. I am trying to do a comparison for each row.For instance , compare row 1 with row 2...if both are same then omit row2 ...then compare row 1 with row3 ..if different i want that row...then compare row3 with row 4 if they are same then omit row4 and then the list goes on...The tough part I am facing is..dedup removes duplicates irrespective of what order they come in ...what I am looking is compare each row ..in this way if you see row1 data same as row 3 but row 2 is different . 

0 Karma

bowesmana
SplunkTrust
SplunkTrust

@vrmandadi 

ROW 1

2022-04-07 12:41:17

1334821020002
1334821020007
1334821020011
1334821020024
1334821020027
1334821020043
1334821020053
1334821020075

ROW 2

2022-04-04 10:16:34

1334821020002
1334821020007
1334821020011
1334821020024
1334821020027
1334821020043
1334821020053
1334821020075

ROW 3

2022-03-22 07:52:24

1335221020082
1335221020268
1335221020282
1335221020591
1335221020597
1335221020619
1335221020721
1335221020848

ROW 4

2022-03-22 07:36:36

1335221020082
1335221020268
1335221020282
1335221020591
1335221020597
1335221020619
1335221020721
1335221020848

 

Your input example is as above. In your latest reply, you say ROW 1 is the same as ROW 3, but ROW 2 is different, which doesn't seem to match the above data. It appears ROW 1 and ROW 2 are the same so would be excluded, which is how you showed your output in a previous reply.

After excluding ROW 2, you then compare ROW 1 and ROW 3, which are different, so you keep ROW 3, but you then say you want to compare ROW 3 and ROW 4. In your example you have NOT excluded ROW 4, although they are the same. 

It's still not clear WHY, in your earlier post, you are keeping both ROW 3 and ROW 4, when they are the same, but your description of the removal logic states comparing ROW 3 and ROW 4 would remove the duplicate, which is not done.

 

0 Karma

vrmandadi
Builder

Hello ,

 

Thank you so very much for following up on this  and please sincerely accept my apologies if any confusion is created .Let me put it in correct way.

 

in the example you posted

1. compare row1 with row2...if same then omit row2 else keep row2.

2. Next.Compare row1 with row 3(if row2 is omitted..else compare row2 with row3) ..if same omit row3...if different keep.

3.keep comparing .

 

Hope this is clear

0 Karma

bowesmana
SplunkTrust
SplunkTrust

@vrmandadi 

Unfortunately, it  does not make it clearer. You do not address the question why you retain row 4 when it is the same as row 3.

 

0 Karma

ITWhisperer
SplunkTrust
SplunkTrust
| streamstats window=1 current=f values(SN) as previous
| where SN!=previous
0 Karma

vrmandadi
Builder

I tried it but did not work

0 Karma

ITWhisperer
SplunkTrust
SplunkTrust

Given that this only keeps events where SN does not match the previous SN, in what way did it not work?

0 Karma

PickleRick
Ultra Champion

Since the fields are multivalued I suppose the solution with values(SN) might mess up the order of values. With list(SN) it should work pretty well.

0 Karma
Get Updates on the Splunk Community!

Splunk Life | Happy Pride Month!

Happy Pride Month, Splunk Community! 🌈 In the United States, as well as many countries around the ...

SplunkTrust | Where Are They Now - Michael Uschmann

The Background Five years ago, Splunk published several videos showcasing members of the SplunkTrust to share ...

Admin Your Splunk Cloud, Your Way

Join us to maximize different techniques to best tune Splunk Cloud. In this Tech Enablement, you will get ...