Hello Splunkers,
I have a query where I ran a | stats values(abc) as abc command over time and got the results below. I want the abc column to omit consecutive identical results. Basically, I want to compare each result with the one above it and show only the ones that are different, including rows where the count of values is different.
You can see below that 2022-04-07 12:41:17 and 2022-04-04 10:16:34 have the same "abc" value. I want to omit the second one, then compare with the third result, which is different, so keep it; then compare the third value with the fourth, and continue that way. Also show the one which has fewer values, like 2022-03-07 11:48:46. I tried dedup but it did not work. Any suggestions?
2022-04-07 12:41:17 | 1334821020002 1334821020007 1334821020011 1334821020024 1334821020027 1334821020043 1334821020053 1334821020075 |
2022-04-04 10:16:34 | 1334821020002 1334821020007 1334821020011 1334821020024 1334821020027 1334821020043 1334821020053 1334821020075 |
2022-03-22 07:52:24 | 1335221020082 1335221020268 1335221020282 1335221020591 1335221020597 1335221020619 1335221020721 1335221020848 |
2022-03-22 07:36:36 | 1335221020082 1335221020268 1335221020282 1335221020591 1335221020597 1335221020619 1335221020721 1335221020848 |
2022-03-18 06:31:18 | 1335221020082 1335221020268 1335221020282 1335221020591 1335221020597 1335221020619 1335221020721 1335221020848 |
2022-03-14 13:11:15 | 1335221020082 1335221020268 1335221020282 1335221020591 1335221020597 1335221020619 1335221020721 1335221020848 |
2022-03-09 06:42:36 | 1335221020082 1335221020268 1335221020282 1335221020591 1335221020597 1335221020619 1335221020721 1335221020848 |
2022-03-07 11:48:46 | 1335221020591 1335221020597 1335221020619 1335221020721 1335221020848 |
Thanks in Advance
When you say dedup does not work, in what way?
See this example, which sets up the data you have below and then uses dedup to remove duplicate values of abc:
| makeresults
| eval _raw="2022-04-07 12:41:17 1334821020002|1334821020007|1334821020011|1334821020024|1334821020027|1334821020043|1334821020053|1334821020075
2022-04-04 10:16:34 1334821020002|1334821020007|1334821020011|1334821020024|1334821020027|1334821020043|1334821020053|1334821020075
2022-03-22 07:52:24 1335221020082|1335221020268|1335221020282|1335221020591|1335221020597|1335221020619|1335221020721|1335221020848
2022-03-22 07:36:36 1335221020082|1335221020268|1335221020282|1335221020591|1335221020597|1335221020619|1335221020721|1335221020848
2022-03-18 06:31:18 1335221020082|1335221020268|1335221020282|1335221020591|1335221020597|1335221020619|1335221020721|1335221020848
2022-03-14 13:11:15 1335221020082|1335221020268|1335221020282|1335221020591|1335221020597|1335221020619|1335221020721|1335221020848
2022-03-09 06:42:36 1335221020082|1335221020268|1335221020282|1335221020591|1335221020597|1335221020619|1335221020721|1335221020848
2022-03-07 11:48:46 1335221020591|1335221020597|1335221020619|1335221020721|1335221020848"
| multikv noheader=t
| eval _time=strptime(Column_1, "%F %T")
| eval abc=split(Column_2, "|")
| table _time, abc
| dedup abc
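As an aside, dedup also accepts a consecutive=true option, which removes only duplicates that appear in adjacent rows rather than anywhere in the results. Its behaviour with a multivalue field like abc is worth verifying against your data, but it may be closer to what you describe:

| dedup consecutive=true abc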
Sorry if I was not clear on what I am expecting. The tabular format that I showed is the output of
| stats values(SN) as abc by _time
From that result I want to compare each cell of field "abc" with the next event; if there is no difference, the next event should be omitted, and the comparison should then move to the third one. If it is different, it stays, and the third one should then be compared with the next one, and the list goes on.
My question was asking why dedup does not work; the example I showed used your data with dedup to remove the duplicate abc values you refer to.
What is wrong with that example? If it doesn't give you what you want, can you clarify, with a table, which rows from your first table (from the stats values command) should appear in your final expected result?
My search
index=*
| stats values(SN) as SN by _time
gives the following output:
2022-04-07 12:41:17 | 1334821020002 1334821020007 1334821020011 1334821020024 1334821020027 1334821020043 1334821020053 1334821020075 |
2022-04-04 10:16:34 | 1334821020002 1334821020007 1334821020011 1334821020024 1334821020027 1334821020043 1334821020053 1334821020075 |
2022-03-22 07:52:24 | 1335221020082 1335221020268 1335221020282 1335221020591 1335221020597 1335221020619 1335221020721 1335221020848 |
2022-03-22 07:36:36 | 1335221020082 1335221020268 1335221020282 1335221020591 1335221020597 1335221020619 1335221020721 1335221020848 |
2022-03-18 06:31:18 | 1334821020002 1334821020007 1334821020011 1334821020024 1334821020027 1334821020043 1334821020053 1334821020075 |
2022-03-14 13:11:15 | 1335221020082 1335221020268 1335221020282 1335221020591 1335221020597 1335221020619 1335221020721 1335221020848 |
2022-03-09 06:42:36 | 1335221020082 1335221020268 1335221020282 1335221020591 1335221020597 1335221020619 |
2022-03-02 09:38:12 | 1335221020082 1335221020268 1335221020282 1335221020591 1335221020597 1335221020619 1335221020721 1335221020848 |
2022-02-28 21:40:30 | 1334821000038 1334821000073 1334821000330 1334821000337 1334821000343 1334821000346 1334821000419 1334821000466 |
The output that I am looking for comes from a search that compares consecutive results and outputs only the non-matching ones, like below. The date 2022-03-09 06:42:36 has fewer values, so it is also considered different.
Since this requires comparing consecutive rows rather than removing duplicates anywhere in the results, dedup is not working.
2022-04-07 12:41:17 | 1334821020002 1334821020007 1334821020011 1334821020024 1334821020027 1334821020043 1334821020053 1334821020075 |
2022-03-22 07:52:24 | 1335221020082 1335221020268 1335221020282 1335221020591 1335221020597 1335221020619 1335221020721 1335221020848 |
2022-03-22 07:36:36 | 1335221020082 1335221020268 1335221020282 1335221020591 1335221020597 1335221020619 1335221020721 1335221020848 |
2022-03-18 06:31:18 | 1334821020002 1334821020007 1334821020011 1334821020024 1334821020027 1334821020043 1334821020053 1334821020075 |
2022-03-14 13:11:15 | 1335221020082 1335221020268 1335221020282 1335221020591 1335221020597 1335221020619 1335221020721 1335221020848 |
2022-03-09 06:42:36 | 1335221020082 1335221020268 1335221020282 1335221020591 1335221020597 1335221020619 |
2022-03-02 09:38:12 | 1335221020082 1335221020268 1335221020282 1335221020591 1335221020597 1335221020619 1335221020721 1335221020848 |
2022-02-28 21:40:30 | 1334821000038 1334821000073 1334821000330 1334821000337 1334821000343 1334821000346 1334821000419 1334821000466 |
I am sorry, I still don't understand. In your latest example, you have 9 rows in your input, and your expected output shows 8 rows with ONLY the second row removed (2022-04-04 10:16:34), as it is a duplicate of row 1.
However, comparing rows 3 and 4, the abc values are identical, so why are these not removed?
2022-03-22 07:52:24 | 1335221020082 1335221020268 1335221020282 1335221020591 1335221020597 1335221020619 1335221020721 1335221020848 |
2022-03-22 07:36:36 | 1335221020082 1335221020268 1335221020282 1335221020591 1335221020597 1335221020619 1335221020721 1335221020848 |
Are you also saying you ONLY want CONSECUTIVE duplicates removed, not ANY duplicate?
Sorry for that. I am trying to do a comparison for each row. For instance, compare row 1 with row 2; if both are the same, omit row 2. Then compare row 1 with row 3; if it is different, keep it. Then compare row 3 with row 4; if they are the same, omit row 4, and the list goes on. The tough part I am facing is that dedup removes duplicates irrespective of the order they come in, whereas I want each row compared with the one before it. In this way, if row 1 has the same data as row 3 but row 2 is different, all three rows are kept.
ROW 1 2022-04-07 12:41:17 | 1334821020002 1334821020007 1334821020011 1334821020024 1334821020027 1334821020043 1334821020053 1334821020075 |
ROW 2 2022-04-04 10:16:34 | 1334821020002 1334821020007 1334821020011 1334821020024 1334821020027 1334821020043 1334821020053 1334821020075 |
ROW 3 2022-03-22 07:52:24 | 1335221020082 1335221020268 1335221020282 1335221020591 1335221020597 1335221020619 1335221020721 1335221020848 |
ROW 4 2022-03-22 07:36:36 | 1335221020082 1335221020268 1335221020282 1335221020591 1335221020597 1335221020619 1335221020721 1335221020848 |
Your input example is as above. In your latest reply, you say ROW 1 is the same as ROW 3, but ROW 2 is different, which doesn't seem to match the above data. It appears ROW 1 and ROW 2 are the same so would be excluded, which is how you showed your output in a previous reply.
After excluding ROW 2, you then compare ROW 1 and ROW 3, which are different, so you keep ROW 3, but you then say you want to compare ROW 3 and ROW 4. In your example you have NOT excluded ROW 4, although they are the same.
It's still not clear WHY, in your earlier post, you keep both ROW 3 and ROW 4 when they are the same, while your description of the removal logic states that comparing ROW 3 and ROW 4 would remove the duplicate, which was not done.
Hello ,
Thank you so very much for following up on this, and please accept my sincere apologies for any confusion created. Let me put it the correct way.
In the example you posted:
1. Compare row 1 with row 2; if they are the same, omit row 2, otherwise keep row 2.
2. Next, compare row 1 with row 3 (if row 2 was omitted; otherwise compare row 2 with row 3); if they are the same, omit row 3, if different, keep it.
3. Keep comparing in this way.
Hope this is clear.
Unfortunately, it does not make it clearer. You do not address the question of why you retain row 4 when it is the same as row 3.
| streamstats window=1 current=f values(SN) as previous
| where SN!=previous
I tried it, but it did not work.
Given that this only keeps events where SN does not match the previous SN, in what way did it not work?
Since the fields are multivalued, I suspect the solution with values(SN) might mess up the order of the values, because values() returns a sorted, deduplicated set. With list(SN) it should work pretty well.
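Building on that, one way to make the comparison robust is to serialise the multivalue field into a single string with mvjoin before comparing, which sidesteps both the ordering issue and multivalue comparison semantics. A sketch (untested against the original data; the SN_key field name is just illustrative, and the isnull check keeps the first row, whose previous value is null):

index=*
| stats values(SN) as SN by _time
| sort - _time
| eval SN_key=mvjoin(SN, ",")
| streamstats window=1 current=f values(SN_key) as previous
| where isnull(previous) OR SN_key!=previous
| fields - SN_key previous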