Splunk Search

About using the "bin" command with the "dedup" command

Builder

Events were written to sample1.csv and sample2.csv at the same one-minute intervals.

However, when we ran the following search, the last result (the oldest data) had only one row. (This is odd, because each _time value should have two rows, one per source.)

index=test source="sample1.csv" OR source="sample2.csv" | bin span=1m _time 
| dedup _time,source

*Time range is "All time"

Why is this result shown?

Also, when I narrow the time range to the one-minute window that corresponds to the incorrect result, two results are displayed.

Is this a known issue?

I would greatly appreciate any insight from someone who knows about this.

Additional info

Splunk version: 6.2.7
Number of events: 5762

Capture1: (screenshot)

Capture2: (screenshot)


Champion

Is the number of events really different? Do the following two searches return different counts?

index=test source="sample1.csv" OR source="sample2.csv" |stats count by source

index=test source="sample1.csv" OR source="sample2.csv" | bin span=1m _time 
 | dedup _time,source |stats count by source

Builder

Thank you for your answer.

Yes, absolutely.

The result counts of those two searches are different.

Please look at Capture1.


Champion

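So that means either there is data that is duplicated on _time and source, or there is data that becomes duplicated after bin is applied. Since the counts differ by as many as 5 events, I don't think this is a bug in bin.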

index=test source="sample1.csv" | bin span=1m _time |timechart count span=1m

Doesn't this search show some minutes with 2 or more events, or some with 0 events?
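As a more compact form of the same check, the following sketch returns only the anomalous one-minute buckets instead of requiring every row of the timechart to be scanned. The index and source values are taken from the original question; the field names `events` and `sources` are labels chosen here for illustration.

```
index=test source="sample1.csv" OR source="sample2.csv"
| bin span=1m _time
| stats count AS events by _time, source
| where events != 1
```

```
index=test source="sample1.csv" OR source="sample2.csv"
| bin span=1m _time
| stats dc(source) AS sources by _time
| where sources < 2
```

The first search lists any bucket that holds more than one event for a given source; the second lists any bucket in which only one of the two sources appears. If both return no rows, every bucket that contains data holds exactly one event per source. Minutes with no events from either source do not appear in either result, so those would still need to be checked with timechart.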


Builder

I ran the following searches.
index=test source="sample1.csv" | bin span=1m _time |timechart count span=1m
index=test source="sample2.csv" | bin span=1m _time |timechart count span=1m

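However, for both sample1.csv and sample2.csv, the count is 1 for every minute in the range 2018-01-22 08:49:00 to 2018-01-24 08:49:00, and the number of events is 2881 for each, so there cannot be any data duplicated on _time and source, or by bin.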


Champion

This looks like a bug in dedup. It seems to work correctly if you add a sort.

 index=test source="sample1.csv" OR source="sample2.csv" | bin span=1m _time 
 | dedup _time,source sortby -_time
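If all that is needed is one row per minute and source, a stats-based aggregation might serve as an alternative that avoids dedup altogether. This is only a sketch and not a confirmed fix for the behavior discussed here; unlike dedup, it returns aggregated rows rather than the original events. The field name `events` is a label chosen for illustration.

```
index=test source="sample1.csv" OR source="sample2.csv"
| bin span=1m _time
| stats count AS events by _time, source
| sort 0 -_time
```

With the data described in the question, each one-minute bucket would be expected to produce two rows, one per source.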

Ultra Champion

Are you sure the original timestamps of the two sources fall within the same minute? Can you show a screenshot of those events?


Builder

Thank you for your comment.

Yes, absolutely.

Both sources have an event at 1/22/2018 8:49 AM, one event per source.
But after using "bin" and "dedup", one of the two is gone.

Please look at Capture2.
