Splunk Search

About using the "bin" command with the "dedup" command

yutaka1005
Builder

Events were written to sample1.csv and sample2.csv at the same one-minute intervals.

However, when we ran the following search, the last result (the oldest data) contained only one value. This is strange, because there should be two values (one per source) for each _time.

index=test source="sample1.csv" OR source="sample2.csv" | bin span=1m _time 
| dedup _time,source

*Time range: "All time"
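To make the expectation concrete, here is a small Python model of the search (not Splunk itself). It assumes that bin span=1m _time floors each timestamp to the start of its minute and that dedup _time,source keeps the first event for each (_time, source) pair. The three minutes of sample events are hypothetical. With one event per source per minute, every minute should keep two results.

```python
from datetime import datetime, timedelta


def bin_1m(ts: datetime) -> datetime:
    """Floor a timestamp to the start of its minute (assumed bin span=1m behavior)."""
    return ts.replace(second=0, microsecond=0)


def dedup(events, keys):
    """Keep only the first event seen for each combination of the key fields."""
    seen = set()
    kept = []
    for ev in events:
        key = tuple(ev[f] for f in keys)
        if key not in seen:
            seen.add(key)
            kept.append(ev)
    return kept


# Hypothetical data: one event per source per minute, 5 seconds past the minute.
start = datetime(2018, 1, 22, 8, 49)
events = [
    {"_time": start + timedelta(minutes=i, seconds=5), "source": src}
    for i in range(3)
    for src in ("sample1.csv", "sample2.csv")
]

binned = [{**ev, "_time": bin_1m(ev["_time"])} for ev in events]
result = dedup(binned, ["_time", "source"])
print(len(result))  # 6: two sources for each of the three minutes
```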

Why does this happen?

Also, when I narrow the time range to just the one-minute window where the result went wrong, both results are displayed.

Is this a known issue?

I would greatly appreciate it if anyone knows about this.

Additional info

Splunk version: 6.2.7
Number of events: 5762

[Capture 1: screenshot not included]

[Capture 2: screenshot not included]

1 Solution

HiroshiSatoh
Champion

Is the number of events really different? Do the following two searches return different results?

index=test source="sample1.csv" OR source="sample2.csv" | stats count by source

index=test source="sample1.csv" OR source="sample2.csv" | bin span=1m _time
| dedup _time,source | stats count by source
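A rough Python model of this comparison, assuming stats count by source is a plain per-source count and dedup keeps the first event per (_time, source) key. The data is hypothetical: if no two events share both a minute and a source, dedup should remove nothing, so the two per-source counts should match. A mismatch in Splunk would therefore mean either real duplicates exist or something else is dropping events.

```python
from collections import Counter
from datetime import datetime, timedelta


def count_by_source(events):
    """Rough stand-in for `| stats count by source`."""
    return Counter(ev["source"] for ev in events)


# Hypothetical data: 5 minutes, one event per source per minute, already binned.
start = datetime(2018, 1, 22, 8, 49)
events = [
    {"_time": start + timedelta(minutes=i), "source": src}
    for i in range(5)
    for src in ("sample1.csv", "sample2.csv")
]

# Model of `dedup _time,source`: keep the first event per (_time, source) pair.
seen, deduped = set(), []
for ev in events:
    key = (ev["_time"], ev["source"])
    if key not in seen:
        seen.add(key)
        deduped.append(ev)

before = count_by_source(events)
after = count_by_source(deduped)
print(before == after)  # True: with no real duplicates, dedup removes nothing
```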

yutaka1005
Builder

Thank you for your answer.

Yes, definitely.

The two searches return different numbers of results.

Please look at Capture 1.


HiroshiSatoh
Champion

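That means there must be data that is duplicated on _time,source, or data that becomes duplicated after bin. Since the counts differ by as many as 5 events, I don't think this is a bug in bin.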

index=test source="sample1.csv" | bin span=1m _time | timechart count span=1m

With this search, don't you see some minutes with 2 or more events, or some with 0?
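The check suggested here can be sketched in Python: count events per one-minute bucket over a range and list every bucket whose count is not exactly 1, including empty buckets (timechart shows those as 0). The function name per_minute_anomalies and the sample timestamps are hypothetical.

```python
from collections import Counter
from datetime import datetime, timedelta


def per_minute_anomalies(timestamps, start, end):
    """Return {minute: count} for each 1-minute bucket in [start, end] whose count is not 1.

    Empty buckets are reported with count 0, similar to reading timechart output.
    """
    counts = Counter(ts.replace(second=0, microsecond=0) for ts in timestamps)
    anomalies = {}
    minute = start
    while minute <= end:
        n = counts.get(minute, 0)
        if n != 1:
            anomalies[minute] = n
        minute += timedelta(minutes=1)
    return anomalies


start = datetime(2018, 1, 22, 8, 49)
# Hypothetical data: the 08:51 bucket has two events, the 08:52 bucket has none.
ts = [
    start,
    start + timedelta(minutes=1),
    start + timedelta(minutes=2, seconds=10),
    start + timedelta(minutes=2, seconds=40),
    start + timedelta(minutes=4),
]
print(per_minute_anomalies(ts, start, start + timedelta(minutes=4)))  # flags 08:51 (2) and 08:52 (0)
```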


yutaka1005
Builder

I ran the following searches:
index=test source="sample1.csv" | bin span=1m _time | timechart count span=1m
index=test source="sample2.csv" | bin span=1m _time | timechart count span=1m

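However, for both sample1.csv and sample2.csv, every count in the range 2018-01-22 08:49:00 to 2018-01-24 08:49:00 is 1, and each source has 2881 events. So there cannot be any data duplicated on _time and source, or duplicated by bin.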
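These numbers are consistent with each other. Assuming one event per minute with both endpoints included, a two-day range gives 2 × 24 × 60 + 1 = 2881 events per source, and twice that is the 5762 events listed under Additional info. A quick Python check:

```python
from datetime import datetime

start = datetime(2018, 1, 22, 8, 49)
end = datetime(2018, 1, 24, 8, 49)

# One event per minute, counting both the first and the last minute.
per_source = int((end - start).total_seconds() // 60) + 1
print(per_source)      # 2881
print(2 * per_source)  # 5762, the total for the two sources
```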


HiroshiSatoh
Champion

It looks like a bug in dedup. Adding a sort seems to fix it.

 index=test source="sample1.csv" OR source="sample2.csv" | bin span=1m _time 
 | dedup _time,source sortby -_time
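A Python sketch of why adding the sort should not change the count in principle: when each (_time, source) key has one event, the set of keys that survives dedup does not depend on ordering. It assumes sortby -_time simply orders the deduplicated results newest first. It does not reproduce the Splunk 6.2.7 behavior, and the events are hypothetical.

```python
from datetime import datetime, timedelta


def dedup(events, keys):
    """Keep only the first event seen for each combination of the key fields."""
    seen, kept = set(), []
    for ev in events:
        key = tuple(ev[f] for f in keys)
        if key not in seen:
            seen.add(key)
            kept.append(ev)
    return kept


def key_set(events):
    """The set of (_time, source) pairs present in a result list."""
    return {(ev["_time"], ev["source"]) for ev in events}


# Hypothetical binned data: 4 minutes x 2 sources.
start = datetime(2018, 1, 22, 8, 49)
events = [
    {"_time": start + timedelta(minutes=i), "source": src}
    for i in range(4)
    for src in ("sample1.csv", "sample2.csv")
]

plain = dedup(events, ["_time", "source"])
# Assumed model of `sortby -_time`: order the deduplicated results newest first.
sorted_desc = sorted(plain, key=lambda ev: ev["_time"], reverse=True)

print(key_set(plain) == key_set(sorted_desc), len(sorted_desc))  # True 8
```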

FrankVl
Ultra Champion

Are you sure the original timestamps of the two sources fall in the same minute? Can you show a screenshot of those events?
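Why the exact timestamps matter: assuming bin span=1m aligns buckets to whole minutes, two events only one second apart can land in different buckets. A minimal Python illustration with hypothetical timestamps:

```python
from datetime import datetime


def bin_1m(ts):
    """Floor to the minute (assumed bucketing of bin span=1m _time)."""
    return ts.replace(second=0, microsecond=0)


a = datetime(2018, 1, 22, 8, 49, 2)   # hypothetical sample1.csv timestamp
b = datetime(2018, 1, 22, 8, 49, 58)  # hypothetical sample2.csv timestamp
c = datetime(2018, 1, 22, 8, 48, 59)  # one second before the 08:49 bucket starts

print(bin_1m(a) == bin_1m(b))  # True: both fall in the 08:49:00 bucket
print(bin_1m(a) == bin_1m(c))  # False: c falls in the 08:48:00 bucket
```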


yutaka1005
Builder

Thank you for your comment.

Yes, definitely.

There are two events, one from each source, at 1/22/2018 8:49 AM.
But after using "bin" and "dedup", one of them is gone.

Please look at Capture 2.
