Splunk Search

About using the "bin" command with the "dedup" command

yutaka1005
Builder

Events were written to sample1.csv and sample2.csv at the same one-minute intervals.

However, when I ran the following search, the last result (the oldest data) contained only one value. This is strange, because there should be two values (one per source) for each _time.

index=test source="sample1.csv" OR source="sample2.csv" | bin span=1m _time 
| dedup _time,source

*Time range: "All time"
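To make the expected behavior concrete, here is a minimal Python model of what `bin span=1m _time | dedup _time,source` should do: floor each timestamp to its minute, then keep the first event for each distinct (_time, source) pair. This is a sketch under assumptions, not Splunk's implementation; the helper names `bin_minute` and `dedup` and the three-minute sample data are hypothetical.

```python
from datetime import datetime, timedelta


def bin_minute(ts: datetime) -> datetime:
    """Floor a timestamp to the start of its minute (models `bin span=1m _time`)."""
    return ts.replace(second=0, microsecond=0)


def dedup(events, keys):
    """Keep the first event seen for each distinct combination of `keys`."""
    seen = set()
    kept = []
    for ev in events:
        key = tuple(ev[k] for k in keys)
        if key not in seen:
            seen.add(key)
            kept.append(ev)
    return kept


# Hypothetical sample: one event per minute per source, at different seconds.
start = datetime(2018, 1, 22, 8, 49)
events = []
for i in range(3):
    minute = start + timedelta(minutes=i)
    events.append({"_time": minute + timedelta(seconds=5), "source": "sample1.csv"})
    events.append({"_time": minute + timedelta(seconds=40), "source": "sample2.csv"})

# Search results normally come back newest first; mimic that order before binning.
events.sort(key=lambda ev: ev["_time"], reverse=True)
binned = [{**ev, "_time": bin_minute(ev["_time"])} for ev in events]
result = dedup(binned, ["_time", "source"])

# Group the surviving rows by minute: each minute should keep both sources.
by_minute = {}
for ev in result:
    by_minute.setdefault(ev["_time"], set()).add(ev["source"])
for minute in sorted(by_minute):
    print(minute, sorted(by_minute[minute]))
```

Under this model the oldest minute keeps two rows just like every other minute, which is why a single row there looks wrong.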

Why does this happen?

Also, when I narrow the time range to just the one-minute window where the result goes wrong, both results are displayed.

Is this a known issue?

I would greatly appreciate it if anyone knows anything about this.

Additional info

Splunk version: 6.2.7
Number of events: 5762

Capture 1 (screenshot)

Capture 2 (screenshot)


HiroshiSatoh
Champion

Is the number of events really different? Do the following two searches return different counts?

index=test source="sample1.csv" OR source="sample2.csv" | stats count by source

index=test source="sample1.csv" OR source="sample2.csv" | bin span=1m _time
| dedup _time,source | stats count by source
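The reasoning behind comparing these two searches can be modeled in a few lines of Python: `dedup _time,source` should only drop an event when another event from the same source falls in the same one-minute bucket, so the per-source drop should equal the number of such collisions. The event list and helper names below are hypothetical; this is a sketch of the intended semantics, not Splunk's implementation.

```python
from collections import Counter
from datetime import datetime, timedelta


def bin_minute(ts):
    """Floor a timestamp to the start of its minute (models `bin span=1m _time`)."""
    return ts.replace(second=0, microsecond=0)


def count_by_source(events):
    """Models `stats count by source`."""
    return Counter(ev["source"] for ev in events)


def dedup_time_source(events):
    """Models `bin span=1m _time | dedup _time,source`: first event per pair wins."""
    seen, kept = set(), []
    for ev in events:
        key = (bin_minute(ev["_time"]), ev["source"])
        if key not in seen:
            seen.add(key)
            kept.append(ev)
    return kept


start = datetime(2018, 1, 22, 8, 49)
# Hypothetical data: one event per minute per source, plus one extra
# sample1.csv event that lands in an already-occupied minute.
events = [
    {"_time": start + timedelta(minutes=i, seconds=s), "source": src}
    for i in range(4)
    for src, s in (("sample1.csv", 5), ("sample2.csv", 40))
]
events.append({"_time": start + timedelta(minutes=2, seconds=55), "source": "sample1.csv"})

before = count_by_source(events)
after = count_by_source(dedup_time_source(events))

# Collisions: for each (minute, source) bucket, every event beyond the first.
buckets = Counter((bin_minute(ev["_time"]), ev["source"]) for ev in events)
collisions = Counter()
for (_, src), n in buckets.items():
    collisions[src] += n - 1

for src in sorted(before):
    print(src, before[src], after[src], collisions[src])
```

If the real counts differ while no bucket holds more than one event per source, the difference cannot be explained by the data alone.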


yutaka1005
Builder

Thank you for your answer.

Absolutely yes.

The result counts of those two searches are different.

Please look at Capture 1.


HiroshiSatoh
Champion

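That would mean there is data duplicated on _time,source, or data that becomes duplicated after bin. Since the counts differ by as many as five events, I don't think this is a bug in bin. Try this search:

index=test source="sample1.csv" | bin span=1m _time | timechart count span=1m

Aren't there minutes with two or more events, or with zero events?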
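The check suggested above (looking for minutes with two or more events, or with none) can be sketched in Python for a single source. The function name `irregular_minutes` and the sample timestamps are hypothetical; the code simply mimics `timechart count span=1m` by counting events in every one-minute bucket between the first and last event, empty buckets included.

```python
from collections import Counter
from datetime import datetime, timedelta


def irregular_minutes(timestamps):
    """Return {minute: count} for every one-minute bucket whose count is not exactly 1.

    Like `timechart count span=1m`, empty buckets between the first and
    last event are included with a count of 0.
    """
    counts = Counter(ts.replace(second=0, microsecond=0) for ts in timestamps)
    first, last = min(counts), max(counts)
    irregular = {}
    minute = first
    while minute <= last:
        if counts[minute] != 1:
            irregular[minute] = counts[minute]
        minute += timedelta(minutes=1)
    return irregular


start = datetime(2018, 1, 22, 8, 49)
# Hypothetical data: 08:51 has no event, 08:52 has two.
stamps = [start + timedelta(minutes=m, seconds=10) for m in (0, 1, 3, 4)]
stamps.append(start + timedelta(minutes=3, seconds=50))
for minute, n in sorted(irregular_minutes(stamps).items()):
    print(minute, n)
```

An empty result for a source means it has exactly one event in every minute of its range.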


yutaka1005
Builder

I ran the following searches:

index=test source="sample1.csv" | bin span=1m _time | timechart count span=1m
index=test source="sample2.csv" | bin span=1m _time | timechart count span=1m

However, for both sample1.csv and sample2.csv, every count in the range 2018-01-22 08:49:00 to 2018-01-24 08:49:00 is 1, and each source has 2881 events, so there cannot be any data duplicated on _time and source, or duplicated by bin.


HiroshiSatoh
Champion

This looks like a dedup bug. Adding a sort seems to make it work:

index=test source="sample1.csv" OR source="sample2.csv" | bin span=1m _time
| dedup _time,source sortby -_time
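One way to see why a changing count points at dedup itself rather than at the data: in an idealized first-wins model, the order of the input changes only which event is kept per key, never how many distinct (_time, source) pairs survive. The Python sketch below (hypothetical helpers `dedup` and `key_set`, illustrative data) checks that the surviving count and key set are identical for the input as-is, sorted by descending _time, and shuffled. It makes no claim about how Splunk applies `sortby` internally.

```python
import random
from datetime import datetime, timedelta


def dedup(events, keys):
    """First-wins dedup: keep the first event for each distinct combination of `keys`."""
    seen, kept = set(), []
    for ev in events:
        key = tuple(ev[k] for k in keys)
        if key not in seen:
            seen.add(key)
            kept.append(ev)
    return kept


def key_set(rows, keys):
    """The distinct key tuples present in `rows`."""
    return {tuple(r[k] for k in keys) for r in rows}


start = datetime(2018, 1, 22, 8, 49)
# Hypothetical already-binned events: 10 minutes x 2 sources.
events = [
    {"_time": start + timedelta(minutes=i), "source": src}
    for i in range(10)
    for src in ("sample1.csv", "sample2.csv")
]
keys = ["_time", "source"]

orderings = {
    "as-is": list(events),
    "sorted -_time": sorted(events, key=lambda ev: ev["_time"], reverse=True),
    "shuffled": random.Random(0).sample(events, len(events)),
}
results = {name: dedup(evs, keys) for name, evs in orderings.items()}

for name, rows in results.items():
    print(name, len(rows))
```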

FrankVl
Ultra Champion

Are you sure the original timestamps of the two sources fall in the same minute? Can you show a screenshot of those events?


yutaka1005
Builder

Thank you for your comment.

Absolutely yes.

There are two events, one from each source, at 1/22/2018 8:49 AM.
But after using "bin" and "dedup", one of them is gone.

Please look at Capture 2.
