Splunk Search

How to dedup in lookup specifying the "_time" field?

denissotoacc
Path Finder

Well, my question is not that intuitive, but I will deep dive here:

Let's suppose I have this lookup:

NameProductSell_Date
DenisBread2022-02-21
MariaBeer2022-02-23
DenisWater2022-01-27
DenisCheese2022-03-05
MariaBeer2021-12-12


I need to get the last "Sell_Date" grouping by "Name". In this case:

NameProductSell_Date
DenisCheese2022-03-05
MariaBeer2022-02-23


I know there is "dedup" command, but it's not working because "Sell_Date" is not being considered as "_time" field because this is a lookup and not an Index. I'm getting the wrong row as dedup result.

How can I get a custom dedup, specifying the field that should work as "_time"? 

Labels (1)
0 Karma
1 Solution

denissotoacc
Path Finder

It was easier than I thought. The dedup would get the last event. As the source is a lookup and the field is not the default "_time", I just needed to sort by Sell_Date before. In my case, the sort of this field works properly (it was detected as a proper datetime field) so I don't need to strptime before sorting.

| inputlookup mylookup
| sort -Sell_Date
| dedup Name

 

View solution in original post

denissotoacc
Path Finder

It was easier than I thought. The dedup would get the last event. As the source is a lookup and the field is not the default "_time", I just needed to sort by Sell_Date before. In my case, the sort of this field works properly (it was detected as a proper datetime field) so I don't need to strptime before sorting.

| inputlookup mylookup
| sort -Sell_Date
| dedup Name

 

yuanliu
SplunkTrust
SplunkTrust

If you take the thought away from dedup, you will see the solution clearer. (Dedup is so often used to seek out the latest events, I can see the trap.)  The requirement itself is really about sorting and retrieving.

| inputlookup sell_date
| stats values(eval(Sell_Date . "|" . Product)) as date_product by Name ``` values() sorts by default ```
| eval date_product = split(mvindex(date_product, -1), "|") ``` acccess the last joined record, then split back ```
| eval Sell_Date = mvindex(date_product, 0), Product = mvindex(date_product, 1)
| fields - date_product

 

Tags (3)

PickleRick
SplunkTrust
SplunkTrust

I don't quite follow what you need. The stats summarizing commands work pretty much the same regardless of whether the data comes from the search across indexes or from the inputlookup.

If your concern is that you can't use latest(whatever) due to lack of _time field, just make your own _time field. Since your Sell_Date is probably a string, you can't simply rename it to _time (well, you can but it won't work properly). You'll have to use eval with strptime to parse the timestamp from the string.

0 Karma
Get Updates on the Splunk Community!

Extending Observability Content to Splunk Cloud

Watch Now!   In this Extending Observability Content to Splunk Cloud Tech Talk, you'll see how to leverage ...

More Control Over Your Monitoring Costs with Archived Metrics!

What if there was a way you could keep all the metrics data you need while saving on storage costs?This is now ...

New in Observability Cloud - Explicit Bucket Histograms

Splunk introduces native support for histograms as a metric data type within Observability Cloud with Explicit ...