Splunk Search

How to collect and dedup the newest to a new Index

sai33
Explorer

Hello Splunkers,

I've got an existing index that I would like to process and collect into a new index. My rough idea is as follows:

  • Use sort to get the latest (newest) event in the existing index, grouped by ID.
  • Collect (copy) only that first (newest) event per ID from the existing index into a new index.

My sample data in the existing Index looks like below:

ID, Action, DateTime
1, Purchase, 11.08.2019-16:00
1, Purchase, 11.08.2019-15:30
2, Purchase, 11.08.2019-13:00
3, Purchase, 11.08.2019-16:00

The new index should contain the following, collected from the index above:

ID, Action, DateTime
1, Purchase, 11.08.2019-16:00
2, Purchase, 11.08.2019-13:00
3, Purchase, 11.08.2019-16:00

Note that the second event for ID 1 (the 15:30 one) is not present in the new index.

I believe this should be possible using sort, dedup and collect. Please suggest the best possible method. I need to move an index of around 5 GB.
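For reference, the sort/dedup/collect idea above can be sketched roughly as follows. This is only a minimal sketch under assumptions: `existing_index` and `new_index` are placeholder names, the target index is assumed to exist already, and `_time` is assumed to hold the same timestamp as the DateTime field.

```spl
index=existing_index
| sort 0 - _time
| dedup ID
| collect index=new_index
```

`sort 0` lifts the default limit on the number of results that sort returns, and `dedup ID` keeps the first event it sees for each ID, which after the descending sort is the newest one. If `_time` does not match DateTime, the sort would need to run on a value parsed from DateTime instead.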

Thanks!!


niketn
Legend

@sai33 does the DateTime field in index1 correspond to the _time field in your data?

____________________________________________
| makeresults | eval message= "Happy Splunking!!!"

sai33
Explorer

I'm not exactly sure which _time you're referring to, but DateTime is the timestamp (date and time) of the event.
I'm a newbie to Splunk, so I'm still relatively new to the technical terms.
Sorry for the trouble!


niketn
Legend

For the community to assist you better, you would need to provide your current SPL (mock or anonymize any sensitive information before posting it).

Can you print the following table and check whether _time has the same value as DateTime?

<yourIndex1Query>
| table _time ID Action DateTime

_time is the timestamp of the event, which Splunk assigns when it indexes the data. It is one of the most crucial pieces of information Splunk needs at index time: if the timestamp on an indexed event is wrong, your time-based searches and correlations will not work as expected.
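As a rough way to make that comparison explicit, the DateTime string can be parsed into epoch time and checked against _time. This is only a sketch under assumptions: `existing_index` is a placeholder name, the format string assumes the sample values are day.month.year-hour:minute, and `parsed_time` and `time_matches` are illustrative field names.

```spl
index=existing_index
| eval parsed_time=strptime(DateTime, "%d.%m.%Y-%H:%M")
| eval time_matches=if(parsed_time == _time, "yes", "no")
| table _time ID Action DateTime time_matches
```

If `time_matches` comes back as "no", or `parsed_time` is empty because the format string does not fit the data, that suggests the timestamp was not extracted from DateTime at index time.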

While this is not directly related to the answer to your question, I would recommend understanding timestamp extraction as the first step towards indexing data correctly. Refer to the documentation: https://docs.splunk.com/Documentation/Splunk/latest/Data/HowSplunkextractstimestamps
The second most crucial step is event breaking, which tells Splunk where each event begins and ends as it processes streaming data input. Incorrect event breaks can cause events to be merged together or split apart. So read the following documentation as well: https://docs.splunk.com/Documentation/Splunk/latest/Data/Configureeventlinebreaking
