
How to dedup to the newest event per ID and collect it into a new index

sai33
Explorer

Hello Splunkers,

I have an existing index that I would like to process and collect into a new index. My rough idea is as follows:

  • Use sort to get the latest (newest) event in the existing index, grouped by ID.
  • Collect (copy) only the first (newest) event per ID from the above index into a new index.
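The two steps above could be sketched in SPL roughly as follows. This is a minimal sketch, assuming the indexed event time (_time) matches the DateTime field; existing_index and new_index are placeholder names, and the target index has to exist before collect can write to it. `sort 0` lifts the default 10,000-result limit of sort, so the newest event for each ID comes first, and `dedup ID` then keeps only that first event per ID.

```
index=existing_index
| sort 0 -_time
| dedup ID
| collect index=new_index
```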

My sample data in the existing index looks like this:

ID, Action, DateTime
1, Purchase, 11.08.2019-16:00
1, Purchase, 11.08.2019-15:30
2, Purchase, 11.08.2019-13:00
3, Purchase, 11.08.2019-16:00

The data in my new index should be collected from the above index:

ID, Action, DateTime
1, Purchase, 11.08.2019-16:00
2, Purchase, 11.08.2019-13:00
3, Purchase, 11.08.2019-16:00

Note that the second (older) event for ID 1 is not present in the new index.
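The expected result can be reproduced with mock data, without touching any index, by generating the four sample events with makeresults and ordering on the parsed DateTime instead of _time. This is a sketch under two assumptions: DateTime is day.month.year (11.08.2019 could also be read as month.day), and event_ts is a hypothetical helper field. It should return the three rows shown in the expected output above.

```
| makeresults
| eval raw="1,Purchase,11.08.2019-16:00;1,Purchase,11.08.2019-15:30;2,Purchase,11.08.2019-13:00;3,Purchase,11.08.2019-16:00"
| makemv delim=";" raw
| mvexpand raw
| rex field=raw "^(?<ID>[^,]+),(?<Action>[^,]+),(?<DateTime>.+)$"
| eval event_ts=strptime(DateTime, "%d.%m.%Y-%H:%M")
| sort 0 ID -event_ts
| dedup ID
| table ID Action DateTime
```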

I believe this should be possible using sort, dedup, and collect. Please suggest the best possible method. I have to move an index of around 5 GB.
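For a one-off move of this size, one option is to dry-run first: collect's testmode=true shows what would be written without writing anything. The sketch below swaps dedup for a stats-based variant using latest(), which also picks the newest value per ID but relies on _time being correct. existing_index and new_index are again placeholders, and this is one possible approach rather than the only one.

```
index=existing_index
| stats latest(_time) AS _time latest(Action) AS Action latest(DateTime) AS DateTime BY ID
| collect index=new_index testmode=true
```

Once the preview looks right, rerunning with testmode=false (the default) writes the events to new_index.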

Thanks!!


niketn
Legend

@sai33 does the DateTime field in index1 correspond to the _time field in your data?

____________________________________________
| makeresults | eval message= "Happy Splunking!!!"

sai33
Explorer

I'm not exactly sure which _time you're referring to, but this is the timestamp (the date and time of the event).
Since I'm a newbie to Splunk, I'm still getting used to the technical terms.
Sorry for the trouble!


niketn
Legend

For the community to assist you better, please provide your current SPL (mock/anonymize any sensitive information before posting it).

Can you run the following search and check whether _time has the same value as DateTime?

<yourIndex1Query>
| table _time ID Action DateTime
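To make the comparison explicit, _time can be rendered in the same text format as DateTime and compared row by row. This is a sketch assuming DateTime uses day.month.year-hour:minute; time_as_text and same_as_DateTime are hypothetical field names, and <yourIndex1Query> is still your own base search.

```
<yourIndex1Query>
| eval time_as_text=strftime(_time, "%d.%m.%Y-%H:%M")
| eval same_as_DateTime=if(time_as_text==DateTime, "yes", "no")
| table _time time_as_text DateTime same_as_DateTime ID Action
```

If same_as_DateTime is "no" for many rows, ordering by _time would not reflect DateTime, and parsing DateTime with strptime for the sort would be the safer choice.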

_time is the time of the event, which is defined while the data is indexed in Splunk. It is one of the most crucial pieces of information Splunk needs at index time, because an incorrect timestamp on an indexed event means none of your correlations/queries would work as expected.

While this is not directly related to your question here, I would recommend understanding this as the first step toward indexing data correctly. Refer to the documentation: https://docs.splunk.com/Documentation/Splunk/latest/Data/HowSplunkextractstimestamps
The second most crucial step is event breaking, which tells Splunk the boundary of each event as it processes a streaming data input. Incorrect event breaks can cause events to overlap or be dropped. So read the following documentation as well: https://docs.splunk.com/Documentation/Splunk/latest/Data/Configureeventlinebreaking
