Splunk Search

How to dedup and collect the newest events into a new index

sai33
Explorer

Hello Splunkers,

I've got an existing index which I would like to process and collect into a new index. My rough idea is as follows:

  • Use sort to get the latest (newest) event in the existing index, grouped by ID
  • Collect (copy) only that first (newest) event per ID from the existing index into a new index

My sample data in the existing index looks like this:

ID, Action, DateTime
1, Purchase, 11.08.2019-16:00
1, Purchase, 11.08.2019-15:30
2, Purchase, 11.08.2019-13:00
3, Purchase, 11.08.2019-16:00

The data in my new index should be collected from the index above:

ID, Action, DateTime
1, Purchase, 11.08.2019-16:00
2, Purchase, 11.08.2019-13:00
3, Purchase, 11.08.2019-16:00

Note that the second event for ID 1 (11.08.2019-15:30) is not present in the new index.

I believe this should be possible using sort, dedup and collect. Please suggest the best possible method. I have to move an index of around 5 GB.

Thanks!!

niketn
Legend

@sai33 does the DateTime field in index1 correspond to the _time field in your data?

____________________________________________
| makeresults | eval message= "Happy Splunking!!!"

sai33
Explorer

I'm not exactly sure which _time you're referring to, but this is the timestamp (date and time of the event).
As a newbie to Splunk, I'm still getting used to the technical terms.
Sorry for the trouble!

niketn
Legend

For the community to assist you better, you would need to provide your current SPL (mock/anonymize any sensitive information before posting it).

Can you print the following table and see whether _time has the same value as DateTime?

<yourIndex1Query>
| table _time ID Action DateTime

_time is the time of the event, which you define while indexing the data in Splunk. It is one of the most crucial pieces of information Splunk needs while indexing, because an incorrect timestamp in an indexed event means that your correlations and queries would not work as expected.
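
As a rough sketch of that comparison, the DateTime string could be parsed into epoch time and checked against _time directly. This assumes DateTime is already an extracted field and that its format is day.month.year-hour:minute (the sample rows do not say whether 11.08 means 11 August or 8 November, so swap %d and %m if needed). The field names parsed_time and matches_time are made up for illustration.

```spl
<yourIndex1Query>
| eval parsed_time=strptime(DateTime, "%d.%m.%Y-%H:%M")
| eval matches_time=if(parsed_time==_time, "yes", "no")
| table _time ID Action DateTime parsed_time matches_time
```

If matches_time shows "no" for most events, the timestamp was probably not extracted from DateTime at index time. An empty parsed_time suggests that the format string does not match the data.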

While this is not directly related to the answer to your question here, I would recommend understanding it as the first step toward indexing data correctly. So, refer to the documentation: https://docs.splunk.com/Documentation/Splunk/latest/Data/HowSplunkextractstimestamps
The second most crucial step is event breaking, which tells Splunk the boundary of each event as it processes streaming data input. Incorrect event breaks can cause events to overlap or be dropped. So read the following documentation as well: https://docs.splunk.com/Documentation/Splunk/latest/Data/Configureeventlinebreaking
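
Once the timestamps are confirmed, the sort/dedup/collect idea from the original question could be sketched roughly as follows. This is only a sketch, under the assumption that _time matches DateTime and that ID is an extracted field. The names <yourExistingIndex> and <yourNewIndex> are placeholders, and the target index has to exist before collect can write to it.

```spl
index=<yourExistingIndex>
| sort 0 - _time
| dedup ID
| collect index=<yourNewIndex>
```

Here sort 0 lifts the default 10,000-result limit of sort, and dedup ID keeps only the first event it sees for each ID, which is the newest one after the descending sort. By default dedup also drops events that have no ID field. If _time does not match DateTime, the sort key could instead be a field built with eval and strptime(DateTime, ...). For an index of around 5 GB, it may be safer to try the search over a short time range first and check the results before adding collect.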

____________________________________________
| makeresults | eval message= "Happy Splunking!!!"