No. If you upload a file via the "Add Data" screen, the events get indexed and are immutable. There is no such thing as "updating" indexed events.
Also, why would you upload the same CSV multiple times? Why would you upload a CSV at all? In a normal production environment you typically monitor log files or have events ingested in some other continuous way. Sometimes you upload sample logs into a dev/testing environment, but that's a different case; there you usually don't mind the duplicates, and/or you'd simply delete and recreate the index if duplication were an issue for you.
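If a test index does get polluted, resetting it is quick. A minimal CLI sketch, assuming a throwaway index named my_test_index (Splunk has to be stopped for the clean command to run):

```
# wipe all events from the test index, then bring Splunk back up
./splunk stop
./splunk clean eventdata -index my_test_index -f
./splunk start
```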
Not completely impossible. But before discussing workarounds, I have the same question as @PickleRick does: Why? Are they the same events (with the same timestamp, etc.)? Does the CSV even represent time series events? If they are the same events but with updates, why not delete previously loaded events before upload? I use CSV upload regularly. Each contains different events. Even so, I name files differently in part for peace of mind.
Below is my CSV
When we first identify a flow in our app, we update the CSV file with _key, App_name, Date_find, Risk, and Status. Whenever an update happens, I upload or ingest the CSV file into Splunk, in near real time. We keep this CSV as a lookup outside Splunk, so nothing ever gets deleted. Each time I ingest or upload, all the previous entries get ingested into Splunk again; the only difference is the timestamp at ingestion, so all entries in one upload (for example _key 1 and 2) get the same timestamp. I want to know if it is possible to return only the latest result, so that I have all the data without any duplicates. Otherwise I need to find a different solution.
The same thing happens when a flow gets fixed: Remediate_date, Risk_Afterremediate, and Status get updated, and the file gets ingested into Splunk again.
Thank you in advance.
| _key | App_name | Date_find | Status | Risk | Remediate_date | Risk_Afterremediate | Status |
|------|----------|-----------|--------|------|----------------|---------------------|--------|
| 1 | App1 | 12/04/2022 | Open | Critical | 12/10/2022 | Sustainable | Closed |
| 2 | App2 | 01/26/2023 | Open | Moderate | 02/12/2023 | Sustainable | Close |
You still need to explain your use case in Splunk. As I said, I use CSV upload regularly; in fact, my CSV files have a similar structure. In my case, I have two timestamps of particular interest, "First Detected" and "Last Detected", both of them similar to "Date_find" in your example. But "Last Detected" changes in every scan, so I use that field as _time when I ingest.
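One way to make a CSV column drive _time at ingest is a props.conf stanza for the upload's sourcetype. A minimal sketch, assuming a placeholder sourcetype name of flow_csv and your Date_find column as the event time (TIME_FORMAT has to match the actual date format in the file):

```
# props.conf (sketch) for a structured CSV upload
[flow_csv]
INDEXED_EXTRACTIONS = csv
TIMESTAMP_FIELDS = Date_find
TIME_FORMAT = %m/%d/%Y
```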
Sure. That's what stats first/last/earliest/latest/index_earliest/index_latest are for.
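For example, something like this (a sketch; the index and sourcetype names are placeholders, field names are taken from the table above) keeps only the values from the most recently ingested copy of each _key:

```
index=your_index sourcetype=flow_csv
| stats latest(App_name) AS App_name latest(Date_find) AS Date_find latest(Status) AS Status
        latest(Risk) AS Risk latest(Remediate_date) AS Remediate_date
        latest(Risk_Afterremediate) AS Risk_Afterremediate
        BY _key
```

A simpler variant is `| dedup _key`, which keeps the first event it sees per _key; since results come back in reverse time order, that is the copy from the latest upload.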
But:
1) Aren't you trying to make Splunk do something it's not meant for (like acting as a database table)?
2) Why not use a lookup instead of ingesting events?
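On point 2, the lookup route is basically: replace the lookup file each time your external CSV changes (via the Lookups UI or with | outputlookup), then read it directly whenever you need the current state. A minimal sketch, assuming the file is stored in Splunk as flow_status.csv (a placeholder name):

```
| inputlookup flow_status.csv
| where Risk="Critical"
| table _key App_name Date_find Status Risk Remediate_date Risk_Afterremediate
```

Since inputlookup reads whatever the file currently contains, there are no duplicates to dedup and nothing to delete.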