Dear all,
I am wondering what the best strategy is for indexing data that comes from a relational database and can change over time.
Take the example of a CRM with a table called "CUSTOMER". Each record contains several fields such as "last name", "first name", and "address", and the "address" field may change over time.
If I index that record (and the others from the same table) in order to provide statistics about my customer base using Splunk, what is the best strategy for handling the case where a field changes?
If my indexing strategy is based on a "modifiedDate" field, the record will be indexed as many times as it is modified. As a result, the same customer will appear several times in my index and the statistics will be skewed.
What would you advise?
Splunk does not allow indexed data to be updated, so you are correct that each change will produce a new event in Splunk. However, you can still analyze the data correctly by looking only at the latest event per customer. Here is an example:
Sample Data:
custname,mydata,moddate
John Smith,123,07/24/18 09:40:00
Bob Jones,987,07/22/18 12:00:00
Sue Simmons,576,04/12/18 08:00:00
John Smith,321,03/20/17 15:00:00
Bob Jones,142,05/11/16 08:00:00
Search:
index="scratchpad" sourcetype="updateddata"
| stats latest(mydata) by custname
Search Results (assuming the event timestamp, _time, is parsed from the moddate field):
custname,latest(mydata)
Bob Jones,987
John Smith,123
Sue Simmons,576
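If you want to see the entire latest record per customer rather than individual aggregated fields, a dedup on the customer name is an alternative. This is a minimal sketch reusing the same index and sourcetype names as above; it also relies on _time being parsed from moddate, because dedup keeps the first event it encounters per custname and Splunk returns events in reverse chronological order:

Alternative Search:
index="scratchpad" sourcetype="updateddata"
| dedup custname
| table custname mydata moddate

You can also extend the original stats search to carry additional fields alongside the value, for example | stats latest(mydata) latest(moddate) by custname.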