Splunk Search

Indexing updatable data from a relational database

ecoquelin
Explorer

Dear all,

I am wondering what is the best strategy regarding indexing data coming from a relational database which could change over the time.

Let's take the example of a CRM where you could find a table called "CUSTOMER". That record contains several infos like "last name", "first name" or "address". "Address" field may change over time.

If I index that record (and the others from the same table) because I would like to provide some statistics regarding my customer base using splunk, what would be the best strategy to handle the case where one field may change ?

If my indexation strategy is based on a "modifiedDate" field, the record will get indexed as many times as there are modifications in my table. But as a result, I will get the same customer several times in my index and the statistical results will become wrong.

What would you advise ?

Tags (1)
0 Karma

kmorris_splunk
Splunk Employee
Splunk Employee

Splunk does not allow for updating of indexed data. So you are correct that each time there is a change, you will have a new event in Splunk. However, you can still analyze the data correctly by only looking at the latest data by customer. Here is an example:

Sample Data:

custname,mydata,moddate
John Smith,123,07/24/18 09:40:00
Bob Jones,987,07/22/18 12:00:00
Sue Simmons,576,04/12/18 08:00:00
John Smith,321,03/20/17 15:00:00
Bob Jones,142,05/11/16 08:00:00

Search:

index="scratchpad" sourcetype="updateddata" 
| stats latest(mydata) by custname

Search Results:
alt text

0 Karma
Get Updates on the Splunk Community!

Developer Spotlight with Paul Stout

Welcome to our very first developer spotlight release series where we'll feature some awesome Splunk ...

State of Splunk Careers 2024: Maximizing Career Outcomes and the Continued Value of ...

For the past four years, Splunk has partnered with Enterprise Strategy Group to conduct a survey that gauges ...

Data-Driven Success: Splunk & Financial Services

Splunk streamlines the process of extracting insights from large volumes of data. In this fast-paced world, ...