How to Re-index from CSV sourcetype

boromir
Path Finder

Hi all, 

I am seeing some strange behavior that I can't find described anywhere in the docs.

I have a source that generates comma-separated CSV files. They are indexed into a dedicated index with a dedicated sourcetype.

Here is a representative example:

id,service_id,product_id,shop_id,user_id,blah_blah,whatever,name,date,client_id
1,34456789,12234,23,4,f,45678,ivan,2022-01-13 07:04:49,1
2,34452789,12134,25,4,f,45678,ivan,2022-01-13 07:14:49,1
3,34451789,12134,27,4,f,45678,ivan,2022-01-13 07:14:49,1
4,34451789,12134,27,4,f,45678,ivan,2022-01-13 07:15:49,1
5,34451789,12133,23,4,f,45678,ivan,2022-01-13 07:15:49,1
6,34456789,12234,23,4,f,45678,ivan,2022-01-13 07:04:49,1
7,34452789,12134,25,4,f,45678,ivan,2022-01-13 07:14:49,1
8,34451789,12134,27,4,f,45678,ivan,2022-01-13 07:14:49,1
9,34451789,12134,27,4,f,45678,ivan,2022-01-13 07:15:49,1

Challenge no. 1: the script that generates the CSV can edit lines that already exist in the file.

Challenge no. 2: these edits do not always lead to the same behavior.

If a value is changed without changing its length (from 1 to 2, or from ivan to ivag; what matters is that the number of characters stays the same), the change never shows up in the indexed data. However, if the change alters the number of characters (say, ivan becomes johnathan), the whole file is re-indexed with the new value, which causes lots of duplicates.

I'm sure this must be documented somewhere, but I can't find it, so I don't really understand what is happening.

Does anyone know what is going on? I did find a community post saying that Splunk checks the first 256 characters of a file to decide whether it has already seen it, but I have tested changes both before and after that 256-character threshold.
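
To make the 256-character point concrete, here is a small Python sketch (Python is an assumption; the thread names no language). It models the head check only as a CRC32 over the first 256 bytes of the sample file above and reports, for three edits, whether the file size and that head CRC change. This is a simplified model, not Splunk's implementation; Splunk's real file tracking involves more than this (see the HowSplunkReadsInputFiles wiki page referenced in the reply). The helper names are hypothetical, and the sample is assumed to be plain ASCII with LF line endings.

```python
import zlib

# The sample file from the question, assuming plain ASCII and LF line endings.
ROWS = [
    "id,service_id,product_id,shop_id,user_id,blah_blah,whatever,name,date,client_id",
    "1,34456789,12234,23,4,f,45678,ivan,2022-01-13 07:04:49,1",
    "2,34452789,12134,25,4,f,45678,ivan,2022-01-13 07:14:49,1",
    "3,34451789,12134,27,4,f,45678,ivan,2022-01-13 07:14:49,1",
    "4,34451789,12134,27,4,f,45678,ivan,2022-01-13 07:15:49,1",
    "5,34451789,12133,23,4,f,45678,ivan,2022-01-13 07:15:49,1",
    "6,34456789,12234,23,4,f,45678,ivan,2022-01-13 07:04:49,1",
    "7,34452789,12134,25,4,f,45678,ivan,2022-01-13 07:14:49,1",
    "8,34451789,12134,27,4,f,45678,ivan,2022-01-13 07:14:49,1",
    "9,34451789,12134,27,4,f,45678,ivan,2022-01-13 07:15:49,1",
]
SAMPLE = ("\n".join(ROWS) + "\n").encode("ascii")


def head_crc(data, init_crc_length=256):
    """CRC32 of the first init_crc_length bytes (a simplified model of the head check)."""
    return zlib.crc32(data[:init_crc_length])


def edit_row(data, row_index, old, new):
    """Return a copy of data with old replaced by new in a single line (line 0 is the header)."""
    lines = data.split(b"\n")
    lines[row_index] = lines[row_index].replace(old, new)
    return b"\n".join(lines)


def compare(original, edited, init_crc_length=256):
    """Return (size before, size after, whether the head CRC is unchanged)."""
    same_head = head_crc(original, init_crc_length) == head_crc(edited, init_crc_length)
    return len(original), len(edited), same_head


if __name__ == "__main__":
    print(compare(SAMPLE, edit_row(SAMPLE, 9, b"ivan", b"ivag")))       # (593, 593, True)
    print(compare(SAMPLE, edit_row(SAMPLE, 1, b"ivan", b"ivag")))       # (593, 593, False)
    print(compare(SAMPLE, edit_row(SAMPLE, 9, b"ivan", b"johnathan")))  # (593, 598, True)
```

With this sample the first 256 bytes cover the header, rows 1 to 3 and the start of row 4, so only an edit in that region changes the head CRC, while only the length-changing edit changes the file size. How Splunk actually reacts to these signals is what the references in the reply cover.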

Kind regards!

rd

m_pham
Splunk Employee

I think you have already played around with the "initCrcLength" setting in inputs.conf, which only tells Splunk how far into the file to read when computing the hash it compares. With CSV files, it's going to re-ingest the whole file no matter what you do, and with the way your script works (modifying specific values in an existing CSV file), you end up with duplicated data. I think a better approach is to create a new CSV file whenever there is a change and update your inputs.conf accordingly so those files get ingested.
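
A minimal sketch of the "new file per change" suggestion, in Python (an assumption; the thread does not say what language the generating script uses). Instead of editing rows in place, the script writes each batch of new or changed rows to a fresh CSV file. The function name write_snapshot, the export_<timestamp>_<suffix>.csv naming pattern and the output directory are all hypothetical; match them to whatever your inputs.conf monitor stanza expects.

```python
import csv
import os
import tempfile
import uuid
from datetime import datetime, timezone
from pathlib import Path

# Column names from the sample in the question.
HEADER = ["id", "service_id", "product_id", "shop_id", "user_id",
          "blah_blah", "whatever", "name", "date", "client_id"]


def write_snapshot(rows, out_dir):
    """Write rows to a brand-new CSV file instead of editing an existing one.

    Hypothetical naming: export_<UTC timestamp>_<random suffix>.csv
    """
    out_dir = Path(out_dir)
    out_dir.mkdir(parents=True, exist_ok=True)
    stamp = datetime.now(timezone.utc).strftime("%Y%m%dT%H%M%SZ")
    final_path = out_dir / f"export_{stamp}_{uuid.uuid4().hex[:8]}.csv"
    # Write under a temporary .tmp name first, then rename into place,
    # so the finished .csv file appears in a single step.
    fd, tmp_name = tempfile.mkstemp(dir=str(out_dir), suffix=".tmp")
    try:
        with os.fdopen(fd, "w", newline="", encoding="utf-8") as f:
            writer = csv.writer(f, lineterminator="\n")
            writer.writerow(HEADER)
            writer.writerows(rows)
        os.replace(tmp_name, final_path)
    except BaseException:
        os.unlink(tmp_name)  # don't leave a stray temp file behind
        raise
    return final_path


if __name__ == "__main__":
    # Hypothetical output directory; point your monitor stanza at the real one.
    out = Path(tempfile.gettempdir()) / "csv_exports"
    changed = [[9, 34451789, 12134, 27, 4, "f", 45678, "johnathan", "2022-01-13 07:15:49", 1]]
    print(write_snapshot(changed, out))
```

The rename step only helps if the monitor stanza ignores *.tmp files, which is again an assumption about your setup. Also, if the script writes a full copy of the data every time rather than just the new or changed rows, the unchanged rows would be indexed again from each new file, so passing only new or changed rows keeps duplicates down.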

I might not have the best answer, so wait around for more suggestions.

References:

https://wiki.splunk.com/Community:HowSplunkReadsInputFiles

https://docs.splunk.com/Documentation/Splunk/latest/Admin/inputsconf