I am updating a CSV on disk via the search API using outputlookup. Each time I run my script against the same source CSV, the CSV on disk is updated correctly and shows the correct number of rows.
However, when I try to read the CSV via the UI or the /jobs and /export APIs, it doesn't return the correct number of rows roughly 40% of the time.
If I run my script 10 times, about 6 times the row count is a perfect match and what I expect, and the other 4 times a seemingly random number of rows is returned. Each time, though, the number of rows in the CSV on disk is correct.
From what I can see, I am not hitting quotas or limits and it doesn't seem to be related to system load. There isn't anything obvious to me in any logs either.
My flow is: the script writes the CSV with outputlookup, then I validate the result with a search. The validation search is where the inconsistencies happen:
| inputlookup my_new_file.csv | table <column name, column name....>
My CSV currently has 110 columns and 5625 rows.
When it works, the search returns the correct number of rows every time. When it doesn't, it consistently returns the same wrong number. To clarify: if I get 500 rows instead of the expected 5625, every search returns 500 rows until I re-run my script to update the CSV on disk.
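For reference, my validation outside the UI looks roughly like the sketch below, which runs the same | inputlookup search through the /services/search/jobs/export endpoint and compares the returned row count against the file on disk. The hostname, token, and lookup path are placeholders for my environment, and the row-counting helper is just illustrative:

```python
# Sketch: compare the on-disk lookup row count with what /export returns.
# Hostname, token, and file paths below are placeholders, not real values.
import csv
import io

def count_csv_rows(csv_text: str) -> int:
    """Count data rows in CSV text, excluding the header row."""
    rows = list(csv.reader(io.StringIO(csv_text)))
    return max(len(rows) - 1, 0)

def export_lookup_csv(host: str, token: str, lookup: str = "my_new_file.csv") -> str:
    """Run '| inputlookup <lookup>' via the REST export endpoint, return raw CSV."""
    import requests  # third-party library
    resp = requests.post(
        f"https://{host}:8089/services/search/jobs/export",
        headers={"Authorization": f"Bearer {token}"},
        data={"search": f"| inputlookup {lookup}", "output_mode": "csv"},
        verify=False,  # lab setup only; verify certs in production
    )
    resp.raise_for_status()
    return resp.text

if __name__ == "__main__":
    api_rows = count_csv_rows(export_lookup_csv("splunk-sh1.example.com", "REDACTED"))
    with open("/opt/splunk/etc/apps/search/lookups/my_new_file.csv") as f:
        disk_rows = count_csv_rows(f.read())
    print(f"API rows: {api_rows}, disk rows: {disk_rows}")
```

When the problem occurs, disk_rows stays correct while api_rows comes back short, and it stays at that same short value on every run.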
Thanks
Are you running in a clustered environment? It can take some time for updates to be distributed across the whole cluster.
It is a cluster, but I am searching from the same instance I am writing to (not going through the load balancer).
I have a second script that checks all three hosts in the cluster, and the erroneous search result is consistent across all of them, despite the row count on disk being correct on each.
Replication of the CSV appears to be instant from what I can tell. Regardless of which search head I hit, I get 735 rows returned, even though all three search heads have the correct row count in the CSV on disk.
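The cross-host check works roughly like this sketch: run the same | inputlookup count against each search head directly and compare the results. Hostnames and the token are placeholders, and the assumption that /export with output_mode=json streams one JSON object per line (with the final line holding the stats result) matches what I see, but verify it against your Splunk version:

```python
# Sketch: ask each search head directly for the lookup's row count and
# check whether they agree. Hostnames and token are placeholders.
import json

def counts_agree(counts: dict) -> bool:
    """True when every host returned the same row count (or there are no hosts)."""
    return len(set(counts.values())) <= 1

def fetch_row_count(host: str, token: str, lookup: str = "my_new_file.csv") -> int:
    """Run '| inputlookup <lookup> | stats count' on one host via /export."""
    import requests  # third-party library
    resp = requests.post(
        f"https://{host}:8089/services/search/jobs/export",
        headers={"Authorization": f"Bearer {token}"},
        data={"search": f"| inputlookup {lookup} | stats count",
              "output_mode": "json"},
        verify=False,  # lab setup only
    )
    resp.raise_for_status()
    # Assumption: /export streams one JSON object per line; the last
    # non-empty line carries the final stats result.
    last = [ln for ln in resp.text.splitlines() if ln.strip()][-1]
    return int(json.loads(last)["result"]["count"])

if __name__ == "__main__":
    hosts = ["sh1.example.com", "sh2.example.com", "sh3.example.com"]
    counts = {h: fetch_row_count(h, "REDACTED") for h in hosts}
    print(counts, "consistent" if counts_agree(counts) else "inconsistent")
```

In my case all three hosts agree with each other on the wrong count (735), which is what makes me think the stale data is replicated rather than host-local.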