I have a large CSV file / lookup table which I'm writing to via outputlookup.
It's approaching 1 GB in size and I'm wondering how best to prune it.
Can I delete the original data file (using cron, for instance) and still use the .idx files for lookups?
Or is it best to trim it with a parallel Splunk search that removes old entries? If so, any advice or links on how to do that?
If you are exporting the data as a CSV at a regular interval (like a report), you can use the outputcsv command instead of outputlookup.
Otherwise, if the 1 GB lookup works fine in your searches and the actual problem is bundle replication, you can keep the file out of the replication bundle with a replication blacklist. By default every lookup is part of the bundle; a whitelist is the other way to control which files are included.
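As a rough sketch of the blacklist approach, something like the following could go in distsearch.conf on the search head. The entry name `nobiglookup`, the app name `search`, and the file name `big_lookup.csv` are placeholders I made up for illustration, and the exact pattern syntax may differ by version, so check the distsearch.conf spec for your release before relying on it.

```ini
# $SPLUNK_HOME/etc/system/local/distsearch.conf (search head)
# Assumption: the lookup lives in the "search" app and is named big_lookup.csv.
[replicationBlacklist]
# The key on the left is an arbitrary label; the value is a path pattern
# relative to $SPLUNK_HOME/etc.
nobiglookup = apps/search/lookups/big_lookup.csv
```

Keep in mind that a lookup excluded from the bundle is not available on the indexers, so searches that use it would most likely need to run the lookup on the search head (for example with `lookup local=true`).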
If instead you are using the CSV written by outputlookup to look up values against events and are hitting size limits, try a summary index. It would be easier.
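A minimal sketch of the summary index idea, assuming a scheduled search that runs hourly. The index names `main` and `my_summary`, the sourcetype `my_sourcetype`, and the `host` split are all placeholders, and the summary index has to be created beforehand.

```spl
index=main sourcetype=my_sourcetype earliest=-1h@h latest=@h
| stats count BY host
| collect index=my_summary
```

You would then search `index=my_summary` instead of reading the CSV. Old entries age out according to the retention settings of the summary index, so there should be no separate pruning step to maintain.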
Hope this will help you.