Hi
We are running an outputcsv command at hourly intervals through a Python script. We have not specified the append option in the query. I would like to know the expected behavior of Splunk. Will the csv file be overwritten every hour? Will only the headers be retained? Please clarify.
Thanks in Advance.
The given csv will be overwritten every time outputcsv runs, headers and all. The new headers will simply match the fields in the new results.
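Since you are driving this from a Python script anyway, the behavior is analogous to opening the file in write mode yourself: the file is truncated and a fresh header is written each run. A minimal sketch of that semantics (filenames and field names here are just illustrative, not anything Splunk produces):

```python
import csv
import io

def write_results(f, fieldnames, rows):
    # Mimics outputcsv without append: start from an empty file,
    # write a fresh header row, then the new results.
    writer = csv.DictWriter(f, fieldnames=fieldnames)
    writer.writeheader()
    writer.writerows(rows)

# First run writes user/group columns...
buf = io.StringIO()
write_results(buf, ["user", "group"], [{"user": "alice", "group": "admins"}])

# ...a later run with different fields replaces everything, header included.
buf = io.StringIO()
write_results(buf, ["host", "count"], [{"host": "web01", "count": "7"}])
print(buf.getvalue().splitlines()[0])  # header now matches the new fields
```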
It is the same for the outputlookup command.
Also, in many real-world use cases, using the append flag on the outputcsv command itself can produce a lot of duplicate rows. It can therefore be better to do the appending separately, with a little search language to remove the duplicates as appropriate.
Here is a simple example, where the csv has a primary key called "user" and simply maps each user to a "group" field.
<search terms to get the "new" rows mapping users to groups> | stats last(group) as group by user | append [| inputcsv mycsv] | stats first(group) as group by user | outputcsv mycsv
As you can see, each time the search runs it updates the rows for any users whose group has changed, without duplicating them. Obviously this is a very simple example with only two fields, but with a little more attention to the stats commands you can apply the same technique more broadly. Note that it's better to put the inputcsv command inside the append; if you put the actual search in the append instead, you increase the chance of hitting append's limits on execution time or number of rows.
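The update-then-dedupe logic of that search can be sketched in plain Python for clarity. This is only an illustration of the merge semantics, not anything Splunk runs; the "user" and "group" field names are just the ones from the example above:

```python
def merge_rows(new_rows, existing_rows):
    # New results take precedence: because the fresh rows are scanned first,
    # an existing csv row is kept only if its user was not just updated.
    # This mirrors the stats last(...)/first(...) by user in the search.
    merged = {}
    for row in new_rows + existing_rows:
        merged.setdefault(row["user"], row["group"])
    return [{"user": u, "group": g} for u, g in merged.items()]

existing = [{"user": "alice", "group": "admins"},
            {"user": "bob", "group": "users"}]
new = [{"user": "alice", "group": "power"}]

# alice moves to "power", bob is kept once -- no duplicates
print(merge_rows(new, existing))
```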