Reporting

Running outputcsv without append option

keerthana_k
Communicator

Hi

We are running an outputcsv command at hourly intervals through a Python script. We have not specified the append option in the query. I would like to know the expected behavior of Splunk: will the csv file be overwritten every hour? Will the headers alone be retained? Please clarify.

Thanks in Advance.

1 Solution

sideview
SplunkTrust

The given csv will be overwritten every time outputcsv runs, headers and all. The new headers will simply match the fields in the new results.

It is the same for the outputlookup command.

Also, in a lot of real-world use cases, using the append flag on the outputcsv command itself can produce a lot of duplicate rows. It can be better to do the appending yourself, with a little search language to remove the duplicates as appropriate.

Here is a simple example, where the csv has a primary key called 'user' and simply maps each user to a "group" field.

<search terms to get the "new" rows mapping users to groups>
| stats last(group) as group by user
| append [| inputcsv mycsv]
| stats first(group) as group by user
| outputcsv mycsv

As you can see, each time the search runs it will update the rows for users whose group has changed, but not duplicate them. Obviously it's a very simple example with only two fields, but with a little more attention to the stats commands you can use the same technique with wider csvs. Note that it's better to put the inputcsv command inside the append subsearch; if you put the actual search in the append instead, you increase the chances of hitting the append command's limits on execution time or number of rows.
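Since the search is already driven from a Python script, the same merge-and-dedup logic can be sketched outside Splunk as well. This is a hypothetical helper (merge_rows is not a Splunk API): it loads the existing csv, lets the new rows win per user, and rewrites the whole file, mirroring what the stats/append search above does.

```python
import csv
import os

def merge_rows(csv_path, new_rows):
    """Merge new user->group rows into an existing csv, keeping the
    newest group per user. Hypothetical sketch of the dedup technique
    in the search above, not a Splunk API."""
    merged = {}
    # Load the existing rows first (analogous to | append [| inputcsv mycsv]).
    if os.path.exists(csv_path):
        with open(csv_path, newline="") as f:
            for row in csv.DictReader(f):
                merged[row["user"]] = row["group"]
    # New rows overwrite any stale mapping (analogous to stats first(group)
    # keeping the new results, which come first in the search pipeline).
    for row in new_rows:
        merged[row["user"]] = row["group"]
    # Rewrite the whole file, headers and all (analogous to outputcsv).
    with open(csv_path, "w", newline="") as f:
        writer = csv.DictWriter(f, fieldnames=["user", "group"])
        writer.writeheader()
        for user, group in merged.items():
            writer.writerow({"user": user, "group": group})
```

Run hourly, this keeps exactly one row per user with that user's latest group, no matter how many times the script fires.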


