Getting Data In

If you had a choice, would you use CSV or XML files for Splunk to ingest?

dhaffner
Path Finder

I'm looking at a log source that will be sent once a day to a forwarder as one large file, then ingested by Splunk and sent on to the indexer. Is it better to use XML or CSV? Maybe even JSON?

Here's what we want:

  1. A cron job runs on the Application server to obtain the past day's incidents and put them into a CSV or XML format file
  2. The file is transferred to our global forwarder through some secure mechanism. (SCP?)
  3. The forwarder monitors the location of the file and feeds the data into Splunk
  4. The process is repeated each day, and the file on the forwarder is overwritten by the new file that gets sent across
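
As a rough sketch of step 1, a cron-driven script might dump the previous day's incidents into a fixed-name CSV file that step 2 then copies across. This is only an illustration: `fetch_incidents` and the field list are hypothetical placeholders for whatever your application actually exposes.

```python
import csv
import datetime

# Hypothetical field list; your real incident schema will differ.
FIELDS = ["timestamp", "severity", "user_id", "message"]

def fetch_incidents(day):
    # Placeholder for however your application server exposes the
    # past day's incidents (database query, REST call, etc.).
    return [{"timestamp": f"{day}T09:15:00", "severity": "ERROR",
             "user_id": "ted", "message": "example incident"}]

def write_daily_csv(path):
    # Every row gets the same columns, which matches the fixed-schema
    # constraint CSV imposes.
    yesterday = (datetime.date.today() - datetime.timedelta(days=1)).isoformat()
    with open(path, "w", newline="") as f:
        writer = csv.DictWriter(f, fieldnames=FIELDS)
        writer.writeheader()
        writer.writerows(fetch_incidents(yesterday))

write_daily_csv("incidents.csv")
```

Step 2 could then be a plain `scp` of the same file in the same cron entry, with the forwarder monitoring the destination directory (step 3).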

Thanks!

1 Solution

gkanapathy
Splunk Employee

I would recommend CSV over XML. It's compact and parsing it is inexpensive and reliable. Splunk fields just hold strings and numbers, so the complexity of using formats (such as XML and JSON) that can handle composite objects is unnecessary.

The downside of CSV is that you generally need to know ahead of time how many fields you'll have and what they are, and every event must have the same (possibly empty) set of fields.

Better than either for Splunk purposes would be a format that Splunk's automatic key-value (KV) extraction can pick up, e.g., field1="value1", field2="value2", field3="value three", field4="more". This gives you the flexibility of setting different fields per event.
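
A hedged sketch of producing such lines, assuming a simple quoting convention; the function name is mine, not a Splunk API:

```python
def format_event(fields):
    """Render a dict as Splunk-friendly key="value" pairs.

    Values are quoted so Splunk's automatic key-value extraction
    picks them up even when they contain spaces; backslashes and
    embedded quotes are escaped.
    """
    parts = []
    for key, value in fields.items():
        text = str(value).replace("\\", "\\\\").replace('"', '\\"')
        parts.append(f'{key}="{text}"')
    return ", ".join(parts)

line = format_event({"field1": "value1", "field3": "value three"})
```

Because each event carries its own field names, events in the same file can have entirely different sets of fields.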

Multiline strings can be handled by a format like the one described in http://answers.splunk.com/questions/3231/escaping-characters-in-an-event/3549#3549, though you could also use XML.
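
One such convention, sketched here as an assumption rather than the exact scheme in the linked answer, is to escape embedded newlines so each event stays on a single line:

```python
def escape_multiline(value):
    # Hypothetical convention: escape backslashes first, then encode
    # literal newlines as the two characters "\n" so the event stays
    # on one line. A search-time transform can reverse this.
    return value.replace("\\", "\\\\").replace("\n", "\\n")

escaped = escape_multiline("line one\nline two")
```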

dhaffner
Path Finder

All good insights. Thanks to you both!

Paolo_Prigione
Builder

I'd recommend CSV, or even something taking advantage of Splunk's default key-value extraction, like

2011/01/27 21:34:32.432 host severity=ERROR userId=ted transaction=w4534rp234 message="..... ... ..."

That way, with zero configuration, you'd have all the fields you explicitly listed in the log.

JSON and XML are complex formats, and dealing with them is not as straightforward as good old CSV or similar.
