Getting Data In

If you had a choice, would you use CSV or XML files for Splunk to eat?

dhaffner
Path Finder

I'm looking at a log source that will be sent once a day to a forwarder as one large file. Splunk will then consume it and send it on to the indexer. Is it better to use XML or CSV? Or maybe even JSON?

Here's what we want:

  1. A cron job runs on the Application server to obtain the past day's incidents and put them into a CSV or XML format file
  2. The file is transferred to our global forwarder through some secure mechanism. (SCP?)
  3. The forwarder monitors the location of the file and feeds the data into Splunk
  4. The process is repeated each day, and the file on the forwarder is overwritten by the new file that gets sent across
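
For step 1, here's a minimal sketch of what the export script could look like (the field names, the output path, and the `fetch_incidents` helper are all hypothetical placeholders for whatever your application actually provides):

```python
import csv

def fetch_incidents():
    """Hypothetical: replace with your application's actual incident query."""
    return [
        {"timestamp": "2011-01-27 21:34:32", "severity": "ERROR", "message": "disk full"},
        {"timestamp": "2011-01-27 21:35:10", "severity": "INFO", "message": "retry ok"},
    ]

def write_daily_csv(path):
    incidents = fetch_incidents()
    # With CSV the schema is fixed up front: every row gets these columns.
    fieldnames = ["timestamp", "severity", "message"]
    with open(path, "w", newline="") as f:
        writer = csv.DictWriter(f, fieldnames=fieldnames)
        writer.writeheader()
        writer.writerows(incidents)

write_daily_csv("incidents.csv")
```

A cron entry would run this once a day, after which the resulting file gets SCP'd to the forwarder (step 2).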

Thanks!

1 Solution

gkanapathy
Splunk Employee
Splunk Employee

I would recommend CSV over XML. It's compact and parsing it is inexpensive and reliable. Splunk fields just hold strings and numbers, so the complexity of using formats (such as XML and JSON) that can handle composite objects is unnecessary.

The downside of CSV is that you pretty much need to know how many and what fields you're going to need ahead of time, and every event must have the same (possibly empty) fields.
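
To illustrate that downside: with Python's `csv` module, every row has to be padded out to the full column set, even when a field doesn't apply to a given event (the column names here are made up):

```python
import csv
import io

fieldnames = ["time", "user", "action", "error_code"]
events = [
    {"time": "09:00", "user": "ted", "action": "login"},                      # no error_code
    {"time": "09:05", "user": "ted", "action": "fail", "error_code": "401"},
]

buf = io.StringIO()
# restval="" fills in columns an event doesn't have, so every row stays uniform.
writer = csv.DictWriter(buf, fieldnames=fieldnames, restval="")
writer.writeheader()
writer.writerows(events)
print(buf.getvalue())
```

The first event still carries a trailing empty `error_code` column; with CSV there is no way to simply omit a field for one event.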

Better than either for Splunk purposes would be to use a format that Splunk's automatic key-value (KV) extraction can handle, e.g., field1="value1", field2="value2", field3="value three", field4="more". This gives you the flexibility to set fields differently per event.
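
A sketch of emitting that key=value style from Python (the field names are illustrative; values containing spaces are quoted, and embedded quotes are escaped):

```python
def to_kv(fields):
    """Render a dict as Splunk-friendly key="value" pairs, one event per line."""
    parts = []
    for key, value in fields.items():
        value = str(value).replace('"', '\\"')  # escape embedded double quotes
        parts.append(f'{key}="{value}"')
    return " ".join(parts)

line = to_kv({"field1": "value1", "field2": "value2", "field3": "value three"})
print(line)  # field1="value1" field2="value2" field3="value three"
```

Because each event is just a dict, one event can carry fields another event lacks, which is exactly the per-event flexibility CSV can't give you.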

Multiline strings can be handled with a format like the one described at http://answers.splunk.com/questions/3231/escaping-characters-in-an-event/3549#3549 though you could also use XML for those.


dhaffner
Path Finder

All good insights. Thanks to you both!


Paolo_Prigione
Builder

I'd recommend CSV, or even something taking advantage of Splunk's default key-value extraction, like

2011/01/27 21:34:32.432 host severity=ERROR userId=ted transaction=w4534rp234 message="..... ... ..."

that way with 0 config you'd have all the fields you explicitly listed in the log.

JSON and XML are complex formats, and dealing with them is not as straightforward as good old CSV or similar.
