I'm looking at a log source that will be sent once a day to a forwarder as one large file. Then it will be eaten by Splunk and sent on to the indexer. Is it better to use XML or CSV? Maybe even JSON?
Here's what we want:
Thanks!
I would recommend CSV over XML. It's compact and parsing it is inexpensive and reliable. Splunk fields just hold strings and numbers, so the complexity of using formats (such as XML and JSON) that can handle composite objects is unnecessary.
The downside of CSV is that you pretty much need to know how many and what fields you're going to need ahead of time, and every event must have the same (possibly empty) fields.
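To make the fixed-schema constraint concrete, here is a minimal sketch of a CSV writer for such a daily file, using Python's standard `csv` module. The field names in `FIELDS` and the sample events are hypothetical, not taken from the question; the point is only that every row carries the same columns, with empty strings where an event has no value.

```python
import csv
import io

# Hypothetical field names; a real log would define its own fixed set.
FIELDS = ["timestamp", "severity", "userId", "message"]

def events_to_csv(events):
    """Serialize dicts to CSV with a fixed header; missing fields become empty."""
    buf = io.StringIO()
    writer = csv.DictWriter(buf, fieldnames=FIELDS, restval="", lineterminator="\n")
    writer.writeheader()
    for event in events:
        # A key that is not in FIELDS raises ValueError (the DictWriter default),
        # which is the "know your fields ahead of time" constraint in practice.
        writer.writerow(event)
    return buf.getvalue()

sample = events_to_csv([
    {"timestamp": "2011-01-27 21:34:32", "severity": "ERROR",
     "userId": "ted", "message": "disk full"},
    {"timestamp": "2011-01-27 21:35:01", "severity": "INFO"},
])
print(sample)
```

The second event has no `userId` or `message`, so its row ends in two empty columns.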
Better than either for Splunk purposes would be to use something that can take auto KV extraction, e.g., field1="value1", field2="value2", field3="value three", field4="more". This allows you the flexibility of setting fields differently per event.
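A rough sketch of how a log producer might emit that key="value" style follows. The helper name `format_kv` is made up for illustration, and replacing embedded double quotes with single quotes is an assumption made here to keep values from terminating early; it is not something Splunk requires.

```python
def format_kv(fields):
    """Render a dict as key="value" pairs separated by ", "."""
    parts = []
    for key, value in fields.items():
        # Assumption: embedded double quotes are swapped for single quotes
        # so they cannot end the quoted value prematurely.
        text = str(value).replace('"', "'")
        parts.append(f'{key}="{text}"')
    return ", ".join(parts)

# Two events with different field sets, which a fixed CSV header could not express.
line_a = format_kv({"field1": "value1", "field3": "value three"})
line_b = format_kv({"field2": "value2", "field4": "more", "extra": 42})
print(line_a)
print(line_b)
```

Each event lists only the fields it actually has, so no schema needs to be agreed on in advance.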
Multiline strings can be handled with a format like the one described at http://answers.splunk.com/questions/3231/escaping-characters-in-an-event/3549#3549, though you could also use XML for those.
All good insights. Thanks to you both!
I'd recommend CSV, or even something that takes advantage of Splunk's default key-value extraction, like
2011/01/27 21:34:32.432 host severity=ERROR userId=ted transaction=w4534rp234 message="..... ... ..."
That way, with zero configuration, you'd have all the fields you explicitly listed in the log.
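For illustration, a line in that shape could be produced with something like the sketch below. The function name `format_event`, the sample message text, and the choice to quote only values that contain whitespace are all assumptions made for this example, not part of the original log.

```python
from datetime import datetime

def format_event(when, host, **fields):
    """Build one log line: timestamp, host, then key=value pairs."""
    # %f yields microseconds; drop the last three digits to get milliseconds.
    stamp = when.strftime("%Y/%m/%d %H:%M:%S.%f")[:-3]
    parts = [stamp, host]
    for key, value in fields.items():
        text = str(value)
        # Assumption: quote only values containing whitespace, and replace
        # embedded double quotes so the quoted value stays intact.
        if any(ch.isspace() for ch in text):
            text = '"' + text.replace('"', "'") + '"'
        parts.append(f"{key}={text}")
    return " ".join(parts)

line = format_event(
    datetime(2011, 1, 27, 21, 34, 32, 432000),
    "host",
    severity="ERROR",
    userId="ted",
    transaction="w4534rp234",
    message="payment gateway timed out",  # hypothetical message text
)
print(line)
```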
JSON and XML are complex formats, and dealing with them is not as straightforward as good old CSV or similar.