I've got a system based on an XML API that will be spitting out a good amount of data (hundreds of thousands of events an hour?) in XML format. We'll be using scripted inputs to retrieve the data, so the format can be changed before it is indexed. My question is this: how much work, if any, should be spent munging the data, given the impact that can have on search and indexing performance? Is there a benefit to converting it to JSON, for example, or flattening it into tables or KV pairs? Or should I not bother and just do that work at search time?
I don't yet know how much baggage the XML will come with for a given event type. Obviously, if a single event ends up 50% larger, that has to be part of the equation. Let's assume for the sake of argument that the sizes are roughly similar.
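To make the question concrete, here's a hypothetical example of the kind of transformation I have in mind (the event structure and field names are made up, not from the real API):

```
<!-- raw XML as it might come from the API (hypothetical) -->
<event>
  <time>2013-05-01T12:00:00</time>
  <user><name>hal</name></user>
  <action>login</action>
</event>

<!-- the same event flattened to KV pairs -->
time="2013-05-01T12:00:00" user.name="hal" action="login"
```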
Hal, it depends on how deeply nested the XML is. I don't think there is much difference in terms of parsing XML versus JSON. That said, I have seen with other customers that XML events running several thousand lines severely impact search performance. I would write it out to key-value pairs if it were up to me, but if the events are small it shouldn't cause too much trouble. My $0.02.
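If you do go the KV route inside the scripted input, something like the following minimal sketch would do it. This assumes Python and that each event arrives as a self-contained XML fragment; flatten(), to_kv(), and the sample event are illustrative names, not part of any real API:

```python
#!/usr/bin/env python3
# Minimal sketch: flatten one XML event into a line of key="value" pairs
# inside a scripted input, so the indexer never sees raw XML.
import xml.etree.ElementTree as ET

def flatten(elem, prefix=""):
    """Recursively yield (dotted.path, text) pairs for leaf elements."""
    for child in elem:
        key = f"{prefix}{child.tag}"
        if len(child):                           # has nested elements: recurse
            yield from flatten(child, prefix=key + ".")
        elif child.text and child.text.strip():  # leaf element with text
            yield key, child.text.strip()

def to_kv(xml_event):
    """Render one XML event as a single KV line.
    NB: real code would also escape embedded quotes in values."""
    root = ET.fromstring(xml_event)
    return " ".join(f'{k}="{v}"' for k, v in flatten(root))

if __name__ == "__main__":
    # A made-up event; a real script would poll the API instead.
    sample = ("<event><time>2013-05-01T12:00:00</time>"
              "<user><name>hal</name></user>"
              "<action>login</action></event>")
    print(to_kv(sample))
    # -> time="2013-05-01T12:00:00" user.name="hal" action="login"
```

The dotted key paths (user.name) preserve the nesting information without making search time pay the cost of parsing XML on every event.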