Splunk is not good for document search, which is what most website search is. For example, stemming, synonyms, phrases, ranking by proximity, and relevance ranking on anything other than time are not native to Splunk's indexing. Neither is extraction from non-text sources such as Word or PDF documents. You will also probably spend a disproportionate amount of time extracting only the relevant parts of the HTML.
I have no experience using DB Connect, but I can help you out with indexing and extracting XML.
To index the XML files (assuming they always get dumped to the same directory), edit your local/inputs.conf file and add:
[monitor:///directory/*.xml]
sourcetype = theSourcetypeYouWant
index = theindexyouwantitin
crcSalt = <source>
alwaysOpenFile = 1
disabled = false
You should be able to search for sourcetype=theSourcetypeYouWant and find the indexed data. To extract fields out of the XML, you can do one of two things. Either click the down arrow next to an event, select Extract Fields, and give it example values; it will generate the regex for you, and you can save it for future use. Or create the extractions manually: open etc/system/local/props.conf (create the file if you don't have it already) and add the following:
[theSourcetypeYouWant]
EXTRACT-YourXMLExtractionName1 = (?i)<XMLtagInRawEvent1>(?P<YourXMLExtractionName1>[^<]+)
EXTRACT-YourXMLExtractionName2 = (?i)<XMLtagInRawEvent2>(?P<YourXMLExtractionName2>[^<]+)
Replace the placeholders above with whatever you want to call the sourcetype and the extractions. XMLtagInRawEvent is the actual tag name as it appears in the XML file. If you do this correctly, you will see the fields on the left side when you run a search. Remember to restart Splunk after you edit the props.conf file.
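If you want to sanity-check an extraction pattern before putting it in props.conf, here is a rough sketch in Python. It assumes a made-up tag called status and a made-up sample event, and it uses Python's re module rather than Splunk's own regex engine, so treat it as an approximation; for a pattern this simple the two should behave the same.

```python
import re

# Same shape as the EXTRACT- lines above. "status" is a hypothetical tag
# and field name; swap in your real XML tag.
PATTERN = re.compile(r"(?i)<status>(?P<status>[^<]+)")


def extract(raw_event: str) -> dict:
    """Return the named groups captured from one raw event, or {} if nothing matches."""
    match = PATTERN.search(raw_event)
    return match.groupdict() if match else {}


# Hypothetical raw event. Note the capital S: (?i) makes the tag match
# case-insensitive, and [^<]+ stops the capture at the closing tag.
sample = "<event><Status>OK</Status><host>web01</host></event>"
print(extract(sample))
```

If the dictionary comes back empty, the tag name in the pattern probably does not match what is actually in the raw event.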
I was in the same boat as you 6 months ago. Splunk is powerful, but you have to really immerse yourself to get what you want out of the tool past simple searches.