Solved: Help with getting started reporting from XML files...

charlie_park2 · ‎09-27-2013

I've installed Splunk on Mac OS X and downloaded DB Connect and the MySQL Java connector.

I'm still struggling to get started. The documentation is copious but not hand-holding; it's still fairly geeky. It would be GREAT to have use cases or examples of things one can index (precise steps) and then query into a dashboard (precise steps and precise outputs: bar graphs, pie charts, maybe even a table that reorganizes the data so that it's easy to import into Excel, etc.).

I have XML files in a directory. Some nodes in the XML files refer to columns in the MySQL table.

All I need is to index the XML files:

Optionally delete the source XML files after they're pulled into the Splunk index
Have an easy way to index any new XML files put there

Then, to index the data in the MySQL database:

Optionally delete the rows that're pulled into the Splunk index already
Have an easy way to index any new rows being added to the DB -- and time the periodicity of checking the DB to pull in new data
Have an easy way to combine data between the XML files and the DB without needing a PhD in geekdom

Any guides or step-by-step instructions to get me started? How do I take complex XML files and convert them into meaningful indexes for Splunk so that I can report them in an easy table? Pie charts etc can come later.

Splunk sounds like a really powerful tool, and I'm patient and willing to learn it, but the documentation presumes that the reader is an IT admin. If Splunk's marketing promise of a foray into business analytics is real and sincere, we'd love to see more hand-holding, step-by-step, use-case-style documentation.

Thanks for any pointers!

antlefebvre · ‎09-29-2013

I have no experience using DB Connect, but I can help you out with indexing and extracting XML.

To index the XML files (assuming they always land in the same directory), edit your local/inputs.conf file and add:

[monitor:///directory/*.xml]
sourcetype = theSourcetypeYouWant
index = theindexyouwantitin
crcSalt = <source>
alwaysOpenFile = 1
disabled = false

You should then be able to search for sourcetype=theSourcetypeYouWant and find the indexed data. To extract fields from the XML, you can do one of two things. Either click the down arrow next to an event, select Extract Fields, and give example values; Splunk will generate the regex for you, and you can save it for future use. Or create the extractions manually: open etc/system/local/props.conf (create the file if it doesn't exist yet) and add the following:

[theSourcetypeYouWant]
EXTRACT-YourXMLExtractionName1 = (?i)<XMLtagInRawEvent1>(?P<YourXMLExtractionName1>[^<]+)
EXTRACT-YourXMLExtractionName2 = (?i)<XMLtagInRawEvent2>(?P<YourXMLExtractionName2>[^<]+)

Replace the placeholders with whatever you want to call the sourcetype and the extractions above. XMLtagInRawEvent is the actual tag name as it appears in the XML file. If you do this correctly, the fields will show up on the left side when you run a search. Remember to restart Splunk after editing props.conf.
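If you want to sanity-check an extraction pattern before putting it into props.conf, a rough sketch like the one below can help. It is only an illustration: the sample event and the tag names OrderId and Customer are made up, and it runs the patterns through Python's re module rather than Splunk's own regex engine, so treat a match here as a hint and not a guarantee of what Splunk will extract.

```python
import re

# Hypothetical sample event; the tag names are invented for illustration only.
SAMPLE_EVENT = "<order><OrderId>1042</OrderId><Customer>Acme Corp</Customer></order>"

# Same shape as the EXTRACT-... lines above: a case-insensitive tag match
# followed by a named group that captures everything up to the next "<".
PATTERNS = {
    "order_id": r"(?i)<OrderId>(?P<order_id>[^<]+)",
    "customer": r"(?i)<Customer>(?P<customer>[^<]+)",
}


def extract_fields(raw_event):
    """Return {field name: captured value} for every pattern that matches."""
    fields = {}
    for name, pattern in PATTERNS.items():
        match = re.search(pattern, raw_event)
        if match:
            fields[name] = match.group(name)
    return fields


if __name__ == "__main__":
    print(extract_fields(SAMPLE_EVENT))
```

Note that a pattern of this shape only matches a tag written exactly as given (ignoring case), so an opening tag that carries attributes, or an empty element, will not produce a field.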

I was in the same boat as you 6 months ago. Splunk is powerful, but you have to really immerse yourself to get what you want out of the tool past simple searches.

charlie_park2 · ‎09-28-2013

Thanks. I guess I asked for it 🙂

Even for a simple task, getting a bunch of XML files indexed and then queried the way I want, Splunk so far has not impressed. The documentation does not seem to have anything clear about structuring the indexing of XML in any sensible way, only regexps, which are clunky and hard to maintain.

Same with DB Connect. How do I index the data the first time? Can I specify the keys I want to use later for querying? Will the tables be "snorted" once, with any new rows then added to the index automatically? What if I want to change, in the future, the way I query the index? And so on. The answers to these questions don't seem easy to discover.
