Getting new data source in and parsed within a distributed deployment

uhaba · 04-05-2017

I'm trying to understand the process for bringing in a new data source from Oracle. We have 3 indexers, 2 search heads, and 2 heavy forwarders among the various management and control systems. We also deploy UFs as needed to pick up logs. I understand how to point a universal forwarder configuration at the log file directory for specific logs. What confuses me is how to handle new logs when Splunk lacks off-the-shelf parsing configurations. To be clear, I can write the regex needed to extract the fields using rex in the search string. What I don't follow is how to build new props/transforms conf files and where to put them in our type of deployment once they're created. I obviously skipped an important question during our PS rollout and am trying to play catch-up now that we have quite a bit of the Windows stuff worked out.

Does anyone have a walkthrough for onboarding a new log source in a distributed deployment? Any guidance on distributing the supporting conf files and which systems need them? Recommendations on tutorials or extraction accelerators would be greatly appreciated. (I'm currently working my way through the TA builder app to see if that gets me off the ground.)

After reading through the docs, I would also appreciate some guidance on how to approach multiple field formats in a common log source file. If the logs have 5 different field patterns in the same log file, is it better to extract each field with its own regular expression or to pull out as many fields as I can in a single regex?
E.g., a generalized error log from a single JVM error log file I was looking at the other day:

Example A: Time Error Level JVM# Message Error Code
Example B: Time Error Level JVM# JVMFactory Message Error Code
Example C: Time Error Level JVM# Message Error Code Java
Multiline
Dump
of
stuff

lguinn2 · 04-06-2017

"Any guidance on approaching supporting conf file distribution and what systems need them? "
Take a look at this: Where do I configure my Splunk settings?
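As a rough sketch of how that usually breaks down: input settings go on the forwarder that reads the file, parsing settings (line breaking, timestamps) go on the first full Splunk instance the data passes through (a heavy forwarder if the data is routed through one, otherwise the indexers), and search-time field extractions go on the search heads. The sourcetype name, index, and monitor path below are hypothetical, and the line-breaking and timestamp settings assume every event starts with a bracketed timestamp such as [2017-04-05 10:15:30], which may not match your logs.

```ini
# --- inputs.conf: on the universal forwarder that reads the file ---
# Path, index, and sourcetype name are placeholders.
[monitor:///opt/oracle/logs/jvm_error.log]
sourcetype = oracle:jvm:error
index = oracle

# --- props.conf: on the indexers (or on the heavy forwarders, if the
# data passes through them first) ---
# Assumes every event begins with "[YYYY-MM-DD HH:MM:SS]"; adjust to
# the real timestamp layout.
[oracle:jvm:error]
SHOULD_LINEMERGE = false
# Break only before a line that starts with "[<4-digit year>-", so a
# multiline Java dump stays attached to its event.
LINE_BREAKER = ([\r\n]+)\[\d{4}-
TIME_PREFIX = ^\[
TIME_FORMAT = %Y-%m-%d %H:%M:%S
MAX_TIMESTAMP_LOOKAHEAD = 19

# --- props.conf: on the search heads ---
# Search-time extractions (EXTRACT- / REPORT-) live here.
[oracle:jvm:error]
EXTRACT-jvm = JVM:(?<jvm>\d+)
```

One way to manage this is to keep the props.conf in a small app (a TA) that is installed on both the parsing tier and the search heads, since each tier only applies the settings for its own phase, and to keep the inputs.conf in a separate app for the forwarders. How the apps get pushed depends on your setup: a deployment server for the forwarders, and a cluster master or deployer if the indexers or search heads are clustered. The index named in inputs.conf also has to exist on the indexers.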

"is it better to individually extract a field using individual regular expressions or to pull out as many fields as I can in a single regex?"
It is probably a little more efficient to have fewer regexes. However, I believe it is more important to have field extractions that are easy to understand and maintain. So I often find that I extract multiple fields in a single regex, one after the other. It is okay to have multiple regexes that extract the same field like this:

EXTRACT-firstext=\]\s(?<err_level>\w+)\sJVM:(?<jvm>\d+)\s(?<message>.*?)\s
EXTRACT-secondext=\]\s(?<err_level>\w+)\sJVM:(?<jvm>\d+)\sFactory Id(?<jvm_factory>.*?)\s(?<message>.*?)\s

Whichever regex matches the event will be used. You can also do more advanced regexes with optional sections, etc.
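For example, the two extractions above could be folded into one by making the Factory Id part an optional non-capturing group. This is only a sketch that reuses the same pattern pieces (the sourcetype stanza name is hypothetical), so test it against real events before relying on it.

```ini
# props.conf on the search heads; the stanza name is a placeholder.
[oracle:jvm:error]
# (?: ... )? makes the "Factory Id" section optional, so jvm_factory is
# only populated for events that actually contain it.
EXTRACT-jvm_error = \]\s(?<err_level>\w+)\sJVM:(?<jvm>\d+)\s(?:Factory Id(?<jvm_factory>.*?)\s)?(?<message>.*?)\s
```

The trade-off is readability: one pattern means fewer settings to maintain, but every optional section makes the regex harder to read and debug.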

Have you looked at this app in Splunkbase? Log File Analysis for Oracle 11g
I strongly suggest that you download it onto a test server (or even your laptop) and examine the configuration files that it contains. You may be able to use these configs directly, or slightly modified. At the least, I think it will be educational...
