Getting Data In

What are best practices for writing field extractors?

eregon
Path Finder

Hello fellow Splunthusiasts!

I have some applications running on classic VMs. I am happily splunking their logs, and everything works fine.

Recently we started to deploy the same applications to Docker containers. To collect logs, I use Docker's native Splunk logging driver and receive the data through HEC.

The logging driver adds its own metadata to the app's log: it either prepends a prefix or wraps the app's log line in JSON, and the extra information identifies the container instance. Because of this, some of my field extractors stopped working, since the format of the data actually ingested has changed.
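To illustrate the breakage, here is a minimal sketch. The sample log line, the prefix layout and the JSON keys are made up for illustration and are not the driver's exact output, and Python's `re` module stands in for the regex engine Splunk would use:

```python
import json
import re

# Hypothetical app log line, as written on a classic VM.
app_line = "2024-05-01 12:00:00 ERROR user=alice action=login"

# Assumed shapes of what arrives once the logging driver adds its metadata.
prefixed = "myapp/0123abcd " + app_line
wrapped = json.dumps({"line": app_line, "source": "stdout", "tag": "myapp/0123abcd"})

# An extractor anchored to line start, as it might have been written for the VM logs.
anchored = re.compile(r"^\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2} (?P<level>[A-Z]+) ")

for name, event in [("vm", app_line), ("prefixed", prefixed), ("wrapped", wrapped)]:
    match = anchored.search(event)
    # Only the plain VM line matches; the other two no longer start with the timestamp.
    print(name, match.group("level") if match else None)
```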

What are the best practices for writing extractors universally, so one configuration works with all ways of collecting logs?

Just a side note: the point is to have the extractors in props.conf in an app distributed from the DS, so my question is about what should be addressed in the regular expression itself. Using | rex field=xxx is not an option here.


isoutamo
SplunkTrust

Hi

Unfortunately, I don't have a direct answer for you, as I don't have your logs to look at to see what the best options could be.

Normally, when I have complicated regex needs, I use https://regex101.com/. It is a really good way to test them with scrambled samples. You can even share those cases with other people when you need help: just save, then share the URL it gives you.

On Splunk Slack there is a separate channel for solving this kind of challenge. You could try asking there: https://splunk-usergroups.slack.com/archives/C3WFE5V5G

r. Ismo


eregon
Path Finder

@isoutamo, thanks for your reply. Actually, the point of the question is not "how to craft a regex". Rather, I am asking:

  • How variable is this extra data, usually?
  • Is there any recommended format for this extra data that I could stick to while writing my regexes?
  • Is it even a good idea to presume any specific format of the added metadata?
  • Would it be better to have several regexes: one for the actual log data (where line start must not be referenced using ^) and another for the extra metadata?
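To make the last idea concrete, here is a rough sketch in Python (its `re` module standing in for the regex engine Splunk would use). The sample events, the "name/id" tag layout and the helper `extract` are all hypothetical; the point is only that the app-data pattern keys on the shape of the timestamp instead of ^, while the metadata pattern is separate and optional:

```python
import re

# Hypothetical events: the same app line collected three ways (formats assumed).
events = {
    "vm": "2024-05-01 12:00:00 ERROR user=alice action=login",
    "prefixed": "myapp/0123abcd 2024-05-01 12:00:00 ERROR user=alice action=login",
    "wrapped": '{"line": "2024-05-01 12:00:00 ERROR user=alice action=login", "tag": "myapp/0123abcd"}',
}

# App-data extractor: keyed on the timestamp shape rather than on line start.
app_re = re.compile(
    r"\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2} (?P<level>[A-Z]+) user=(?P<user>\w+)"
)

# Separate metadata extractor (hypothetical "name/id" tag layout); it simply
# finds nothing when the event carries no container metadata.
meta_re = re.compile(r"(?P<container>[\w.-]+/[0-9a-f]{8,})")

def extract(event):
    """Apply both patterns independently and merge whatever each one finds."""
    fields = {}
    for pattern in (app_re, meta_re):
        m = pattern.search(event)
        if m:
            fields.update(m.groupdict())
    return fields

for name, event in events.items():
    print(name, extract(event))
```

One caveat with the JSON-wrapped case: quotes and backslashes in the app line get escaped inside the JSON string, so a pattern that relies on those characters would still see different text than on the VM.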

As you can see, the question is not application specific. The situation may actually occur even for applications covered by a TA from Splunkbase.

Maybe I could rephrase it like this: "how to migrate apps to Docker and its logging driver without breaking existing extractions".
