What are best practices for writing field extractors?

eregon
Path Finder

Hello fellow Splunthusiasts!

I have some applications running on classic VMs. I am happily splunking their logs, and everything works fine.

Recently we started to deploy the same applications to Docker containers. To collect logs, I use Docker's native Splunk logging driver and receive the data through HEC.

The logging driver adds its own data to the app's log: it either prepends a prefix or wraps the app's log line in JSON, and this extra information identifies the container instance. Because of this, some of my field extractors stopped working, since the format of the data actually ingested has changed.

What are the best practices for writing extractors universally, so one configuration works with all ways of collecting logs?

Just a side note: the point is to have the extractors in props.conf in an app distributed from the deployment server (DS), so my question is about what should be addressed in the regular expression itself. Using | rex field=xxx is not an option here.

isoutamo
SplunkTrust

Hi

Unfortunately, I don't have a direct answer for you, as I don't have your logs to see what the best options could be.

Normally, when I have complicated regex needs, I use https://regex101.com/. It is a really good way to test regexes against scrambled samples. You can even share those cases with other people when you need help: just save, then share the URL it gives you.

On Splunk Slack there is a separate channel for solving this kind of challenge. You could try it: https://splunk-usergroups.slack.com/archives/C3WFE5V5G

r. Ismo

eregon
Path Finder

@isoutamo, thanks for your reply. Actually, the point of the question is not "how to craft a regex". Rather, I am asking:

  • How variable is this extra data, usually?
  • Is there a recommended format for this extra data that I could stick to while writing my regexes?
  • Is it even a good idea to presume any specific format of the added metadata?
  • Maybe it would be better to have several regexes: one for the actual log data (which must not refer to the start of the line with ^) and another for the extra metadata?
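
To illustrate the last point, here is a minimal sketch in Python, run outside Splunk. The log line, the prefix, and the JSON keys are hypothetical samples made up for this example and may not match what a given logging driver setup actually emits. Note also that Python writes named groups as (?P<name>...), whereas Splunk uses (?<name>...).

```python
import json
import re

# Hypothetical application log line (format assumed for illustration only).
APP_LINE = "2024-05-01 12:00:00 ERROR user=alice action=login status=failed"

# Three assumed shapes in which the same line could be ingested.
SAMPLES = {
    "plain": APP_LINE,                         # classic VM, file monitor
    "prefixed": "myapp/3f2a1b9c " + APP_LINE,  # metadata prepended as a prefix
    "json": json.dumps(                        # line wrapped in a JSON envelope
        {"line": APP_LINE, "tag": "3f2a1b9c", "source": "stdout"}
    ),
}

# Extractor anchored to the start of the event with ^.
ANCHORED = re.compile(
    r"^\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2} (?P<level>[A-Z]+) user=(?P<user>\w+)"
)

# Extractor anchored only to tokens that belong to the application's own output.
UNANCHORED = re.compile(
    r"\b(?P<level>DEBUG|INFO|WARN|ERROR|FATAL)\b.*?\buser=(?P<user>\w+)"
)


def extract(pattern, text):
    """Return the named groups of the first match, or None if nothing matches."""
    match = pattern.search(text)
    return match.groupdict() if match else None


if __name__ == "__main__":
    for name, event in SAMPLES.items():
        print(name, extract(ANCHORED, event), extract(UNANCHORED, event))
```

With these samples, the ^-anchored pattern only matches the plain event, while the pattern anchored to the application's own tokens matches all three shapes. The trade-off is that an unanchored pattern is more likely to match in the wrong place, so it needs distinctive tokens to hold on to.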

As you can see, the question is not application-specific. The situation may actually occur even for applications covered by a TA from Splunkbase.

Maybe I could rephrase it like this: "how to migrate apps to Docker and its logging driver without breaking existing extractions".
