What are best practices for writing field extractors?

eregon
Path Finder

Hello fellow Splunthusiasts!

I have some applications running on classic VMs. I am happily splunking their logs and everything works fine.

Recently we started to deploy the same applications to Docker containers. To collect logs, I use Docker's native Splunk logging driver and receive the data through HEC.

The logging driver adds its own data to the app's log: it either prepends a prefix or wraps the app's log in JSON, and the extra information identifies the container instance. Because of this, some of my field extractors stopped working, since the format of the ingested data has changed.

What are the best practices for writing extractors universally, so that one configuration works with all ways of collecting logs?

Just a side note: the point is to have extractors in props.conf in an app distributed from the DS, so my question is about what should be addressed in the regular expression itself. Using | rex field=xxx is not an option here.

isoutamo
SplunkTrust

Hi

Unfortunately, I don't have a direct answer for you, as I don't have your logs to see what the best options would be.

Normally, when I have complicated regex needs, I use https://regex101.com/. It is a really good way to test them against scrambled samples. You can even share those cases with other people when you need help: just save and then share the URL it gives you.

On Splunk Slack there is a separate channel for solving this kind of challenge. You could try it: https://splunk-usergroups.slack.com/archives/C3WFE5V5G

r. Ismo

eregon
Path Finder

@isoutamo, thanks for your reply. Actually, the point of the question is not "how to craft a regex". Rather, I am asking:

  • How variable is this extra data usually?
  • Is there a recommended format for this extra data that I could stick to while writing my regexes?
  • Is it even a good idea to presume any specific format for the added metadata?
  • Maybe it would be better to have several regexes: one for the actual log data (where the line start must not be referred to using ^) and another for the extra metadata?
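
To make the last point concrete, here is a minimal sketch in Python (not props.conf, so that it can be run anywhere). Everything in it is an assumption for illustration: the sample log line, the `container_id=` prefix, and the JSON keys `line`, `tag` and `source` are hypothetical and may not match what the logging driver really emits. It only shows that a pattern anchored with ^ breaks once anything is put in front of the line, while an unanchored pattern plus a separate metadata pattern keeps working.

```python
import json
import re

# Hypothetical application log line, as it looks when read from a VM.
RAW = "2024-12-01 10:15:00 ERROR user=alice action=login status=failed"
# Hypothetical variants after a logging driver has touched the line.
PREFIXED = "container_id=abc123 " + RAW
WRAPPED = json.dumps({"line": RAW, "tag": "abc123", "source": "stdout"})

APP_FIELDS = r"\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2} (?P<level>[A-Z]+) user=(?P<user>\w+)"

# Anchored to the line start: only matches the raw format.
ANCHORED = re.compile("^" + APP_FIELDS)
# Unanchored: keyed on the shape of the timestamp, not on its position.
UNANCHORED = re.compile(APP_FIELDS)
# Separate pattern for the (hypothetical) container metadata.
METADATA = re.compile(r'(?:container_id=|"tag": ")(?P<container>\w+)')


def extract(pattern, event):
    """Return the named groups of the first match, or None if nothing matches."""
    match = pattern.search(event)
    return match.groupdict() if match else None


if __name__ == "__main__":
    for name, event in (("raw", RAW), ("prefixed", PREFIXED), ("wrapped", WRAPPED)):
        print(name, extract(ANCHORED, event), extract(UNANCHORED, event),
              extract(METADATA, event))
```

The same unanchored pattern could in principle be used in an EXTRACT- setting in props.conf. Whether that is enough depends on how the wrapper escapes quotes and backslashes inside the JSON, which this sketch does not cover.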

As you can see, the question is not application specific. The situation may actually occur even for applications covered by a TA from Splunkbase.

Maybe I could rephrase it like this: "how to migrate apps to Docker and its logging driver without breaking existing extractions".
