Getting Data In

Universal Forwarder and multiline events containing timestamps

gchappel
Observer

Background

I have a very legacy application with bad/inconsistent log formatting, and I want to collect its logs in Splunk via a Universal Forwarder. The problem is multiline events: the application dumps XML documents, which contain their own timestamps, into the log messages.

Issue

Because these multiline messages contain a timestamp inside the XML body, and that body becomes part of the log message, Splunk is indexing events with "impossible" timestamps. For example, a log event actually written in 2024 may emit an XML body with an <example></example> element holding a 2019 timestamp, and part of that body gets indexed as a Splunk event from five years ago.

Constraints

  • I cannot modify the configuration of the Splunk indexer/search head/anything other than the Universal Forwarder that I control
  • I do not have the licensing to run any Heavy Forwarders; I can only go from a Universal Forwarder on hosts I control directly to an HTTP Event Collector endpoint that I do not control
  • I cannot (easily) change the log format to stop dumping these bodies. There is a long-term ask on the team to make the logging a) consistent and b) more ingest-friendly, but I'm looking for an interim solution I can apply to the component I control directly, which is essentially the Universal Forwarder only.

Ideas?

My only idea so far is a custom sourcetype that specifies the log timestamp format exactly, including a regex anchor to the start of the line, and reduces MAX_TIMESTAMP_LOOKAHEAD so that Splunk stops looking past the first match; see the sketch below. I believe this would cause all the lines in an event to be handled correctly, because the XML document always starts with either whitespace or a < character. However, my understanding is that this change would have to be made on the indexer or on a Heavy Forwarder, neither of which I can touch.
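For illustration, the stanza I have in mind would look something like this sketch (the timestamp format and the sourcetype name are assumptions about our log layout, not tested config):

    # props.conf -- sketch only; assumes events start with e.g. "2024-05-01 12:34:56"
    [legacy_app]
    TIME_PREFIX = ^
    TIME_FORMAT = %Y-%m-%d %H:%M:%S
    # 19 = length of that timestamp string; stops Splunk scanning into the XML body
    MAX_TIMESTAMP_LOOKAHEAD = 19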

I'm looking for any alternatives this community can offer as a potential workaround until the log sanitization effort gets off the ground.

richgalloway
SplunkTrust

This data is not being onboarded properly. That may be your fault or someone else's, but you need to work with the owner of the HF to install a better set of props.conf settings so the data is onboarded correctly.

Focus on the Great Eight settings, with particular attention to LINE_BREAKER, TIME_PREFIX, and TIME_FORMAT.
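As a sketch only (the timestamp format and breaker regex are assumptions about the log layout and would need adjusting), a fuller onboarding stanza could look like:

    # props.conf -- illustrative Great Eight stanza, not tested against the real logs
    [legacy_app]
    SHOULD_LINEMERGE = false
    LINE_BREAKER = ([\r\n]+)\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}
    TIME_PREFIX = ^
    TIME_FORMAT = %Y-%m-%d %H:%M:%S
    MAX_TIMESTAMP_LOOKAHEAD = 19
    TRUNCATE = 100000
    # the EVENT_BREAKER pair applies on the forwarder side for load balancing
    EVENT_BREAKER_ENABLE = true
    EVENT_BREAKER = ([\r\n]+)\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}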

If the HF owner pushes back, remind him/her that Splunk suffers when data is not onboarded well. Additionally, the company may suffer if data cannot be searched because the timestamps are wrong.

---
If this reply helps you, Karma would be appreciated.

PickleRick
SplunkTrust

The UF does not do parsing, except for indexed extractions or when you set force_local_processing=true. So unless you turn your UF into a kind of poor man's HF, your parsing and time-extraction settings will not work on the UF.
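As a minimal sketch (assuming a placeholder [legacy_app] sourcetype), that would mean something like this in props.conf on the UF:

    # props.conf on the UF -- sketch only; [legacy_app] is a placeholder sourcetype
    [legacy_app]
    force_local_processing = true
    # once this is set, parse-time settings (LINE_BREAKER, TIME_PREFIX,
    # TIME_FORMAT, ...) placed in this same stanza are applied on the UF itself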

Since you have access to a HEC endpoint, though, you could consider another method (a third-party tool such as Filebeat, or even your own Python script) to pre-parse those events a bit and send them via HTTP.
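A rough sketch of that script approach, assuming events start with a "YYYY-MM-DD HH:MM:SS" timestamp at column 0 and the XML body lines do not; the HEC URL, token, sourcetype, and log path are all placeholders:

    #!/usr/bin/env python3
    # Sketch: merge multiline events, pull the timestamp from the first line,
    # and send each event to HEC with an explicit "time" field so Splunk
    # never has to guess. All endpoints and paths below are placeholders.
    import json
    import re
    import time
    import urllib.request

    HEC_URL = "https://hec.example.com:8088/services/collector/event"  # placeholder
    HEC_TOKEN = "00000000-0000-0000-0000-000000000000"                 # placeholder

    # A new event starts with a timestamp at column 0; XML body lines do not.
    TS_PATTERN = re.compile(r"^(\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2})")
    TS_FORMAT = "%Y-%m-%d %H:%M:%S"

    def send_event(raw, epoch):
        # Explicit "time" overrides any timestamp guessing downstream.
        payload = json.dumps({"time": epoch, "event": raw,
                              "sourcetype": "legacy_app"})  # placeholder sourcetype
        req = urllib.request.Request(
            HEC_URL, data=payload.encode("utf-8"),
            headers={"Authorization": "Splunk " + HEC_TOKEN,
                     "Content-Type": "application/json"})
        urllib.request.urlopen(req)

    def stream_events(path):
        # Yield (event_text, epoch) pairs, folding continuation lines into
        # the event that precedes them.
        buffer, epoch = [], None
        with open(path, encoding="utf-8", errors="replace") as fh:
            for line in fh:
                m = TS_PATTERN.match(line)
                if m:
                    if buffer:
                        yield "".join(buffer), epoch
                    buffer = [line]
                    epoch = time.mktime(time.strptime(m.group(1), TS_FORMAT))
                elif buffer:
                    buffer.append(line)
        if buffer:
            yield "".join(buffer), epoch

    if __name__ == "__main__":
        for event, epoch in stream_events("/var/log/legacy_app.log"):  # placeholder
            send_event(event, epoch)

For a real deployment you'd obviously want batching, retries, and tracking of how far into the file you've read, but it shows the idea.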
