Getting Data In

Using transforms.conf to change metadata format from key::value

wowbaggerHU
Engager

Hello everyone!


I would like to ask about the Splunk Heavy Forwarder Splunk-side config:
https://splunk.github.io/splunk-connect-for-syslog/main/sources/vendor/Splunk/heavyforwarder/

With those settings it will send the metadata in the format of key::value.
Is it possible to reconfigure it to send metadata key-value pairs with some other key-value separator instead of "::"?
If yes, how exactly?


PickleRick
SplunkTrust

No. Indexed fields are indexed as key::value search terms. That's by design.

wowbaggerHU
Engager

I don't want to change how fields are indexed.
I just want to reformat the metadata (to use different key-value separators) via the transforms.conf prior to being forwarded to syslog-ng.

PickleRick
SplunkTrust

Wait a moment. As far as I can read this - https://splunk.github.io/splunk-connect-for-syslog/main/sources/vendor/Splunk/heavyforwarder/ - the forwarded data will be formatted like

st="sourcetype" i="index"

and so on.

So where's the problem?

wowbaggerHU
Engager

What you are referring to is the syslog structured data, or SDATA, portion of the message (see RFC 5424). That consists of only 5 values (the same as the Splunk JSON envelope's 5 top-level fields). And yes, those use the equals sign as a separator.

On the other hand the main part of the message will look like this:

~~~SM~~~env::env01~~~EM~~~11/29/2024 02:01:55 PM\nLogName=Security\nEventCode=4624\nEventType=0\nComputerName=DESKTOP-OOU0O6E\nSourceName=Microsoft Windows security auditing.\nType=Information\nRecordNumber=49513\nKeywords=Audit Success\nTaskCategory=Logon\nOpCode=Info\nMessage=An account was successfully logged on.\r\n\r\nSubject:\r\n\tSecurity ID:\t\tNT AUTHORITY\\SYSTEM\r\n\tAccount Name:\t\tDESKTOP-OOU0O6E$\r\n\tAccount Domain:\t\tWORKGROUP\r\n\tLogon ID:\t\t0x3E7\r\n\r\nLogon Information:\r\n\tLogon Type:\t\t5\r\n\tRestricted Admin Mode:\t-\r\n\tVirtual Account:\t\tNo\r\n\tElevated Token:\t\tYes\r\n\r\nImpersonation Level:\t\tImpersonation\r\n\r\nNew Logon:\r\n\tSecurity ID:\t\tNT AUTHORITY\\SYSTEM\r\n\tAccount Name:\t\tSYSTEM\r\n\tAccount Domain:\t\tNT AUTHORITY\r\n\tLogon ID:\t\t0x3E7\r\n\tLinked Logon ID:\t\t0x0\r\n\tNetwork Account Name:\t-\r\n\tNetwork Account Domain:\t-\r\n\tLogon GUID:\t\t{00000000-0000-0000-0000-000000000000}\r\n\r\nProcess Information:\r\n\tProcess ID:\t\t0x2d4\r\n\tProcess Name:\t\tC:\\Windows\\System32\\services.exe\r\n\r\nNetwork Information:\r\n\tWorkstation Name:\t-\r\n\tSource Network Address:\t-\r\n\tSource Port:\t\t-\r\n\r\nDetailed Authentication Information:\r\n\tLogon Process:\t\tAdvapi  \r\n\tAuthentication Package:\tNegotiate\r\n\tTransited Services:\t-\r\n\tPackage Name (NTLM only):\t-\r\n\tKey Length:\t\t0\r\n\r\nThis event is generated when a logon session is created. It is generated on the computer that was accessed.\r\n\r\nThe subject fields indicate the account on the local system which requested the logon. This is most commonly a service such as the Server service, or a local process such as Winlogon.exe or Services.exe.\r\n\r\nThe logon type field indicates the kind of logon that occurred. The most common types are 2 (interactive) and 3 (network).\r\n\r\nThe New Logon fields indicate the account for whom the new logon was created, i.e. 
the account that was logged on.\r\n\r\nThe network fields indicate where a remote logon request originated. Workstation name is not always available and may be left blank in some cases.\r\n\r\nThe impersonation level field indicates the extent to which a process in the logon session can impersonate.\r\n\r\nThe authentication information fields provide detailed information about this specific logon request.\r\n\t- Logon GUID is a unique identifier that can be used to correlate this event with a KDC event.\r\n\t- Transited services indicate which intermediate services have participated in this logon request.\r\n\t- Package name indicates which sub-protocol was used among the NTLM protocols.\r\n\t- Key length indicates the length of the generated session key. This will be 0 if no session key was requested.

I would like the first part of the syslog message to carry the metadata as env=env01 or env:env01.
As I understand it, the SC4S-derived config allows you to modify most parts of the message. But is it possible for the metadata part too? If yes, how do I match the metadata key-value pairs?

PickleRick
SplunkTrust

OK. So this is not about Splunk's metadata format as much as rendering it for export.

I suppose you can tweak it a little.

The key part here is this transform

[metadata_meta]
SOURCE_KEY = _meta
REGEX = (?ims)(.*)
FORMAT = ~~~SM~~~$1~~~EM~~~$0 
DEST_KEY = _raw

It's called as the first one (except for the transform manipulating routing) and it exports the whole _meta as-is.
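Read that way, the transform can be sketched in Python (my reading of the FORMAT semantics here: $1 is the _meta capture, $0 expands to the existing _raw):

```python
import re

# Sketch of what [metadata_meta] effectively does, under the reading above.
def metadata_meta(meta: str, raw: str) -> str:
    # REGEX = (?ims)(.*) captures all of _meta into group 1
    match = re.match(r"(?ims)(.*)", meta)
    # FORMAT = ~~~SM~~~$1~~~EM~~~$0 wraps the capture and prepends it to _raw
    return "~~~SM~~~" + match.group(1) + "~~~EM~~~" + raw

print(metadata_meta("env::env01 source::WinEventLog", "EventCode=4624 ..."))
# ~~~SM~~~env::env01 source::WinEventLog~~~EM~~~EventCode=4624 ...
```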

So you need to change it to:

[sanitize_metadata]
INGEST_EVAL = escaped_meta=replace(_meta,"::","=")

[metadata_meta]
SOURCE_KEY = escaped_meta
REGEX = (?ims)(.*)
FORMAT = ~~~SM~~~$1~~~EM~~~$0
DEST_KEY = _raw

And of course adjust props.conf to call sanitize_metadata first:

TRANSFORMS-zza-syslog = syslog_canforward, sanitize_metadata, metadata_meta, metadata_source, metadata_sourcetype, metadata_index, metadata_host, metadata_subsecond, metadata_time, syslog_prefix, syslog_drop_zero
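As a sanity check of the INGEST_EVAL step: Splunk's eval replace() treats the second argument as a regex, but "::" contains no special regex characters, so a Python equivalent of the substitution looks like this (sample field names are made up):

```python
import re

# Python equivalent of: INGEST_EVAL = escaped_meta=replace(_meta,"::","=")
def sanitize_metadata(meta: str) -> str:
    return re.sub("::", "=", meta)

print(sanitize_metadata("env::env01 index::main host::web01"))
# env=env01 index=main host=web01
```

Note that this rewrites every "::" occurrence in _meta, including any that happen to appear inside a field value.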

isoutamo
SplunkTrust

When you replace :: in the _meta field, the receiving Splunk instance no longer recognizes it as metadata. And if those mandatory meta fields are missing, Splunk cannot guess them and do what is needed for those events. Then, depending on the receiver-side configuration, the data goes to the default index or is dropped.

wowbaggerHU
Engager

I am forwarding the logs from the Splunk HF to a syslog-ng instance that I configured myself, so that doesn't matter here.

wowbaggerHU
Engager

I checked it, but unfortunately it does not seem to work.
Now I can't seem to find logs that contain any metadata, so I assume they are being dropped due to some problem.

Where should I look for clues?

PickleRick
SplunkTrust

I'm assuming you're receiving this on SC4S. So as you've changed the format of sent data, the receiving end probably doesn't know what to do with that.

First thing to check would be to sniff the traffic to see whether the data is being sent and what it looks like.

wowbaggerHU
Engager

I have played around a bit more...

This is what seems to be working for me:

[sanitize_metadata]
EVAL-_meta=replace(_meta,"::","=")

[metadata_meta]
SOURCE_KEY = _meta
REGEX = (?ims)(.*)
FORMAT = $1__-__$0 
DEST_KEY = _raw

Note: __-__ is just a placeholder for a separator.
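The net effect of that working variant can be sketched in Python (keeping the __-__ placeholder from the post):

```python
def variant(meta: str, raw: str, sep: str = "__-__") -> str:
    # EVAL-_meta first rewrites "::" to "=" inside _meta, then the
    # metadata_meta transform prepends _meta plus the separator to _raw
    return meta.replace("::", "=") + sep + raw

print(variant("env::env01", "EventCode=4624"))
# env=env01__-__EventCode=4624
```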

I found an article that aims at something marginally similar to what I'm doing:
https://zchandikaz.medium.com/alter-splunk-data-at-indexing-time-a10c09713f51

There, the author uses EVAL instead of INGEST_EVAL. Is there any significant difference?

Also, I changed your example because it worked differently if I did not use _meta as the target variable in the INGEST_EVAL.
I noticed that with your version, the logs that originated from the Windows machine with the UF on it were missing the metadata assigned there. When I use my version, all the metadata set on the UF (static key-value pairs) is present in the log.
Any idea why that might be?

Either way, thanks so much for your effort to help me! I really appreciate it!

PickleRick
SplunkTrust

EVAL is a search-time configuration, so it will not work at index time (I'm not even sure it's correct syntax in your example).

wowbaggerHU
Engager

Okay, I reverted to using INGEST_EVAL, that works as well.

On the other hand, I have an additional question:
If a given Splunk node is already forwarding logs to another node over S2S or S2S over HEC, and I want to add this configuration to send the logs to yet another destination (a node running syslog-ng), then will this configuration break the other pre-existing destinations' log format? Or is it safe to use from this perspective?

PickleRick
SplunkTrust

It depends on your overall process but as a general rule, the pipeline works like this:

input -> transforms -> output(s)

So if you modify an event and its metadata it will get to outputs that way. There is an ugly way to avoid it - use CLONE_SOURCETYPE to make a copy of your event and process it independently but it's both a performance hit and a maintenance nightmare in the future.
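A rough, untested sketch of the CLONE_SOURCETYPE approach mentioned above (all stanza, sourcetype, and transform names here are made up for illustration):

```
# transforms.conf -- clone every event into a separate sourcetype
[clone_for_syslog]
REGEX = .
CLONE_SOURCETYPE = my:syslog_copy

# props.conf -- only the clone gets the export rewrites
[your_original_sourcetype]
TRANSFORMS-clone = clone_for_syslog

[my:syslog_copy]
TRANSFORMS-zza-syslog = syslog_canforward, sanitize_metadata, metadata_meta, metadata_source, metadata_sourcetype, metadata_index, metadata_host, metadata_subsecond, metadata_time, syslog_prefix, syslog_drop_zero
```

You would still need to route the clone (and only the clone) to the syslog-ng output group, e.g. with a transform setting _TCP_ROUTING, which is exactly the kind of extra moving part that makes this approach a maintenance burden.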

wowbaggerHU
Engager

Thanks for clarifying. You helped a lot!
That means there are two options for me:

  • do this conversion on the syslog-ng side and that won't hurt the splunk side of things
  • forward the logs to yet another splunk instance that will only do this conversion, thereby isolating the "production" Splunk instance from these transforms

PickleRick
SplunkTrust

As I said before - while fiddling with multiple different intermediate syslog solutions can work when syslog is the native form of sending events, mixing it with Windows usually ends badly one way or another. You either get your data broken and non-parseable or have to bend over backwards to force it to work at least half-decently.

We don't know your whole environment or what the original problem you're trying to solve is, so we can't help much here. There is, of course, the additional question of whether you need the Splunk metadata at all if you're forwarding to a third-party syslog receiver anyway.

wowbaggerHU
Engager

Thanks for your help with this.
In the meantime I've run into another problem. Could you please help me?
This is the topic: https://community.splunk.com/t5/Getting-Data-In/conditional-whitespace-in-transform/m-p/708831

wowbaggerHU
Engager

Thanks! That is understandable.
Based on your answers so far, I will think through what would work best, and will get back to you.
But either way, I think I got all the answers I needed.

wowbaggerHU
Engager

I can confirm that this type of setup does not work for the Windows logs:

[sanitize_metadata]
EVAL-EEEE =replace(_meta,"::","=")

[metadata_meta]
SOURCE_KEY = EEEE
REGEX = (?ims)(.*)
FORMAT = $1__-__$0 
DEST_KEY = _raw

The problem is that with this, the Windows logs only contain the event log message part, as if they did not have any metadata attached.

wowbaggerHU
Engager

No, it's a custom-configured syslog-ng instance that I set up.

After looking at the arriving logs, I saw that the logs that previously had the metadata part included now have nothing there instead, and the separators (~~~EM~~~ and ~~~SM~~~) are missing too.

isoutamo
SplunkTrust

Hi

I'm not sure I understand your requirements correctly. Do you want to reformat the syslog feed before it is modified by the HF? Or do you want to use some other metadata separator than ::?

You can modify the data, if you want, before the HF sets it into metadata (and indexed fields).

BUT you cannot use your own metadata separator like =. In Splunk, :: is the fixed metadata separator, and you must use it in transforms.conf and/or inputs.conf, like _meta foo::bar
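For reference, here is the shape of the fixed :: separator when assigning static indexed fields in inputs.conf (the path and field names are just examples):

```
# inputs.conf on a forwarder -- static indexed fields use the fixed :: separator
[monitor:///var/log/app.log]
_meta = env::env01 team::blue
```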

r. Ismo
