Parsing TSV with variable header names

kristensens
Explorer

Hi, I have an Event Hub that receives data from multiple applications, each with a different number of columns and different column names.

The events typically look like this (as an example):

Environment ProductName UtcDate   RequestId Clientid ClientIp #app1 
Environment ProductName UtcDate Instance Region RequestId ClientIp DeviceId #app2
Environment ProductName UtcDate  DeviceId ClientIp #app3
PROD Product1 2024-04-04T20:21:20 abcd-12345-dev bcde-ed-1234 10.12.13.14 #app1
PROD Product2 2024-04-04T20:23:20 gwa us 126d-a23d-1234-def1 10.23.45.67 abcAJHSSz12. #ap
TEST Product3 2024-04-04T20:25:20 Ghsdhg1245 12.34.57.78 #app3
Environment ProductName UtcDate Instance Region RequestId ClientIp DeviceId #app2

The #app at the end of each line is not part of the log; it only annotates the different entries.
How can Splunk automagically select which "format" to use with REPORT/EXTRACT in transforms?

On the heavy forwarder, in transforms.conf:

[header1]
DELIMS="\t"
FIELDS=Environment,ProductName,UtcDate,RequestId,Clientid,ClientIp

[header2]
DELIMS="\t"
FIELDS=Environment,ProductName,UtcDate,Instance,Region,RequestId,ClientIp,DeviceId

[header3]
DELIMS="\t"
FIELDS=Environment,ProductName,UtcDate,DeviceId,ClientIp

In props.conf

[eventhub:sourcewithmixedsources]
INDEXED_EXTRACTIONS = TSV
CHECK_FOR_HEADER=true
NO_BINARY_CHECK = 1
SHOULD_LINEMERGE = false
pulldown_type = 1
REPORT-headers = header1, header2, header3

 


PickleRick
SplunkTrust

1. If you're doing indexed extractions, your data is processed as parsed. Adding search-time extractions on top will only result in duplicate fields (or misassigned fields if the formats are not well defined).

2. In general, unless you have a file input with a header that specifies the fields within that file, there's no way to assign field names dynamically with indexed extractions.

3. You could try making search-time extraction definitions that match only specific message templates.

For example:

EXTRACT-fields-for-app1 = ^(?<Environment>\S+)\s+(?<ProductName>\S+)\s+(?<UtcDate>\S+)\s+(?<RequestId>\S+)\s+(?<ClientId>\S+)\s+(?<ClientIp>\d+\.\d+\.\d+\.\d+)$

This should match only app1 data, because it requires a specific number of whitespace-separated fields and anchors the IP value at a particular position within the event. You can add several similar extraction definitions, each covering a separate event template.
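
As a rough sketch of what a set of per-template extraction definitions could look like for the sample events in the question (not tested against a real Splunk instance): the stanza below leaves out INDEXED_EXTRACTIONS so that only search-time extraction applies, and gives each template its own inline regex. Inline regexes in props.conf use the EXTRACT- prefix, whereas REPORT- refers to a transforms.conf stanza by name. The class names fields-for-app2 and fields-for-app3 are made up for illustration, and the patterns assume that no column is ever empty and that ClientIp is always an IPv4 address.

```ini
# props.conf
# These are search-time settings, so they would normally live on the search head.
[eventhub:sourcewithmixedsources]
SHOULD_LINEMERGE = false

# app1 template: 6 columns, IP address in the last position
EXTRACT-fields-for-app1 = ^(?<Environment>\S+)\s+(?<ProductName>\S+)\s+(?<UtcDate>\S+)\s+(?<RequestId>\S+)\s+(?<ClientId>\S+)\s+(?<ClientIp>\d+\.\d+\.\d+\.\d+)$

# app2 template: 8 columns, IP address in the seventh position
EXTRACT-fields-for-app2 = ^(?<Environment>\S+)\s+(?<ProductName>\S+)\s+(?<UtcDate>\S+)\s+(?<Instance>\S+)\s+(?<Region>\S+)\s+(?<RequestId>\S+)\s+(?<ClientIp>\d+\.\d+\.\d+\.\d+)\s+(?<DeviceId>\S+)$

# app3 template: 5 columns, IP address in the last position
EXTRACT-fields-for-app3 = ^(?<Environment>\S+)\s+(?<ProductName>\S+)\s+(?<UtcDate>\S+)\s+(?<DeviceId>\S+)\s+(?<ClientIp>\d+\.\d+\.\d+\.\d+)$
```

Because each pattern is anchored with ^ and $ and \S+ cannot span a separator, an event with five, six, or eight columns can only match the pattern with the same column count. The header lines that arrive as events match none of the patterns, since their ClientIp column holds the literal column name rather than an IP address.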


kristensens
Explorer

Thanks for confirming my suspicion. SED'ed a lot!
