Parsing TSV with variable header names

kristensens
Explorer

Hi, I have an eventhub that receives data from multiple applications, each with a different number and set of columns.

The events are typically like so (as an example)

Environment ProductName UtcDate   RequestId Clientid ClientIp #app1 
Environment ProductName UtcDate Instance Region RequestId ClientIp DeviceId #app2
Environment ProductName UtcDate  DeviceId ClientIp #app3
PROD Product1 2024-04-04T20:21:20 abcd-12345-dev bcde-ed-1234 10.12.13.14 #app1
PROD Product2 2024-04-04T20:23:20 gwa us 126d-a23d-1234-def1 10.23.45.67 abcAJHSSz12. #ap
TEST Product3 2024-04-04T20:25:20 Ghsdhg1245 12.34.57.78 #app3
Environment ProductName UtcDate Instance Region RequestId ClientIp DeviceId #app2

The #app marker at the end of each line is not part of the log; it is only there to annotate the different entries.
How can Splunk automagically select which "format" to use with REPORT/EXTRACT in transforms?

On the HeavyForwarder 
transforms.conf

[header1]
DELIMS="\t"
FIELDS=Environment,ProductName,UtcDate,RequestId,Clientid,ClientIp

[header2]
DELIMS="\t"
FIELDS=Environment,ProductName,UtcDate,Instance,Region,RequestId,ClientIp,DeviceId

[header3]
DELIMS="\t"
FIELDS=Environment,ProductName,UtcDate,DeviceId,ClientIp

In props.conf

[eventhub:sourcewithmixedsources]
INDEXED_EXTRACTIONS = TSV
CHECK_FOR_HEADER=true
NO_BINARY_CHECK = 1
SHOULD_LINEMERGE = false
pulldown_type = 1
REPORT-headers = header1, header2, header3

 

1 Solution

PickleRick
SplunkTrust

1. If you're doing indexed extractions, your data is already parsed into fields at index time. Adding search-time extractions on top will only result in duplicate fields (or misassigned fields if the format is not well defined).

2. In general, unless you have a file input whose header names the fields in that file, there's no way to assign field names dynamically for indexed extractions.

3. You could try making search-time extraction definitions that match only specific message templates.

Like

EXTRACT-fields-for-app1 = ^(?<Environment>\S+)\s+(?<ProductName>\S+)\s+(?<UtcDate>\S+)\s+(?<RequestId>\S+)\s+(?<ClientId>\S+)\s+(?<ClientIp>\d+\.\d+\.\d+\.\d+)$

This should match only data for app1 because it requires a specific number of whitespace-separated fields and has the IP value anchored at a particular position within the event. You can have several other similar extraction definitions, each covering a separate event template.
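
As a rough sketch of what the full set of definitions could look like for the three templates in the question (not tested against real data): inline regexes in props.conf go under EXTRACT-<class>, and the class names fields-for-app2 and fields-for-app3 are made up for illustration. The field names are taken from the header lines in the question. It assumes no column is ever empty, since \s+ would swallow two consecutive tabs, and it leaves out INDEXED_EXTRACTIONS because of point 1. Being search-time settings, these would normally live on the search head rather than the heavy forwarder.

```ini
# props.conf (search-time sketch; class names are illustrative)
[eventhub:sourcewithmixedsources]

# app1: 6 columns, IP address in the last position
EXTRACT-fields-for-app1 = ^(?<Environment>\S+)\s+(?<ProductName>\S+)\s+(?<UtcDate>\S+)\s+(?<RequestId>\S+)\s+(?<ClientId>\S+)\s+(?<ClientIp>\d+\.\d+\.\d+\.\d+)$

# app2: 8 columns, IP address in the 7th position
EXTRACT-fields-for-app2 = ^(?<Environment>\S+)\s+(?<ProductName>\S+)\s+(?<UtcDate>\S+)\s+(?<Instance>\S+)\s+(?<Region>\S+)\s+(?<RequestId>\S+)\s+(?<ClientIp>\d+\.\d+\.\d+\.\d+)\s+(?<DeviceId>\S+)$

# app3: 5 columns, IP address in the last position
EXTRACT-fields-for-app3 = ^(?<Environment>\S+)\s+(?<ProductName>\S+)\s+(?<UtcDate>\S+)\s+(?<DeviceId>\S+)\s+(?<ClientIp>\d+\.\d+\.\d+\.\d+)$
```

Because each pattern is anchored at both ends and fixes the column count, a given event should match at most one of them. The header lines themselves contain no IP address, so they should match none and simply get no fields from these extractions.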

kristensens
Explorer

Thanks for confirming my suspicion. SED'ed a lot!

