Hi, I have an Event Hub that receives data from multiple applications, each with a different number of columns and different values.
The events typically look like this (as an example):
Environment ProductName UtcDate RequestId Clientid ClientIp #app1
Environment ProductName UtcDate Instance Region RequestId ClientIp DeviceId #app2
Environment ProductName UtcDate DeviceId ClientIp #app3
PROD Product1 2024-04-04T20:21:20 abcd-12345-dev bcde-ed-1234 10.12.13.14 #app1
PROD Product2 2024-04-04T20:23:20 gwa us 126d-a23d-1234-def1 10.23.45.67 abcAJHSSz12. #app2
TEST Product3 2024-04-04T20:25:20 Ghsdhg1245 12.34.57.78 #app3
Environment ProductName UtcDate Instance Region RequestId ClientIp DeviceId #app2
The #app tag at the end of each line is not part of the log; it just annotates the different entries.
How can Splunk automagically select which "format" to use with REPORT/EXTRACT in transforms?
On the HeavyForwarder
transforms.conf
[header1]
DELIMS="\t"
FIELDS=Environment,ProductName,UtcDate,RequestId,Clientid,ClientIp
[header2]
DELIMS="\t"
FIELDS=Environment,ProductName,UtcDate,Instance,Region,RequestId,ClientIp,DeviceId
[header3]
DELIMS="\t"
FIELDS=Environment,ProductName,UtcDate,DeviceId,ClientIp
In props.conf
[eventhub:sourcewithmixedsources]
INDEXED_EXTRACTIONS = TSV
CHECK_FOR_HEADER=true
NO_BINARY_CHECK = 1
SHOULD_LINEMERGE = false
pulldown_type = 1
REPORT-headers = header1, header2, header3
1. If you're doing indexed extractions, your data arrives already parsed, with the fields extracted at index time. Adding search-time extractions on top will only result in duplicate fields (or misassigned fields if the formats are not well defined).
2. In general, unless you have a file input with a header that specifies the fields within that file, there's no way to assign fields dynamically with indexed extractions.
3. You could try making search-time extraction definitions that each match only one specific message template.
For example, in props.conf (an inline regex goes in EXTRACT-; REPORT- takes the names of transforms.conf stanzas):
EXTRACT-fields-for-app1 = ^(?<Environment>\S+)\s+(?<ProductName>\S+)\s+(?<UtcDate>\S+)\s+(?<RequestId>\S+)\s+(?<ClientId>\S+)\s+(?<ClientIp>\d+\.\d+\.\d+\.\d+)$
This should match only app1 data because it requires a specific number of whitespace-separated fields and anchors an IP value at a particular position within the event. You can have several other similar extraction definitions, each covering a separate event template.
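If you want to sanity-check template regexes like the one above before putting them into props.conf, you can run them against your sample events outside Splunk. The sketch below is Python, not Splunk config, so the named groups are written as (?P<name>...) where Splunk accepts (?<name>...). The template() and classify() helpers are hypothetical names, and the app2 and app3 patterns are my assumptions built from the sample events in the question (field count plus the position of the IP).

```python
import re

# An IPv4-looking value; its position is what tells the templates apart.
IP = r"\d+\.\d+\.\d+\.\d+"


def template(*fields):
    """Build an anchored regex with one named group per whitespace-separated field.

    The field called ClientIp must look like an IP; every other field is \\S+.
    """
    parts = []
    for name in fields:
        pattern = IP if name == "ClientIp" else r"\S+"
        parts.append("(?P<%s>%s)" % (name, pattern))
    return re.compile("^" + r"\s+".join(parts) + "$")


# One template per application (field lists taken from the question).
TEMPLATES = {
    "app1": template("Environment", "ProductName", "UtcDate",
                     "RequestId", "ClientId", "ClientIp"),
    "app2": template("Environment", "ProductName", "UtcDate", "Instance",
                     "Region", "RequestId", "ClientIp", "DeviceId"),
    "app3": template("Environment", "ProductName", "UtcDate",
                     "DeviceId", "ClientIp"),
}


def classify(event):
    """Return the names of all templates that match the event."""
    return [name for name, rx in TEMPLATES.items() if rx.match(event)]


# Sample events from the question, without the #app annotations.
SAMPLES = [
    "PROD Product1 2024-04-04T20:21:20 abcd-12345-dev bcde-ed-1234 10.12.13.14",
    "PROD Product2 2024-04-04T20:23:20 gwa us 126d-a23d-1234-def1 10.23.45.67 abcAJHSSz12.",
    "TEST Product3 2024-04-04T20:25:20 Ghsdhg1245 12.34.57.78",
]

if __name__ == "__main__":
    for event in SAMPLES:
        print(classify(event), event)
```

Each sample should be claimed by exactly one template. If an event comes back with zero or several matches, the templates overlap or miss a case and would need tightening before you rely on them at search time. Note that \S+ cannot match an empty field, so this approach assumes every column is always populated.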
Thanks for confirming my suspicion. SED'ed a lot!