Hi all,
I want to know how Splunk extracts fields from TA_windows inputs when mode=multikv.
The _raw event does not seem to have any kind of field indicator (unlike events from TA_nix, which have headers).
As an example:
Splunk_TA_windows/local/inputs.conf
[perfmon://Network-Bytes]
disabled = false
counters = Bytes Total/sec; Bytes Received/sec; Bytes Sent/sec;
interval = 60
mode = multikv
index = perfmon
useEnglishOnly = true
object = Network Interface
sourcetype = PerfmonMk:Network
This gives the following _raw events, as seen in Splunk:
vmxnet3_Ethernet_Adapter 19069.926362422757 11044.290764991998 8025.635597430761
vmxnet3_Ethernet_Adapter 26173.569591676503 15701.614528029395 10471.95506364711
vmxnet3_Ethernet_Adapter 28654.246470518276 17482.977608482255 11171.268862036022
From this output, Splunk magically extracts fields like:
Bytes_Received/sec
Bytes_Sent/sec
Bytes_Total/sec
instance
category
collection
I checked the TA_windows configs and ran btool, but could not trace any configs other than some standard PerfmonMk:<object> stanzas in Splunk_TA_windows/default/props.conf, which contain only FIELDALIAS settings.
What am I missing?
How does Splunk know which field is which?
How does it even get values for category and collection when those values are not present in _raw?
For comparison, the TA_nix add-on does this in a much more legible way (one that can easily be understood and experimented with), like:
Name rxPackets_PS txPackets_PS rxKB_PS txKB_PS
eth0 1024.00 1972.50 1415.04 674.94
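To illustrate why the header row makes the TA_nix output easy to work with, here is a minimal Python sketch (not Splunk's implementation) that pairs each header column with the values in the rows below it. It assumes whitespace-separated columns, as in the bandwidth.sh sample above; the function name parse_header_table is hypothetical.

```python
def parse_header_table(text):
    """Map each data row to a dict keyed by the header row's column names.

    Illustrative only: assumes the first non-empty line is a
    whitespace-separated header and every following line has the
    same number of columns.
    """
    lines = [line for line in text.splitlines() if line.strip()]
    header = lines[0].split()
    return [dict(zip(header, line.split())) for line in lines[1:]]


sample = """Name rxPackets_PS txPackets_PS rxKB_PS txKB_PS
eth0 1024.00 1972.50 1415.04 674.94"""

for row in parse_header_table(sample):
    print(row)
# {'Name': 'eth0', 'rxPackets_PS': '1024.00', 'txPackets_PS': '1972.50', 'rxKB_PS': '1415.04', 'txKB_PS': '674.94'}
```

Without the first line, the same values would just be anonymous columns, which is exactly the puzzle with the TA_windows events.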
Additional:
Below, I sum up what I learned about this from @Brett's .conf23 session "PLA1163C - Perfecting Perfmon and Other Metrics" (available on .conf Online).
When mode is set to multikv, Splunk combines all of the counters and instances for an object into one event on disk. It does this to be more efficient and save disk space.
When searching, Splunk automatically converts this single on-disk event into separate events, one per instance (it does this for events whose source or sourcetype has the PerfmonMk prefix).
To test this yourself, temporarily set the source and sourcetype of a Perfmon stanza in multikv mode to something like test; Splunk will then not convert it into separate events by instance. The screenshot below shows an example where I set the source and sourcetype to "test" on the UF, so Splunk would not break it up and you can actually see the headers in the single tab-separated event, which contains all of the instances along with the collection and category fields.
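The search-time split described above can be sketched in Python as follows. This is only an illustration under loudly stated assumptions: Splunk's internal code and exact on-disk format are not shown in this thread, so the layout (key=value preamble lines, then a tab-separated header row, then one tab-separated row per instance), the preamble values, and the second instance Hypothetical_Adapter_2 are all hypothetical. Only the first instance row reuses values from the original post. The point is to show how one stored event can become per-instance events, and how collection and category can appear on each of them even though the displayed instance row does not contain them.

```python
# Hypothetical on-disk PerfmonMk event. ASSUMED layout (not taken from
# Splunk docs): key=value preamble lines, then a tab-separated header row,
# then one tab-separated row per instance. Preamble values are made up.
RAW_EVENT = (
    "collection=Network-Bytes\n"
    "category=Network\n"
    "object=Network Interface\n"
    "instance\tBytes_Total/sec\tBytes_Received/sec\tBytes_Sent/sec\n"
    "vmxnet3_Ethernet_Adapter\t19069.926362422757\t11044.290764991998\t8025.635597430761\n"
    "Hypothetical_Adapter_2\t0\t0\t0\n"
)


def split_multikv(raw):
    """Split one multikv-style event into one dict per instance.

    Preamble key=value pairs (e.g. collection, category) are copied into
    every per-instance result, so those fields show up on each instance
    even though the instance row itself does not contain them.
    """
    preamble = {}
    header = None
    results = []
    for line in raw.splitlines():
        if not line.strip():
            continue
        if header is None and "\t" not in line and "=" in line:
            # Assumed preamble line such as "collection=Network-Bytes".
            key, _, value = line.partition("=")
            preamble[key] = value
        elif header is None:
            # First tab-separated line is treated as the header row.
            header = line.split("\t")
        else:
            # Each remaining line is one instance; pair it with the header.
            row = dict(zip(header, line.split("\t")))
            results.append({**preamble, **row})
    return results


for event in split_multikv(RAW_EVENT):
    print(event["instance"], event["collection"], event["Bytes_Sent/sec"])
# vmxnet3_Ethernet_Adapter Network-Bytes 8025.635597430761
# Hypothetical_Adapter_2 Network-Bytes 0
```

If the real format matches this assumption, it would also explain why only the instance row appears in the displayed _raw: the header and preamble stay in the stored event and are applied at search time.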
Thanks for the mention. This person definitely needs to watch my conf talk.
Index-time field extractions and mappings are all done in props.conf and transforms.conf.
FYI, there are no index-time field extractions for mode=multikv.
There are for mode=single, but they are for the metric store.
I see you have customized inputs.
This is the default one:
## Network
[perfmon://Network]
counters = Bytes Total/sec; Packets/sec; Packets Received/sec; Packets Sent/sec; Current Bandwidth; Bytes Received/sec; Packets Received Unicast/sec; Packets Received Non-Unicast/sec; Packets Received Discarded; Packets Received Errors; Packets Received Unknown; Bytes Sent/sec; Packets Sent Unicast/sec; Packets Sent Non-Unicast/sec; Packets Outbound Discarded; Packets Outbound Errors; Output Queue Length; Offloaded Connections; TCP Active RSC Connections; TCP RSC Coalesced Packets/sec; TCP RSC Exceptions/sec; TCP RSC Average Packet Size
disabled = 1
instances = *
interval = 10
mode = multikv
object = Network Interface
useEnglishOnly=true
You have customized the sourcetype. I would not do that, as a lot more goes on behind the scenes for the standard sourcetype in an add-on at index time.
@SinghK wrote: you have customized sourcetype. I will not do that, as there is a lot more working on standard sourcetype in a addon behind the scenes during indexing time.
I have explicitly specified the sourcetype to use in the input, but I have not customized the sourcetype definition.
Regardless, my question is specifically about the 'behind the scenes' processing that goes on for mode=multikv.
Please see my reply to @inventsekar.
PS: If you have any standard best practices for defining TA_windows inputs, feel free to share them.
Hi @anirban_td, I am not very sure about the Windows TA, but one thing is for sure: the Windows logs are well formatted.
So headers may not be needed at all. I hope you understand my view, thanks.
Again, taking the example of the TA_nix bandwidth.sh event:
Name rxPackets_PS txPackets_PS rxKB_PS txKB_PS
eth0 1024.00 1972.50 1415.04 674.94
One can easily recognize (and set up extraction mechanisms for) the fields because of the header row.
However, if the header row is not there, how do you do it?
-----------------------------------------------------------------------------------------------
I agree the multikv events are well formatted,
but I still do not understand how Splunk:
The only logical explanation I can arrive at is that the header row (or something similar that helps Splunk identify the fields) is generated at the UF level, but once the event reaches the indexer tier, it is discarded after field extraction to save license cost and disk space.
I want to know:
----------------------------------------------------------------------------
I am sure I am missing SOMETHING here...