Greetings,
I am attempting to forward some collectl system-usage logs from a cluster to Splunk. The logs have a long file header, most of which is useless, but the last line of the header gives the field names, and ideally Splunk would pick the field names up from there.
Different nodes in the cluster have different numbers of CPUs, so the logs from each node have different numbers of columns. Ideally, Splunk would determine the number of fields (columns) and their names from the header. I have tried to get this working with header regular expressions in props.conf, but so far I have had no luck.
A sample log excerpt is shown below:
#cl1n110-20140201
################################### RECORDED ###################################
# Collectl: V3.6.0-3 HiRes: 1 Options: -D Subsys: bcdfijmnstxbcCdDfFijJmMnNtTxXZ
# DaemonOpts: -f /adm/accounting/collectl/raw/calhoun --procopts w -r00:01,7 -m -s+bcCdDfFijJmMnNtTxXZ --dskopts i -i 60 -F60 --procfilt u1000-2000000
################################################################################
# Collectl: V3.6.0-3 HiRes: 1 Options: --from 20140131:00:09-20140201:21:13 -p /adm/accounting/collectl/raw/calhoun/cl1n110-20140201-000100.raw.gz --procanalyze -s bcCdDfFijJmMnNtTZ -P -oaz -f /adm/accounting/collectl/processed/calhoun
# Host: cl1n110 DaemonOpts:
# Distro: CentOS release 6.3 (Final) Platform: AltixXE310
# Date: 20140201-000100 Secs: 1391234460 TZ: -0600
# SubSys: bcCdDfFijJmMnNtTZ Options: az Interval: 60:60 NumCPUs: 8 NumBud: 3 Flags: i
# Filters: NfsFilt: EnvFilt:
# HZ: 100 Arch: x86_64-linux-thread-multi PageSize: 4096
# Cpu: GenuineIntel Speed(MHz): 2666.664 Cores: 4 Siblings: 4 Nodes: 1
# Kernel: 2.6.32-279.11.1.el6.x86_64 Memory: 16331460 Swap:
# NumDisks: 1 DiskNames: sda
# NumNets: 5 NetNames: lo: eth0:1000 eth1:1000 ib0:20000 vlan8:
# IConnect: NumHCAs: 1 PortStates: IBVersion: ??? PQVersion: 1.5.12
# SCSI: DA:0:00:00:00 CD:6:00:00:00
################################################################################
#Date Time [CPU:0]User% [CPU:0]Nice% [CPU:0]Sys% [CPU:0]Wait% [CPU:0]Irq% [CPU:0]Soft% [CPU:0]Steal% [CPU:0]Idle% [CPU:0]Totl% [CPU:0]Intrpt [CPU:1]User% [CPU:1]Nice% [CPU:1]Sys% [CPU:1]Wait% [CPU:1]Irq% [CPU:1]Soft% [CPU:1]Steal% [CPU:1]Idle% [CPU:1]Totl% [CPU:1]Intrpt [CPU:2]User% [CPU:2]Nice% [CPU:2]Sys% [CPU:2]Wait% [CPU:2]Irq% [CPU:2]Soft% [CPU:2]Steal% [CPU:2]Idle% [CPU:2]Totl% [CPU:2]Intrpt [CPU:3]User% [CPU:3]Nice% [CPU:3]Sys% [CPU:3]Wait% [CPU:3]Irq% [CPU:3]Soft% [CPU:3]Steal% [CPU:3]Idle% [CPU:3]Totl% [CPU:3]Intrpt [CPU:4]User% [CPU:4]Nice% [CPU:4]Sys% [CPU:4]Wait% [CPU:4]Irq% [CPU:4]Soft% [CPU:4]Steal% [CPU:4]Idle% [CPU:4]Totl% [CPU:4]Intrpt [CPU:5]User% [CPU:5]Nice% [CPU:5]Sys% [CPU:5]Wait% [CPU:5]Irq% [CPU:5]Soft% [CPU:5]Steal% [CPU:5]Idle% [CPU:5]Totl% [CPU:5]Intrpt [CPU:6]User% [CPU:6]Nice% [CPU:6]Sys% [CPU:6]Wait% [CPU:6]Irq% [CPU:6]Soft% [CPU:6]Steal% [CPU:6]Idle% [CPU:6]Totl% [CPU:6]Intrpt [CPU:7]User% [CPU:7]Nice% [CPU:7]Sys% [CPU:7]Wait% [CPU:7]Irq% [CPU:7]Soft% [CPU:7]Steal% [CPU:7]Idle% [CPU:7]Totl% [CPU:7]Intrpt
20140201 00:02:00 100 0 0 0 0 0 0 0 100 1803 96 0 4 0 0 0 0 0 100 1695 96 0 4 0 0 0 0 0 100 1687 96 0 4 0 0 0 0 0 100 1699 96 0 4 0 0 0 0 0 100 1679 96 0 4 0 0 0 0 0 100 1686 96 0 3 0 0 0 0 0 100 1698 96 0 4 0 0 0 0 0 100 1684
20140201 00:03:00 86 0 0 0 0 0 0 14 86 1461 84 0 3 0 0 0 0 13 87 1386 84 0 2 0 0 0 0 13 87 1390 84 0 3 0 0 0 0 13 87 1380 84 0 2 0 0 0 0 13 87 1378 84 0 2 0 0 0 0 13 87 1380 84 0 2 0 0 0 0 13 87 1384 84 0 2 0 0 0 0 13 87 1396
Most of the header is not useful, but the last line (beginning with #Date Time ...) lists the field names. Because nodes have different numbers of CPUs, the files from different nodes have different numbers of columns.
If anyone knows whether such files can be easily read in and parsed by Splunk, any advice would be much appreciated.
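As a quick offline sanity check (outside Splunk entirely), the last "#" line of a collectl plot file can be parsed to enumerate the field names and to confirm that each data row has the same number of columns. The helper function and the shortened sample lines below are purely illustrative, not part of any collectl or Splunk API:

```python
def parse_collectl_header(lines):
    """Return field names from the last '#'-prefixed line before the data rows."""
    header = None
    for line in lines:
        if line.startswith("#"):
            header = line  # remember the most recent comment line
        else:
            break          # first data row: the previous '#' line was the header
    if header is None:
        raise ValueError("no header line found")
    # Strip the leading '#' and split on whitespace, as in the '#Date Time ...' line.
    return header.lstrip("#").split()

# Abbreviated, hypothetical sample (real files have far more columns).
sample = [
    "#cl1n110-20140201",
    "# NumCPUs: 8",
    "#Date Time [CPU:0]User% [CPU:0]Nice% [CPU:1]User% [CPU:1]Nice%",
    "20140201 00:02:00 100 0 96 0",
]
fields = parse_collectl_header(sample)
print(fields)                              # ['Date', 'Time', '[CPU:0]User%', ...]
print(len(fields) == len(sample[-1].split()))  # True: columns match the header
```

Running something like this over files from a few different nodes quickly shows how the column count varies with the CPU count.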
Try this:
http://docs.splunk.com/Documentation/Splunk/latest/Data/Extractfieldsfromfileheadersatindextime
In inputs.conf
[monitor:///your-path/filename]
sourcetype=header-file
In props.conf
[header-file]
FIELD_DELIMITER=space
HEADER_FIELD_DELIMITER=space
HEADER_FIELD_LINE_NUMBER=20
NO_BINARY_CHECK=1
SHOULD_LINEMERGE=false
This should skip the first 19 lines (the garbage) and use the header found on line 20. If the length of the preamble varies, so the header is not always on the same line, you can use other methods such as FIELD_HEADER_REGEX.
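For instance, since the field-name line always starts with "#Date Time", a header regex roughly like the one below could replace the fixed line number. This is an untested sketch; if I read the docs right, Splunk uses the capturing group as the header text, so the leading "#" is left outside the group:

```ini
[header-file]
FIELD_DELIMITER=space
HEADER_FIELD_DELIMITER=space
# untested: capture everything after the leading '#' on the '#Date Time ...' line
FIELD_HEADER_REGEX=^#(Date\s+Time.*)
NO_BINARY_CHECK=1
SHOULD_LINEMERGE=false
```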
Note that this works on forwarders and performs the header/field mapping at index time.