Getting Data In

Forwarding logs with a long header and variable number of columns.

dgustaf
New Member

Greetings,

I am attempting to forward some collectl system usage logs from a cluster to Splunk. Ideally, I would like Splunk to understand the field names from the header. The logs have a long file header, most of which is useless, but its last line gives the field names.

Different nodes in the cluster have different numbers of CPUs, so the logs from each node have different numbers of columns. Ideally, Splunk would determine the number of fields (columns) and their names from the header. I have tried using header regular expressions in props.conf, but so far without luck.

A sample log excerpt is shown below:

    #cl1n110-20140201
################################### RECORDED ###################################
# Collectl:   V3.6.0-3  HiRes: 1  Options: -D  Subsys: bcdfijmnstxbcCdDfFijJmMnNtTxXZ
# DaemonOpts: -f /adm/accounting/collectl/raw/calhoun --procopts w -r00:01,7 -m -s+bcCdDfFijJmMnNtTxXZ --dskopts i -i 60 -F60 --procfilt u1000-2000000
################################################################################
# Collectl:   V3.6.0-3  HiRes: 1  Options: --from 20140131:00:09-20140201:21:13 -p /adm/accounting/collectl/raw/calhoun/cl1n110-20140201-000100.raw.gz --procanalyze -s bcCdDfFijJmMnNtTZ -P -oaz -f /adm/accounting/collectl/processed/calhoun 
# Host:       cl1n110  DaemonOpts: 
# Distro:     CentOS release 6.3 (Final)    Platform: AltixXE310
# Date:       20140201-000100  Secs: 1391234460 TZ: -0600
# SubSys:     bcCdDfFijJmMnNtTZ Options: az Interval: 60:60 NumCPUs: 8  NumBud: 3 Flags: i
# Filters:    NfsFilt:  EnvFilt: 
# HZ:         100  Arch: x86_64-linux-thread-multi PageSize: 4096
# Cpu:        GenuineIntel Speed(MHz): 2666.664 Cores: 4  Siblings: 4 Nodes: 1
# Kernel:     2.6.32-279.11.1.el6.x86_64  Memory: 16331460  Swap: 
# NumDisks:   1 DiskNames: sda
# NumNets:    5 NetNames: lo: eth0:1000 eth1:1000 ib0:20000 vlan8:
# IConnect:   NumHCAs: 1 PortStates:  IBVersion: ??? PQVersion: 1.5.12
# SCSI:       DA:0:00:00:00 CD:6:00:00:00
################################################################################
#Date Time [CPU:0]User% [CPU:0]Nice% [CPU:0]Sys% [CPU:0]Wait% [CPU:0]Irq% [CPU:0]Soft% [CPU:0]Steal% [CPU:0]Idle% [CPU:0]Totl% [CPU:0]Intrpt [CPU:1]User% [CPU:1]Nice% [CPU:1]Sys% [CPU:1]Wait% [CPU:1]Irq% [CPU:1]Soft% [CPU:1]Steal% [CPU:1]Idle% [CPU:1]Totl% [CPU:1]Intrpt [CPU:2]User% [CPU:2]Nice% [CPU:2]Sys% [CPU:2]Wait% [CPU:2]Irq% [CPU:2]Soft% [CPU:2]Steal% [CPU:2]Idle% [CPU:2]Totl% [CPU:2]Intrpt [CPU:3]User% [CPU:3]Nice% [CPU:3]Sys% [CPU:3]Wait% [CPU:3]Irq% [CPU:3]Soft% [CPU:3]Steal% [CPU:3]Idle% [CPU:3]Totl% [CPU:3]Intrpt [CPU:4]User% [CPU:4]Nice% [CPU:4]Sys% [CPU:4]Wait% [CPU:4]Irq% [CPU:4]Soft% [CPU:4]Steal% [CPU:4]Idle% [CPU:4]Totl% [CPU:4]Intrpt [CPU:5]User% [CPU:5]Nice% [CPU:5]Sys% [CPU:5]Wait% [CPU:5]Irq% [CPU:5]Soft% [CPU:5]Steal% [CPU:5]Idle% [CPU:5]Totl% [CPU:5]Intrpt [CPU:6]User% [CPU:6]Nice% [CPU:6]Sys% [CPU:6]Wait% [CPU:6]Irq% [CPU:6]Soft% [CPU:6]Steal% [CPU:6]Idle% [CPU:6]Totl% [CPU:6]Intrpt [CPU:7]User% [CPU:7]Nice% [CPU:7]Sys% [CPU:7]Wait% [CPU:7]Irq% [CPU:7]Soft% [CPU:7]Steal% [CPU:7]Idle% [CPU:7]Totl% [CPU:7]Intrpt
20140201 00:02:00 100 0 0 0 0 0 0 0 100 1803 96 0 4 0 0 0 0 0 100 1695 96 0 4 0 0 0 0 0 100 1687 96 0 4 0 0 0 0 0 100 1699 96 0 4 0 0 0 0 0 100 1679 96 0 4 0 0 0 0 0 100 1686 96 0 3 0 0 0 0 0 100 1698 96 0 4 0 0 0 0 0 100 1684
20140201 00:03:00 86 0 0 0 0 0 0 14 86 1461 84 0 3 0 0 0 0 13 87 1386 84 0 2 0 0 0 0 13 87 1390 84 0 3 0 0 0 0 13 87 1380 84 0 2 0 0 0 0 13 87 1378 84 0 2 0 0 0 0 13 87 1380 84 0 2 0 0 0 0 13 87 1384 84 0 2 0 0 0 0 13 87 1396

Most of the header is not useful, but the last line (beginning with #Date Time ...) lists the field names. Since nodes have different CPU counts, files from different nodes have different numbers of columns.

If anyone knows whether Splunk can easily read in and parse such files, any advice would be much appreciated.


ogdin
Splunk Employee

Try this:

http://docs.splunk.com/Documentation/Splunk/latest/Data/Extractfieldsfromfileheadersatindextime

In inputs.conf


[monitor:///your-path/filename]
sourcetype=header-file

In props.conf

[header-file]
FIELD_DELIMITER=space
HEADER_FIELD_DELIMITER=space
HEADER_FIELD_LINE_NUMBER=20
NO_BINARY_CHECK=1
SHOULD_LINEMERGE=false

This should discard the first 19 lines (the garbage) and use the header found on line 20. If the header does not always fall on the same line, you can use other methods such as FIELD_HEADER_REGEX.
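Before hard-coding HEADER_FIELD_LINE_NUMBER=20, it is worth confirming that the field-name line really lands on line 20 in every file. A quick sanity check (a sketch run against a generated sample shaped like the log above; the filename and preamble lines are stand-ins, so point `path` at a real collectl log instead):

```python
# Find which line the "#Date Time" field-name header falls on, since
# HEADER_FIELD_LINE_NUMBER must match it exactly in every file.
path = "sample.log"
with open(path, "w") as f:
    f.write("#cl1n110-20140201\n")
    for i in range(1, 19):
        f.write(f"# header line {i}\n")                   # stand-in for the preamble
    f.write("#Date Time [CPU:0]User% [CPU:0]Nice%\n")     # the field-name line

with open(path) as f:
    header_line = next(
        n for n, line in enumerate(f, start=1) if line.startswith("#Date Time")
    )
print(header_line)
```

If this prints a different number for some node's files, a fixed HEADER_FIELD_LINE_NUMBER will misparse them.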

Note that this works on forwarders and performs the header/field mapping at index time.
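If the preamble length does vary between nodes, one option is to key the header detection on the #Date Time prefix instead of a fixed line number. A sketch of the FIELD_HEADER_REGEX variant (the regex is an assumption based on the sample above, not tested against the real files; the capture group follows the pattern shown in the documentation linked earlier):

[header-file]
FIELD_DELIMITER=space
HEADER_FIELD_DELIMITER=space
FIELD_HEADER_REGEX=^#(Date\s+Time.*)
NO_BINARY_CHECK=1
SHOULD_LINEMERGE=false

Only the line matching the regex is treated as the header, so the other comment lines (which also begin with #) are not mistaken for it.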
