Forwarding logs with a long header and variable number of columns.

dgustaf
New Member

Greetings,

I am attempting to forward some collectl system usage logs from a cluster to Splunk. Ideally I would like Splunk to be able to understand the field names from the header. The logs have a long file header, most of which is useless, but the last line of the header gives the field names.

Different nodes in the cluster have different numbers of CPUs, so the logs from each node have different numbers of columns. Ideally Splunk would determine the number of fields (columns) and their names from the header. I have tried to get this working with header regular expressions in props.conf, but so far without luck.

A sample log excerpt is shown below:

#cl1n110-20140201
################################### RECORDED ###################################
# Collectl:   V3.6.0-3  HiRes: 1  Options: -D  Subsys: bcdfijmnstxbcCdDfFijJmMnNtTxXZ
# DaemonOpts: -f /adm/accounting/collectl/raw/calhoun --procopts w -r00:01,7 -m -s+bcCdDfFijJmMnNtTxXZ --dskopts i -i 60 -F60 --procfilt u1000-2000000
################################################################################
# Collectl:   V3.6.0-3  HiRes: 1  Options: --from 20140131:00:09-20140201:21:13 -p /adm/accounting/collectl/raw/calhoun/cl1n110-20140201-000100.raw.gz --procanalyze -s bcCdDfFijJmMnNtTZ -P -oaz -f /adm/accounting/collectl/processed/calhoun 
# Host:       cl1n110  DaemonOpts: 
# Distro:     CentOS release 6.3 (Final)    Platform: AltixXE310
# Date:       20140201-000100  Secs: 1391234460 TZ: -0600
# SubSys:     bcCdDfFijJmMnNtTZ Options: az Interval: 60:60 NumCPUs: 8  NumBud: 3 Flags: i
# Filters:    NfsFilt:  EnvFilt: 
# HZ:         100  Arch: x86_64-linux-thread-multi PageSize: 4096
# Cpu:        GenuineIntel Speed(MHz): 2666.664 Cores: 4  Siblings: 4 Nodes: 1
# Kernel:     2.6.32-279.11.1.el6.x86_64  Memory: 16331460  Swap: 
# NumDisks:   1 DiskNames: sda
# NumNets:    5 NetNames: lo: eth0:1000 eth1:1000 ib0:20000 vlan8:
# IConnect:   NumHCAs: 1 PortStates:  IBVersion: ??? PQVersion: 1.5.12
# SCSI:       DA:0:00:00:00 CD:6:00:00:00
################################################################################
#Date Time [CPU:0]User% [CPU:0]Nice% [CPU:0]Sys% [CPU:0]Wait% [CPU:0]Irq% [CPU:0]Soft% [CPU:0]Steal% [CPU:0]Idle% [CPU:0]Totl% [CPU:0]Intrpt [CPU:1]User% [CPU:1]Nice% [CPU:1]Sys% [CPU:1]Wait% [CPU:1]Irq% [CPU:1]Soft% [CPU:1]Steal% [CPU:1]Idle% [CPU:1]Totl% [CPU:1]Intrpt [CPU:2]User% [CPU:2]Nice% [CPU:2]Sys% [CPU:2]Wait% [CPU:2]Irq% [CPU:2]Soft% [CPU:2]Steal% [CPU:2]Idle% [CPU:2]Totl% [CPU:2]Intrpt [CPU:3]User% [CPU:3]Nice% [CPU:3]Sys% [CPU:3]Wait% [CPU:3]Irq% [CPU:3]Soft% [CPU:3]Steal% [CPU:3]Idle% [CPU:3]Totl% [CPU:3]Intrpt [CPU:4]User% [CPU:4]Nice% [CPU:4]Sys% [CPU:4]Wait% [CPU:4]Irq% [CPU:4]Soft% [CPU:4]Steal% [CPU:4]Idle% [CPU:4]Totl% [CPU:4]Intrpt [CPU:5]User% [CPU:5]Nice% [CPU:5]Sys% [CPU:5]Wait% [CPU:5]Irq% [CPU:5]Soft% [CPU:5]Steal% [CPU:5]Idle% [CPU:5]Totl% [CPU:5]Intrpt [CPU:6]User% [CPU:6]Nice% [CPU:6]Sys% [CPU:6]Wait% [CPU:6]Irq% [CPU:6]Soft% [CPU:6]Steal% [CPU:6]Idle% [CPU:6]Totl% [CPU:6]Intrpt [CPU:7]User% [CPU:7]Nice% [CPU:7]Sys% [CPU:7]Wait% [CPU:7]Irq% [CPU:7]Soft% [CPU:7]Steal% [CPU:7]Idle% [CPU:7]Totl% [CPU:7]Intrpt
20140201 00:02:00 100 0 0 0 0 0 0 0 100 1803 96 0 4 0 0 0 0 0 100 1695 96 0 4 0 0 0 0 0 100 1687 96 0 4 0 0 0 0 0 100 1699 96 0 4 0 0 0 0 0 100 1679 96 0 4 0 0 0 0 0 100 1686 96 0 3 0 0 0 0 0 100 1698 96 0 4 0 0 0 0 0 100 1684
20140201 00:03:00 86 0 0 0 0 0 0 14 86 1461 84 0 3 0 0 0 0 13 87 1386 84 0 2 0 0 0 0 13 87 1390 84 0 3 0 0 0 0 13 87 1380 84 0 2 0 0 0 0 13 87 1378 84 0 2 0 0 0 0 13 87 1380 84 0 2 0 0 0 0 13 87 1384 84 0 2 0 0 0 0 13 87 1396

Most of the header is not useful, but the last line (beginning with #Date Time ...) lists the field names. There are different numbers of CPUs on different nodes, resulting in different numbers of log columns in the files from different nodes.
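To make the layout concrete, here is a minimal Python sketch of the mapping described above, run outside of Splunk. It assumes the field-name line is the last line starting with `#` before the first data row and that columns are whitespace-separated. The function name `parse_collectl` and the shortened two-CPU sample are made up for illustration.

```python
def parse_collectl(lines):
    """Return (field_names, rows) from collectl-style plain-text output.

    Assumption: the last '#'-prefixed line before the first data row holds
    the whitespace-separated field names; non-comment lines are data rows.
    """
    field_names = []
    rows = []
    for line in lines:
        line = line.strip()
        if not line:
            continue
        if line.startswith("#"):
            # Until data starts, the most recent comment line wins,
            # so the final header line ends up as the field names.
            if not rows:
                field_names = line.lstrip("#").split()
            continue
        values = line.split()
        # zip() pairs each value with its column name, however many
        # columns this particular node produced.
        rows.append(dict(zip(field_names, values)))
    return field_names, rows


# Shortened, hypothetical sample with only two columns per CPU.
sample = [
    "#cl1n110-20140201",
    "# Host:       cl1n110  DaemonOpts: ",
    "################################################################################",
    "#Date Time [CPU:0]User% [CPU:0]Idle% [CPU:1]User% [CPU:1]Idle%",
    "20140201 00:02:00 100 0 96 0",
]
names, rows = parse_collectl(sample)
print(len(names), rows[0]["[CPU:1]User%"])
```

Because the names come from the header of each file, a node with more CPUs simply yields more keys per row; nothing in the sketch hard-codes the column count.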

If anyone knows whether such files can be easily read in and parsed by Splunk, any advice would be much appreciated.


ogdin
Splunk Employee

Try this:

http://docs.splunk.com/Documentation/Splunk/latest/Data/Extractfieldsfromfileheadersatindextime

In inputs.conf


[monitor:///your-path/filename]
sourcetype=header-file

In props.conf

[header-file]
FIELD_DELIMITER=space
HEADER_FIELD_DELIMITER=space
HEADER_FIELD_LINE_NUMBER=20
NO_BINARY_CHECK=1
SHOULD_LINEMERGE=false

This should discard the first 19 lines (the garbage) and use the header found on line 20. If the header block varies in length between files, you can use other methods such as FIELD_HEADER_REGEX.
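Before relying on a fixed line number, it may be worth checking that the header really sits on the same line in every file. The following Python sketch (not Splunk code) reports the 1-based line number of the first line that starts with `#Date Time`. The helper name `header_line_number` and the pattern are assumptions based on the sample in the question.

```python
import re

# Assumption: the field-name line starts with "#Date Time", with no space
# after the "#". Lines such as "# Date:  2014..." therefore do not match.
HEADER_PATTERN = re.compile(r"^#Date\s+Time\b")


def header_line_number(lines):
    """Return the 1-based number of the field-name line, or None."""
    for number, line in enumerate(lines, start=1):
        if HEADER_PATTERN.match(line):
            return number
    return None


# Shortened, hypothetical sample; a real collectl file has a longer preamble.
sample = [
    "#cl1n110-20140201",
    "# Date:       20140201-000100  Secs: 1391234460 TZ: -0600",
    "################################################################################",
    "#Date Time [CPU:0]User% [CPU:0]Nice%",
    "20140201 00:02:00 100 0",
]
header_at = header_line_number(sample)
print(header_at)
```

If the reported number differs between files, a single HEADER_FIELD_LINE_NUMBER value will not fit all of them, and a regex-based setting such as FIELD_HEADER_REGEX is the more likely fit.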

Note that this configuration works on forwarders and performs the header-to-field mapping at index time.
