Getting Data In

Help with extracting and filtering header preambles

a212830
Champion

Hi,

We have an ugly custom log file, and we'd like to filter out the beginning of the file. We'd like to start from the first line, down to the first line with a valid timestamp. Is that possible?

Here's a sample:

Lots of lines like this:

Application Name:               Email_MMK_Node3_PR
Application Type:               EMAIL_SERVER
Application stuff....
....
Application Options: {
  { pop-client ['password' [output suppressed], 'move-failed-ews-item' [str] = "true", 'protocol-timeout' [str] = "00:05:00", 'maximum-msg-size' [str] = "5", 'type' [str] = "IMAP", 'address' [str] = "#", 'exchange-version' [str] = "Exchange2010_SP2", 'endpoint' [str] = "default", 'folder-path' [str] = "INBOX", 'leave-msg-on-server' [str] = "false", 'folder-separator' [str] = "/", 'port' [str] = "995", 'delete-bad-formatted-msg' [str] = "false", 'failed-items-folder-name' [str] = "", 'pop-connection-security' [str] = "none", 'enable-debug' [str] = "false", 'connect-timeout' [str] = "00:00:30", 'cycle-time' [str] = "00:00:30", 'mailbox' [str] = "#", 'enable-big-msg-stripping' [str] = "false", 'server' [str] = "imap.blah.blah.com", 'delete-big-msg' [str] = "false", 'enable-client' [str] = "false", 'allow-bad-msg-size' [str] = "false", 'maximum-msg-number' [str] = "500", ]}
  { pop-client-aaargprations ['password' [output suppressed], 'move-failed-ews-item' [str] = "true", 'protocol-timeout' [str] = "00:05:00", 'maximum-msg-size' [str] = "5", 'type' [str] = "IMAP", 'address' [str] = "anotherfield.com", 'exchange-version' [str] = "Exchange2010_SP2", 'endpoint' [str] = "blahblahIn_Endpoint", 'folder-path' [str] = "INBOX", 'leave-msg-on-server' [str] = "true", 'folder-separator' [str] = "/", 'port' [str] = "993", 'delete-bad-formatted-msg' [str] = "false", 'failed-items-folder-name' [str] = "failedItems", 'pop-connection-security' [str] = "ssl-tls", 'connect-timeout' [str] = "00:00:30", 'enable-debug' [str] = "false", 'cycle-time' [str] = "00:00:30", 'mailbox' [str] = "1234rkrgprations", 'enable-big-msg-stripping' [str] = "false", 'server' [str] = "blah.server.com", 'delete-big-msg' [str] = "false", 'enable-client' [str] = "false", 'allow-bad-msg-size' [str] = "false", 'maximum-msg-number' [str] = "500", ]}
}

23:11:14.972 Dbg 29999 [EmailServer] Configuring 'MESSAGE_SERVER' connection

Any suggestions?

0 Karma

somesoni2
Revered Legend

Try something like this

props.conf on Indexer/Heavy forwarder

[yoursourceytpe]
SHOULD_LINEMERGE = false
LINE_BREAKER = ([\r\n]+)\d+\:\d+\:\d+
...other time format settings---
TRANSFORMS-removeheaderevent = setnull

transforms.conf on Indexer/Heavy forwarder

[setnull]
REGEX = ^\w+
DEST_KEY = queue
FORMAT = nullQueue

a212830
Champion

Can you elaborate? What is the regex doing?

0 Karma

somesoni2
Revered Legend

The props.conf is splitting your logs in the events, where events will start with timesamp (23:11:14.972 in above example). This will give one extra large events with all the header preamble text. The TRANSFORMS will just find that huge header event, which I assume start with some word and not with timestamp, and will drop that events. (see this for transforms usage http://docs.splunk.com/Documentation/Splunk/6.0.2/Forwarding/Routeandfilterdatad#Filter_event_data_a...)

0 Karma

a212830
Champion

Thanks. Interesting approach. Never considered it.

0 Karma

sloshburch
Ultra Champion

For anyone who stumbles on this in the future with similar questions, remember that you can tinker with the sourcetype definition with the Add Data wizard. The Advanced panel of the Set Source Type menu is where you can tinker and see how splunk would interpret the results.

0 Karma
Did you miss .conf21 Virtual?

Good news! The event's keynotes and many of its breakout sessions are now available online, and still totally FREE!