Regex for multiline events

sansri7680
Path Finder

I have a file with multiline events. Although the events contain no structured data, the boundaries between events can be identified reliably. Below is an example:

Frame 1: 110 bytes on wire (880 bits), 110 bytes captured (880 bits)
WTAP_ENCAP: 1
Arrival Time: Dec 20, 2007 14:01:56.000165000 India Standard Time
[Time shift for this packet: 0.000000000 seconds]
Epoch Time: 1198139516.000165000 seconds
[Time delta from previous captured frame: 0.000000000 seconds]
[Time delta from previous displayed frame: 0.000000000 seconds]
[Time since reference or first frame: 0.000000000 seconds]
Frame Number: 1
Frame Length: 110 bytes (880 bits)
Capture Length: 110 bytes (880 bits)
[Frame is marked: False]
[Frame is ignored: False]
[Protocols in frame: eth:ip:ospf]
Ethernet II, Src: Cisco_f7:97:c2 (00:1b:8f:f7:97:c2), Dst: IPv4mcast_00:00:05 (01:00:5e:00:00:05)
Destination: IPv4mcast_00:00:05 (01:00:5e:00:00:05)
Address: IPv4mcast_00:00:05 (01:00:5e:00:00:05)
.... ..0. .... .... .... .... = LG bit: Globally unique address (factory default)
.... ...1 .... .... .... .... = IG bit: Group address (multicast/broadcast)
Source: Cisco_f7:97:c2 (00:1b:8f:f7:97:c2)
Address: Cisco_f7:97:c2 (00:1b:8f:f7:97:c2)
.... ..0. .... .... .... .... = LG bit: Globally unique address (factory default)
.... ...0 .... .... .... .... = IG bit: Individual address (unicast)
Type: IP (0x0800)
Internet Protocol Version 4, Src: 10.112.28.1 (10.112.28.1), Dst: 224.0.0.5 (224.0.0.5)
Version: 4
Header length: 20 bytes
Differentiated Services Field: 0xc0 (DSCP 0x30: Class Selector 6; ECN: 0x00: Not-ECT (Not ECN-Capable Transport))
1100 00.. = Differentiated Services Codepoint: Class Selector 6 (0x30)
.... ..00 = Explicit Congestion Notification: Not-ECT (Not ECN-Capable Transport) (0x00)
Total Length: 96
Identification: 0xd719 (55065)
Flags: 0x00
0... .... = Reserved bit: Not set
.0.. .... = Don't fragment: Not set
..0. .... = More fragments: Not set
Fragment offset: 0
Time to live: 1
Protocol: OSPF IGP (89)
Header checksum: 0xdaf5 [correct]
[Good: True]
[Bad: False]
Source: 10.112.28.1 (10.112.28.1)
Destination: 224.0.0.5 (224.0.0.5)
[Source GeoIP: Unknown]
[Destination GeoIP: Unknown]
Open Shortest Path First
OSPF Header
OSPF Version: 2
Message Type: Hello Packet (1)
Packet Length: 64
Source OSPF Router: 195.156.12.222 (195.156.12.222)
Area ID: 0.0.0.0 (Backbone)
Packet Checksum: 0x077d [correct]
Auth Type: Null
Auth Data (none)
OSPF Hello Packet
Network Mask: 255.255.255.128
Hello Interval: 10 seconds
Options: 0x12 (L, E)
0... .... = DN: DN-bit is NOT set
.0.. .... = O: O-bit is NOT set
..0. .... = DC: Demand Circuits are NOT supported
...1 .... = L: The packet contains LLS data block
.... 0... = NP: NSSA is NOT supported
.... .0.. = MC: NOT Multicast Capable
.... ..1. = E: External Routing Capability
.... ...0 = MT: NO Multi-Topology Routing
Router Priority: 10
Router Dead Interval: 40 seconds
Designated Router: 10.112.28.1
Backup Designated Router: 10.112.28.14
Active Neighbor: 10.112.29.19
Active Neighbor: 10.112.29.20
Active Neighbor: 10.112.29.66
Active Neighbor: 10.112.29.131
Active Neighbor: 10.112.29.254
OSPF LLS Data Block
Checksum: 0xfff6
LLS Data Length: 12 bytes
Extended options TLV
Type: 1
Length: 4
Options: 0x00000001 (LR)
.... .... .... .... .... .... .... ..0. = RS: Restart Signal (RS-bit) is NOT set
.... .... .... .... .... .... .... ...1 = LR: LSDB Resynchronization (LR-bit) is SET

Frame 2: 110 bytes on wire (880 bits), 110 bytes captured (880 bits)
WTAP_ENCAP: 1
Arrival Time: Dec 20, 2007 14:01:56.000173000 India Standard Time
[Time shift for this packet: 0.000000000 seconds]
Epoch Time: 1198139516.000173000 seconds
[Time delta from previous captured frame: 0.000008000 seconds]
[Time delta from previous displayed frame: 0.000008000 seconds]
[Time since reference or first frame: 0.000008000 seconds]
Frame Number: 2
Frame Length: 110 bytes (880 bits)
Capture Length: 110 bytes (880 bits)
[Frame is marked: False]
[Frame is ignored: False]
[Protocols in frame: eth:ip:ospf]
Ethernet II, Src: Cisco_f7:97:c2 (00:1b:8f:f7:97:c2), Dst: IPv4mcast_00:00:05 (01:00:5e:00:00:05)
Destination: IPv4mcast_00:00:05 (01:00:5e:00:00:05)
Address: IPv4mcast_00:00:05 (01:00:5e:00:00:05)
.... ..0. .... .... .... .... = LG bit: Globally unique address (factory default)
.... ...1 .... .... .... .... = IG bit: Group address (multicast/broadcast)
Source: Cisco_f7:97:c2 (00:1b:8f:f7:97:c2)
Address: Cisco_f7:97:c2 (00:1b:8f:f7:97:c2)
.... ..0. .... .... .... .... = LG bit: Globally unique address (factory default)
.... ...0 .... .... .... .... = IG bit: Individual address (unicast)
Type: IP (0x0800)
Internet Protocol Version 4, Src: 10.112.28.1 (10.112.28.1), Dst: 224.0.0.5 (224.0.0.5)
Version: 4
Header length: 20 bytes
Differentiated Services Field: 0xc0 (DSCP 0x30: Class Selector 6; ECN: 0x00: Not-ECT (Not ECN-Capable Transport))
1100 00.. = Differentiated Services Codepoint: Class Selector 6 (0x30)
.... ..00 = Explicit Congestion Notification: Not-ECT (Not ECN-Capable Transport) (0x00)
Total Length: 96
Identification: 0xd719 (55065)
Flags: 0x00
0... .... = Reserved bit: Not set
.0.. .... = Don't fragment: Not set
..0. .... = More fragments: Not set
Fragment offset: 0
Time to live: 1
Protocol: OSPF IGP (89)
Header checksum: 0xdaf5 [correct]
[Good: True]
[Bad: False]
Source: 10.112.28.1 (10.112.28.1)
Destination: 224.0.0.5 (224.0.0.5)
[Source GeoIP: Unknown]
[Destination GeoIP: Unknown]
Open Shortest Path First
OSPF Header
OSPF Version: 2
Message Type: Hello Packet (1)
Packet Length: 64
Source OSPF Router: 195.156.12.222 (195.156.12.222)
Area ID: 0.0.0.0 (Backbone)
Packet Checksum: 0x077d [correct]
Auth Type: Null
Auth Data (none)
OSPF Hello Packet
Network Mask: 255.255.255.128
Hello Interval: 10 seconds
Options: 0x12 (L, E)
0... .... = DN: DN-bit is NOT set
.0.. .... = O: O-bit is NOT set
..0. .... = DC: Demand Circuits are NOT supported
...1 .... = L: The packet contains LLS data block
.... 0... = NP: NSSA is NOT supported
.... .0.. = MC: NOT Multicast Capable
.... ..1. = E: External Routing Capability
.... ...0 = MT: NO Multi-Topology Routing
Router Priority: 10
Router Dead Interval: 40 seconds
Designated Router: 10.112.28.1
Backup Designated Router: 10.112.28.14
Active Neighbor: 10.112.29.19
Active Neighbor: 10.112.29.20
Active Neighbor: 10.112.29.66
Active Neighbor: 10.112.29.131
Active Neighbor: 10.112.29.254
OSPF LLS Data Block
Checksum: 0xfff6
LLS Data Length: 12 bytes
Extended options TLV
Type: 1
Length: 4
Options: 0x00000001 (LR)
.... .... .... .... .... .... .... ..0. = RS: Restart Signal (RS-bit) is NOT set
.... .... .... .... .... .... .... ...1 = LR: LSDB Resynchronization (LR-bit) is SET

Frame 3: 60 bytes on wire (480 bits), 60 bytes captured (480 bits)
WTAP_ENCAP: 1
Arrival Time: Dec 20, 2007 14:01:56.474107000 India Standard Time
[Time shift for this packet: 0.000000000 seconds]
Epoch Time: 1198139516.474107000 seconds
[Time delta from previous captured frame: 0.473934000 seconds]
[Time delta from previous displayed frame: 0.473934000 seconds]
[Time since reference or first frame: 0.473942000 seconds]
Frame Number: 3
Frame Length: 60 bytes (480 bits)
Capture Length: 60 bytes (480 bits)
[Frame is marked: False]
[Frame is ignored: False]
[Protocols in frame: eth:llc:stp]
IEEE 802.3 Ethernet
Destination: Spanning-tree-(for-bridges)_00 (01:80:c2:00:00:00)
Address: Spanning-tree-(for-bridges)_00 (01:80:c2:00:00:00)
.... ..0. .... .... .... .... = LG bit: Globally unique address (factory default)
.... ...1 .... .... .... .... = IG bit: Group address (multicast/broadcast)
Source: Cisco_f7:97:8a (00:1b:8f:f7:97:8a)
Address: Cisco_f7:97:8a (00:1b:8f:f7:97:8a)
.... ..0. .... .... .... .... = LG bit: Globally unique address (factory default)
.... ...0 .... .... .... .... = IG bit: Individual address (unicast)
Length: 38
Padding: 0000000000000000
Logical-Link Control
DSAP: Spanning Tree BPDU (0x42)
IG Bit: Individual
SSAP: Spanning Tree BPDU (0x42)
CR Bit: Command
Control field: U, func=UI (0x03)
000. 00.. = Command: Unnumbered Information (0x00)
.... ..11 = Frame type: Unnumbered frame (0x03)
Spanning Tree Protocol
Protocol Identifier: Spanning Tree Protocol (0x0000)
Protocol Version Identifier: Spanning Tree (0)
BPDU Type: Configuration (0x00)
BPDU flags: 0x00
0... .... = Topology Change Acknowledgment: No
.... ...0 = Topology Change: No
Root Identifier: 32768 / 10 / 00:1b:8f:f7:97:80
Root Bridge Priority: 32768
Root Bridge System ID Extension: 10
Root Bridge System ID: 00:1b:8f:f7:97:80
Root Path Cost: 0
Bridge Identifier: 32768 / 10 / 00:1b:8f:f7:97:80
Bridge Priority: 32768
Bridge System ID Extension: 10
Bridge System ID: 00:1b:8f:f7:97:80
Port identifier: 0x800a
Message Age: 0
Max Age: 20
Hello Time: 2
Forward Delay: 15

Each event can be separated using the word Frame followed by an incrementing number and a colon.

I tried the following configuration in props.conf:

[4G]
BREAK_ONLY_BEFORE = (?m)Frame ([0-9]+):
SHOULD_LINEMERGE = true

After applying this, the events are not split properly. The first 1500 events appear in random order and are split at the wrong positions, but the events after the first 1500 are split correctly. Can someone help me find what is wrong with the way I have defined the regex?

Ayn
Legend

Event breaking takes place line by line, so you don't need the (?m) modifier. In fact, I'm not sure whether modifiers are supported for these config options, so your problem may lie right there.

...but on reading your question a second time, you say the events after the first 1500 are split up properly. Did you change your event breaking configuration after the first 1500 events? If so, that's the explanation: event breaking takes place at index time, so any changes you make will only affect newly indexed data, not data that is already in the index.
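
For reference, a minimal sketch of the stanza without the modifier. It assumes the [4G] sourcetype from the question and that every event starts with "Frame <number>:" at the beginning of a line. The ^ anchor and the \s+ are assumptions added for illustration, not something confirmed in this thread:

```
[4G]
# Merge lines into multiline events
SHOULD_LINEMERGE = true
# Start a new event at any line that begins with "Frame <number>:"
# (the ^ anchor and \s+ are assumptions, not from the original post)
BREAK_ONLY_BEFORE = ^Frame\s+\d+:
```

As with any event breaking change, this would only apply to data indexed after the configuration is in place.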

lguinn2
Legend

I would also look at setting the following in props.conf:

MAX_EVENTS=10000
TRUNCATE=0

MAX_EVENTS is the maximum number of lines that Splunk allows in each event (not well named). So perhaps the events are being truncated or split improperly because of this limit.

TRUNCATE specifies the maximum number of characters in an event. 0 means "don't truncate my events."
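
Put together with the stanza from the question, this might look like the sketch below. It reuses the regex from the question minus the (?m) modifier, as suggested earlier in the thread. Whether these limits are actually the cause here is not clear, since each sample frame in the question is under 100 lines:

```
[4G]
SHOULD_LINEMERGE = true
# Regex from the question, without the (?m) modifier
BREAK_ONLY_BEFORE = Frame ([0-9]+):
# Allow up to 10000 lines in a single merged event
MAX_EVENTS = 10000
# 0 disables truncation
TRUNCATE = 0
```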

Ayn
Legend

So if I understand you correctly, you want to trim trailing whitespace? That can be done, although I'm not sure why you'd want to. If you do want to do it, have a look at the SEDCMD setting in props.conf.
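
A minimal sketch of what that could look like, assuming the [4G] sourcetype from the question. The class name trim_trailing_ws is hypothetical, and the regex is only one way to express "spaces or tabs directly before a line ending or the end of the event":

```
[4G]
# SEDCMD-<class> takes a sed-style script; "trim_trailing_ws" is a made-up class name.
# Remove runs of spaces/tabs that sit directly before a line ending or the end of the event.
SEDCMD-trim_trailing_ws = s/[ \t]+(?=[\r\n]|$)//g
```

SEDCMD rewrites the event text at index time, so it only affects newly indexed data. Whether it runs early enough to change where events are broken is not something verified here, so treat it as a way to clean the stored events rather than a confirmed fix for the splitting.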

sansri7680
Path Finder

Hi, thanks for your response. I didn't change any configuration after the first 1500 events. But I found that the problem is with special characters in the file. As a workaround, I opened the file in a word processor and trimmed all trailing tabs and spaces, after which it was indexed correctly by Splunk. But that doesn't solve my actual problem. In my app, the file grows constantly, and no one can be expected to open it in a word processor and trim trailing spaces every time. Is there a permanent fix for this?
