Splunk Search

A list of common regular expressions for field extractions?

Contributor

Splunk isn't extracting certain fields from my logs. This includes basic things such as IP addresses.

It seems that I need to build regular expressions so that Splunk will recognize my data better. Here are some things which I need Splunk to recognize:

  1. 1.1.1.1 and 192.168.100.100 are IPv4 addresses. Regex is something like (?:(?:25[0-5]|2[0-4][0-9]|[01]?[0-9][0-9]?)\.){3}(?:25[0-5]|2[0-4][0-9]|[01]?[0-9][0-9]?)
  2. IPv6 addresses. The regex for this is difficult. Very difficult, which is why I was hoping that Splunk would do this for me, and save me time.
  3. 1.1.1.1:8080 is an IP address with a port
  4. foo@example.gov is an email address.
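
For illustration, cases 1, 3 and 4 above can be sketched in Python roughly as follows. This is a minimal sketch, not a vetted pattern library: the sample log line and the names OCTET, IPV4, IPV4_PORT, EMAIL and find_all are made up for this example, the email pattern is deliberately loose, and IPv6 is left out because its regex is much longer.

```python
import re

# One IPv4 octet: 0-255 (leading zeros such as "01" are tolerated).
OCTET = r"(?:25[0-5]|2[0-4][0-9]|[01]?[0-9][0-9]?)"

# Dotted quad. The dot is escaped so it matches a literal "." only, and the
# lookarounds stop the match from starting or ending inside a longer number.
IPV4 = rf"(?<![\d.])(?:{OCTET}\.){{3}}{OCTET}(?!\d)"

# IPv4 followed by a colon and a 1-5 digit port (port range is not validated).
IPV4_PORT = rf"(?P<ip>{IPV4}):(?P<port>\d{{1,5}})"

# Deliberately loose email pattern; it is not RFC 5322 compliant.
EMAIL = r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}"


def find_all(pattern, text):
    """Return every non-overlapping match of pattern in text, left to right."""
    return [m.group(0) for m in re.finditer(pattern, text)]


# Hypothetical log line used only to exercise the patterns.
line = "src=192.168.100.100 dst=1.1.1.1:8080 user=foo@example.gov"

ips = find_all(IPV4, line)
endpoint = re.search(IPV4_PORT, line)
emails = find_all(EMAIL, line)

print(ips)
print(endpoint.group("ip"), endpoint.group("port"))
print(emails)
```

Patterns like these validate the data type itself, so they can be matched anywhere on a line without knowing the log format.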

The examples above are extremely common. Is there a list of common regular expressions which I can import into Splunk so that I don't need to experiment with dozens of regular expression strings?

Splunk Employee

While plenty of regex sites can provide these patterns, they aren't all that useful in most cases. A field extraction is usually defined by absolute position (e.g., the 5th word in the line) or by its location relative to fixed characters (e.g., the string after src_addr= up to the next space, or the string between <addr> and </addr>). So forcing the regex to match the exact data type you're looking for is rarely necessary. Usually, once you have located the field, it's sufficient to say "string of non-space characters" (\S*) or "sequence of hex digits and colons" ([0-9a-fA-F:]* or [[:xdigit:]:]*). In other words, knowing how to match or validate the data type itself matters less than knowing how to locate the value within a log entry. Unfortunately, that depends on your log format, so a ready-made pattern is less likely to be found in the wild.
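
The idea of locating a value by its surroundings rather than validating its type can be sketched in Python as below. This is only an illustration under assumptions: the log line is invented, the capture-group names (src, addr) are hypothetical, and because Python's re module has no POSIX classes such as [[:xdigit:]], the hex digits are spelled out.

```python
import re

# Hypothetical log line; src_addr= and <addr>...</addr> mirror the examples above.
line = (
    "Jan 01 00:00:01 fw01 conn "
    "src_addr=2001:db8::1 dst_addr=10.0.0.5 <addr>fe80::1</addr>"
)

# Anchor on the fixed text "src_addr=" and take the run of non-space characters.
src = re.search(r"src_addr=(?P<src>\S+)", line)

# Same anchor, but restricted to hex digits and colons.
src_hex = re.search(r"src_addr=(?P<src>[0-9a-fA-F:]+)", line)

# Value located between the fixed delimiters <addr> and </addr>.
tagged = re.search(r"<addr>(?P<addr>[^<]+)</addr>", line)

# Absolute position: the 5th whitespace-separated word on the line.
fifth_word = line.split()[4]

print(src.group("src"))
print(src_hex.group("src"))
print(tagged.group("addr"))
print(fifth_word)
```

None of these patterns knows what an IPv6 address looks like; they work only because the surrounding text is fixed. In Splunk, a pattern of the same shape would typically go into a rex command or a field extraction, using the (?<name>...) form for the named group.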

Contributor

I was under the impression that fields are not position-based. For example, if I want Splunk to identify an IPv6 field anywhere on the line, I need to use the interactive field extractor to define the IPv6 field based on a regular expression.
