A list of common regular expressions for field extractions?

Contributor

Splunk isn't extracting certain fields from my logs. This includes basic things such as IP addresses.

It seems that I need to build regular expressions so that Splunk will recognize my data better. Here are some things which I need Splunk to recognize:

  1. 1.1.1.1 and 192.168.100.100 are IPv4 addresses. The regex is something like (?:(?:25[0-5]|2[0-4][0-9]|[01]?[0-9][0-9]?)\.){3}(?:25[0-5]|2[0-4][0-9]|[01]?[0-9][0-9]?)
  2. IPv6 addresses. The regex for this is very difficult, which is why I was hoping that Splunk would do this for me and save me time.
  3. 1.1.1.1:8080 is an IP address with a port.
  4. foo@example.gov is an email address.
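
For reference, items 1, 3, and 4 in the list above can be sketched as standalone patterns. This is a minimal Python illustration, not something Splunk ships: the names IPV4_RE, IPV4_PORT_RE, and EMAIL_RE and the sample line are made up here, and the email pattern is deliberately loose rather than a full address validator.

```python
import re

# One IPv4 octet (0-255), then four of them joined by literal dots.
OCTET = r"(?:25[0-5]|2[0-4][0-9]|[01]?[0-9][0-9]?)"
IPV4 = rf"(?:{OCTET}\.){{3}}{OCTET}"

# The lookarounds stop a match from starting or ending inside a longer
# dotted number.
IPV4_RE = re.compile(rf"(?<![\d.])(?P<ip>{IPV4})(?!\.?\d)")

# IPv4 address followed by a colon and a 1-5 digit port.
IPV4_PORT_RE = re.compile(rf"(?<![\d.])(?P<ip>{IPV4}):(?P<port>\d{{1,5}})(?!\d)")

# Loose email shape: local part, "@", domain with at least one dot.
EMAIL_RE = re.compile(r"(?P<email>[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,})")

# Hypothetical log line, used only for illustration.
sample = "conn from 192.168.100.100 to 1.1.1.1:8080 by foo@example.gov"

print(IPV4_RE.findall(sample))
print(IPV4_PORT_RE.search(sample).groupdict())
print(EMAIL_RE.search(sample).group("email"))
```

Python spells named groups as (?P<name>...); in a Splunk rex extraction the same group would be written (?<name>...).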

The examples above are extremely common. Is there a list of common regular expressions which I can import into Splunk so that I don't need to experiment with dozens of regular expression strings?

Re: A list of common regular expressions for field extractions?

Splunk Employee

Plenty of regex sites can provide these patterns, but in most cases they aren't all that useful for field extraction. A field extraction is usually defined by absolute position (e.g., the 5th word in the line) or by its location relative to fixed characters (e.g., the string after src_addr= until the next space, or the string starting after <addr> until you see </addr>). So forcing the regex to match the exact data type you're looking for is rarely necessary. Once you have located the field, it's usually sufficient to say "string of non-space characters" (\S*) or "sequence of hex digits and colons" ([0-9a-fA-F:]* or [[:xdigit:]:]*). Typically, it's less important to know how to match or validate the data type itself than to know how to locate it within a log entry. Unfortunately, that depends on your log format, so a ready-made pattern for it is less likely to be found in the wild.
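
The three ways of locating a field described above (by position, after a key, between tags) can be sketched roughly as follows. This is a Python illustration under assumptions: the log line is invented, and the pattern names and the extract helper are hypothetical, not part of Splunk.

```python
import re

# Key anchor: non-space characters after the literal "src_addr=".
KV_RE = re.compile(r"src_addr=(?P<src_addr>\S+)")

# Tag anchor: everything between <addr> and </addr>.
TAG_RE = re.compile(r"<addr>(?P<addr>[^<]*)</addr>")

# Positional anchor: skip four whitespace-delimited words, capture the fifth.
FIFTH_WORD_RE = re.compile(r"^(?:\S+\s+){4}(?P<word5>\S+)")


def extract(line):
    """Apply each anchor-based pattern and collect whatever fields it finds."""
    fields = {}
    for pattern in (KV_RE, TAG_RE, FIFTH_WORD_RE):
        match = pattern.search(line)
        if match:
            fields.update(match.groupdict())
    return fields


# Hypothetical log entry, used only for illustration.
line = "Jan 12 10:01:02 fw01 DROP src_addr=2001:db8::1 dst=<addr>10.0.0.5</addr>"
print(extract(line))
```

None of the patterns knows what an IPv6 or IPv4 address looks like; each one only knows where the value sits in the line, which is the point made above.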

Re: A list of common regular expressions for field extractions?

Contributor

I was under the impression that fields are not position-based. For example, if I want Splunk to identify an IPv6 field anywhere on the line, I need to use the interactive field extractor to define the IPv6 field based on a regular expression.
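
To illustrate the pattern-based alternative, finding an IPv6 address anywhere on a line without a full IPv6 regex, one option is to pick out loose candidates and let a parser validate them. This is a rough Python sketch, not a Splunk feature: CANDIDATE_RE and find_ipv6 are hypothetical names, the sample line is invented, and the candidate pattern is intentionally loose (it would miss an address glued directly to a preceding "key:" prefix, for instance).

```python
import ipaddress
import re

# Loose candidate: any run of hex digits and colons.
CANDIDATE_RE = re.compile(r"[0-9A-Fa-f:]+")


def find_ipv6(line):
    """Return every token on the line that parses as an IPv6 address."""
    found = []
    for token in CANDIDATE_RE.findall(line):
        if token.count(":") < 2:
            continue  # too few colons to be an IPv6 address
        try:
            ipaddress.IPv6Address(token)
        except ValueError:  # AddressValueError is a subclass of ValueError
            continue  # e.g. a timestamp such as 10:01:02
        found.append(token)
    return found


# Hypothetical log entry, used only for illustration.
line = "Jan 12 10:01:02 fw01 DROP from 2001:db8::1 port 443"
print(find_ipv6(line))
```

The timestamp has the right characters but is rejected by the parser, which is the validation work a pure regex would otherwise have to encode.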