Splunk Search

Trim field for operations

Builder

Hey everyone, I have an event type containing two fields that I need to trim. They're currently in this format:

02/09/11 at 2:56 PM","eth 0/1","NNNNNNNNNNNN","NNNNNNNNNNNN","72 ms","0","0 ms","0","4.200","G711U"

The fields I need trimmed are the "72 ms" and "0 ms". I just need the " ms" portion to be removed so I can start doing mathematical operations. The best would be removing this at index time, however I can also do it at search time. I tried using the field extractor but the regex it generates doesn't seem to make any sense.

This is what it generated: (?i)^(?:[^"]*"){9}(?P[^ ]+)

Tags (1)
0 Karma
1 Solution

Splunk Employee
Splunk Employee

If this comes in CSV format, it might be best to use Splunk's wizardry to extract the fields early on:

http://www.splunk.com/base/Documentation/latest/Data/Extractfieldsfromfileheadersatindextime

If that's not an option (or so you can get up and running first), the following regex should do the trick, for the particular event you've pasted:

YourSearch | rex field=_raw "\"(?<Field1>\d*) ms\",\"\d*\",\"(?<Field2>\d*) ms" 
           | stats avg(Field1), avg(Field2)

You can take that regular expression and move it into props.conf as well:

[my_sourcetype]
EXTRACT-extract_two_fields = "(?<Field1>\d*) ms","\d*","(?<Field2>\d*) ms

(I don't believe you need to escape the quotes in props.conf -- you may need to put those escapes back in, if not).

Should you have Splunk pull out the field names (as per the first link), you'd end up with ResponseTime="72 ms" and Jitter="0 ms" (or whatever those represent). You could then pull out the numbers by doing:

YourSearch | rex field=ResponseTime "(?<NumResponseTime>\d*)" 
           | rex field=Jitter "(?<NumJitter>\d*)"

Let me know if that answers your question.

View solution in original post

Splunk Employee
Splunk Employee

If this comes in CSV format, it might be best to use Splunk's wizardry to extract the fields early on:

http://www.splunk.com/base/Documentation/latest/Data/Extractfieldsfromfileheadersatindextime

If that's not an option (or so you can get up and running first), the following regex should do the trick, for the particular event you've pasted:

YourSearch | rex field=_raw "\"(?<Field1>\d*) ms\",\"\d*\",\"(?<Field2>\d*) ms" 
           | stats avg(Field1), avg(Field2)

You can take that regular expression and move it into props.conf as well:

[my_sourcetype]
EXTRACT-extract_two_fields = "(?<Field1>\d*) ms","\d*","(?<Field2>\d*) ms

(I don't believe you need to escape the quotes in props.conf -- you may need to put those escapes back in, if not).

Should you have Splunk pull out the field names (as per the first link), you'd end up with ResponseTime="72 ms" and Jitter="0 ms" (or whatever those represent). You could then pull out the numbers by doing:

YourSearch | rex field=ResponseTime "(?<NumResponseTime>\d*)" 
           | rex field=Jitter "(?<NumJitter>\d*)"

Let me know if that answers your question.

View solution in original post

Splunk Employee
Splunk Employee

The Rex uses Perl Compatible Regular Expressions (PCRE).. at is at least very similar. You might get into issues where you have to escape things (such as a quote), but that you can generally figure out by starting with something simple, and building up with a little trial and error. The authoritative source for PCRE is http://perldoc.perl.org/perlre.html , and if you google PCRE tutorial, or Perl Regular Expressions Tutorial, you might find a few things that are a bit more "how do I get started." This one looks promising: http://www.troubleshooters.com/codecorn/littperl/perlreg.htm

0 Karma

Builder

Do you happen to have a good tutorial for the rex syntax, specifically for trimming/removing extra characters? I'd very much appreciate it.

0 Karma

Splunk Employee
Splunk Employee

Excellent. I'm glad to hear that.

0 Karma

Builder

The insert into props.conf worked like a bell. Thanks!

0 Karma