Field extraction at index, or use transforms?

Explorer

I am working with the following input and wanted some advice on how/where to specify the field extractions:

"\x00\x00\x00103700079  C9E840    13372786523      7137                210018  51730064  #850 1      000         "

I have documentation from the vendor specifying value lengths and definitions, and we can perform most field extractions via individual regex field extractions, but we wanted to know if there is a better or more efficient method recommended.

For reference, the field mapping table is listed below, and I have included samples of a couple of the current field extractions.

1-2      Time of day-hours
3-4      Time of day-minutes
5        Duration-hours
6-7      Duration-minutes
8        Duration-tenths of minutes
9        Condition code
10-13    Access code dialed
14-17    Access code used
18-32    Dialed number
33-42    Calling number
43-57    Account code
58-64    Authorization code
65-66    Space
67       FRL
68-70    Incoming circuit ID (hundreds, tens, units)
71-73    Outgoing circuit ID (hundreds, tens, units)
74       Feature flag
75-76    Attendant console
77-80    Incoming TAC
81-82    Node number
83-85    INS
86-88    IXC
89       BCC
90       MA-UUI
91       Resource flag
92-95    Packet count
96       TSC flag
97-100   Reserved
101      Carriage return (Not displayed)
102      Line feed (Not displayed)
103-105  Null (displayed as “\x00\x00\x00” at beginning of new line)

For example, to extract the duration hours, minutes, tenths of minutes we use the following regex:

"^.{16}(?<duration_hours>\d{1})" 
"^.{17}(?<duration_minutes>\d{2})" 
"^.{19}(?<duration_tenths_minutes>\d{1})" 

Influencer

A single regular expression is IMO the most efficient way to extract the fields here. To get rid of the \x00 values in your events, you could adjust the LINE_BREAKER settings of your sourcetype:

props.conf:

[<your sourcetype>]
LINE_BREAKER=([\x00\r\n]+)
EXTRACT-fields=<the regex here>
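
To see what that character class does to a raw stream, here is a minimal Python sketch. It only mimics the idea (break wherever the group matches and discard the matched text); it is not Splunk's implementation, and the second record is invented for illustration. It also shows one thing worth checking: the class matches real NUL characters, not the four-character text backslash-x-0-0, so if the source file contains the latter, a breaker like this would leave it in place.

```python
import re

# Same character class as the LINE_BREAKER capture group above.
BREAKER = re.compile(r"[\x00\r\n]+")

def split_events(raw: str) -> list[str]:
    """Split a raw stream into events, dropping the delimiter runs."""
    return [chunk for chunk in BREAKER.split(raw) if chunk]

# Two records separated by CR, LF and three real NUL characters.
# The second record ("104200015") is a made-up value.
real_nuls = "103700079\r\n\x00\x00\x00104200015"
print(split_events(real_nuls))

# Here the stream holds the literal text \x00 (backslash, x, 0, 0),
# which the character class does not match.
literal_text = "103700079\r\n" + r"\x00\x00\x00" + "104200015"
print(split_events(literal_text))
```

In the first case both records come out clean; in the second, the literal text stays attached to the front of the second record.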

Splunk Employee

Most efficient would probably be a single search time REGEX extraction:

EXTRACT-fields = (?<hour>.{2})(?<min>.{2})(?<duration_h>.)(?<duration_m>.{2})(?<duration_mtenths>.)(?<cc>.)(?<accesscd_dialed>.{4})(?<accesscd_used>.{4})(?<num_dialed>.{15})(?<num_calling>.{10})

And so on. That way, all fields are extracted in a single pass over the data. Note that with this particular data, you may run into problems searching for particular fields by a specific value (if the value is pressed right up against adjacent fields with no white space). You can deal with that for selected fields you commonly search on by using index-time extractions, but do so selectively and only if you determine it's really necessary for that field (e.g., don't do it with the time fields, and probably not with the dialed number).
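
Writing out the "and so on" by hand is error-prone, so one option is to generate the pattern from the vendor table. The Python sketch below is only an illustration under a few assumptions: the widths come from the table in the question, the first ten field names follow the regex above and the remaining names are made up here, the pattern is anchored with ^, and the NUL prefix is assumed to have been removed from the event already.

```python
import re

# (name, width) pairs in record order, widths from the vendor table.
# The first ten names follow the answer above; the rest are invented here.
FIELDS = [
    ("hour", 2), ("min", 2), ("duration_h", 1), ("duration_m", 2),
    ("duration_mtenths", 1), ("cc", 1), ("accesscd_dialed", 4),
    ("accesscd_used", 4), ("num_dialed", 15), ("num_calling", 10),
    ("account_code", 15), ("auth_code", 7), ("spacer", 2), ("frl", 1),
    ("in_circuit_id", 3), ("out_circuit_id", 3), ("feature_flag", 1),
    ("attendant_console", 2), ("in_tac", 4), ("node_number", 2),
    ("ins", 3), ("ixc", 3), ("bcc", 1), ("ma_uui", 1), ("resource_flag", 1),
    ("packet_count", 4), ("tsc_flag", 1), ("reserved", 4),
]

def python_regex(fields=FIELDS):
    """Anchored pattern using Python's (?P<name>...) group syntax."""
    return "^" + "".join(f"(?P<{name}>.{{{width}}})" for name, width in fields)

def splunk_regex(fields=FIELDS):
    """Same pattern with (?<name>...) groups, for pasting into an EXTRACT- line."""
    return "^" + "".join(f"(?<{name}>.{{{width}}})" for name, width in fields)

def parse(record):
    """Return a dict of stripped field values, or None if the record is too short."""
    match = re.match(python_regex(), record)
    if match is None:
        return None
    return {name: value.strip() for name, value in match.groupdict().items()}

# First 32 characters of the sample record from the question,
# padded with spaces to the full 100-character width.
sample = ("103700079" + " " * 2 + "C9E840" + " " * 4 + "13372786523").ljust(100)
fields = parse(sample)
print(fields["hour"], fields["min"], fields["num_dialed"])
print(splunk_regex())
```

Checking the generated pattern against a few real records outside Splunk first makes it easier to spot an off-by-one width before it goes into props.conf.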

Explorer

Thank you, I think this is the information we were looking for.
Your time and attention are greatly appreciated!

Splunk Employee

Because if you're not searching for specific values, indexing more fields will increase the size of the index, which can decrease performance for all searches. If you only rarely search for specific values of fieldname, you can search with fieldname=*value* (vs fieldname=value), which will work but will be slower for that search only. If you are not searching for specific values but reporting instead (e.g., stats count by number_dialed), then indexed fields are no better than search-time extracted ones.

Explorer

It sounds like index-time extraction is best, as many of the fields are adjacent.

Why do you recommend against items such as time or dialed number in the extraction at index? The target application will be a Call Detail Record index and a sub-component of an event correlation system.

Explorer

The code:

LINE_BREAKER=([\x00\r\n]+)

Does not appear to be removing the "\x00\x00\x00" from the
