Getting Data In

Splunk - Flat file - Field Extraction - Transforms.conf

vasanthmss
Motivator

Hi,
I want to extract the fields using transforms.conf from a flat file. The file contains the following fields. There is no field delimiter. I have tried the below regex but not working. pls help me.

Field Length:
ID -10
FIRST_NAME -15
LAST_NAME -12
ADDRESS - 25
PHONE - 10

Example File1:

ID FIRST_NAME LAST_NAME ADDRESS PHONE
0000001 Name1 Lastname1 Address1 1234567890

00000021 name2 Address2 123456
001 name3 Address3


REGEX = (?i)^(?P<ID>.{10})\S(?<FIRST_NAME>.{15})\S(?<LAST_NAME>.{12})\S(?<ADDRESS>.{25})\S(?<PHONE>.{10})

Thanks,
V.S

V
Tags (2)
1 Solution

kristian_kolb
Ultra Champion

One problem with your REGEX above is that you state that there is a non-space character between your fields. Perhaps you want to use \s instead of \S? Also, if you want to use that regex, I suggest you change it a little and put it in a normal EXTRACT in props.conf only, like so;

props.conf

[your_sourcetype]
EXTRACT-blah = ^(?P<ID>.{10})\s+(?<FIRST_NAME>.{15})\s+(?<LAST_NAME>.{12})\s+(?<ADDRESS>.{25})\s+(?<PHONE>.{10})

This works well for logs like;

1234567890 gregor          von klopp   1, high street            1-555-2345
345323     billy bob ben   joe         buckingham palace, london +44 123123

So as you can see, this approach requires you to have fixed length fields in your log file.


If your fields do NOT contain spaces, and you have a delimiter that is a single space, it could also be done (better) with a REPORT (which uses transforms.conf);

props.conf

[your_sourcetype]
REPORT-blah = extract_blah

transforms.conf

[extract_blah]
DELIMS = " "
FIELDS = ID, FIRST_NAME, LAST_NAME, ADDRESS, PHONE

Alternatively, you can do the same thing by changing the regex and use an ordinary EXTRACT that does not require fixed length fields;

props.conf

[your_sourcetype]
EXTRACT-bleh = ^(?P<ID>\S+)\s+(?<FIRST_NAME>\S+)\s+(?<LAST_NAME>\S+)\s+(?<ADDRESS>\S+)\s+(?<PHONE>\S+)

These two approaches work with events like;

1234123 gregor von_klopp 1,high_street 1-555-2345
23421 billy_bob_ben joe buckingham_palace,london +44123123

Makes sense?


UPDATE:

If you have truly fixed fields, and the field content is truncated if it's above a certain field length, you'd better use the first approach and remove the delimiters from the regex, i.e.

file looks like;

12345678  HM Queen ElizabN/A         Buckingham Palace, London+44123123
0987654321Arnold         Schwarzenegg25, Fifth avenue, NY     +1-555-123

props.conf

EXTRACT-blah = ^(?P<ID>.{10})(?<FIRST_NAME>.{15})(?<LAST_NAME>.{12})(?<ADDRESS>.{25})(?<PHONE>.{10})

View solution in original post

kristian_kolb
Ultra Champion

One problem with your REGEX above is that you state that there is a non-space character between your fields. Perhaps you want to use \s instead of \S? Also, if you want to use that regex, I suggest you change it a little and put it in a normal EXTRACT in props.conf only, like so;

props.conf

[your_sourcetype]
EXTRACT-blah = ^(?P<ID>.{10})\s+(?<FIRST_NAME>.{15})\s+(?<LAST_NAME>.{12})\s+(?<ADDRESS>.{25})\s+(?<PHONE>.{10})

This works well for logs like;

1234567890 gregor          von klopp   1, high street            1-555-2345
345323     billy bob ben   joe         buckingham palace, london +44 123123

So as you can see, this approach requires you to have fixed length fields in your log file.


If your fields do NOT contain spaces, and you have a delimiter that is a single space, it could also be done (better) with a REPORT (which uses transforms.conf);

props.conf

[your_sourcetype]
REPORT-blah = extract_blah

transforms.conf

[extract_blah]
DELIMS = " "
FIELDS = ID, FIRST_NAME, LAST_NAME, ADDRESS, PHONE

Alternatively, you can do the same thing by changing the regex and use an ordinary EXTRACT that does not require fixed length fields;

props.conf

[your_sourcetype]
EXTRACT-bleh = ^(?P<ID>\S+)\s+(?<FIRST_NAME>\S+)\s+(?<LAST_NAME>\S+)\s+(?<ADDRESS>\S+)\s+(?<PHONE>\S+)

These two approaches work with events like;

1234123 gregor von_klopp 1,high_street 1-555-2345
23421 billy_bob_ben joe buckingham_palace,london +44123123

Makes sense?


UPDATE:

If you have truly fixed fields, and the field content is truncated if it's above a certain field length, you'd better use the first approach and remove the delimiters from the regex, i.e.

file looks like;

12345678  HM Queen ElizabN/A         Buckingham Palace, London+44123123
0987654321Arnold         Schwarzenegg25, Fifth avenue, NY     +1-555-123

props.conf

EXTRACT-blah = ^(?P<ID>.{10})(?<FIRST_NAME>.{15})(?<LAST_NAME>.{12})(?<ADDRESS>.{25})(?<PHONE>.{10})
Get Updates on the Splunk Community!

.conf24 | Day 0

Hello Splunk Community! My name is Chris, and I'm based in Canberra, Australia's capital, and I travelled for ...

Enhance Security Visibility with Splunk Enterprise Security 7.1 through Threat ...

(view in My Videos)Struggling with alert fatigue, lack of context, and prioritization around security ...

Troubleshooting the OpenTelemetry Collector

  In this tech talk, you’ll learn how to troubleshoot the OpenTelemetry collector - from checking the ...