Getting Data In

Splunk - Flat file - Field Extraction - Transforms.conf

vasanthmss
Motivator

Hi,
I want to extract the fields using transforms.conf from a flat file. The file contains the following fields. There is no field delimiter. I have tried the below regex but not working. pls help me.

Field Length:
ID -10
FIRST_NAME -15
LAST_NAME -12
ADDRESS - 25
PHONE - 10

Example File1:

ID FIRST_NAME LAST_NAME ADDRESS PHONE
0000001 Name1 Lastname1 Address1 1234567890

00000021 name2 Address2 123456
001 name3 Address3


REGEX = (?i)^(?P<ID>.{10})\S(?<FIRST_NAME>.{15})\S(?<LAST_NAME>.{12})\S(?<ADDRESS>.{25})\S(?<PHONE>.{10})

Thanks,
V.S

V
Tags (2)
1 Solution

kristian_kolb
Ultra Champion

One problem with your REGEX above is that you state that there is a non-space character between your fields. Perhaps you want to use \s instead of \S? Also, if you want to use that regex, I suggest you change it a little and put it in a normal EXTRACT in props.conf only, like so;

props.conf

[your_sourcetype]
EXTRACT-blah = ^(?P<ID>.{10})\s+(?<FIRST_NAME>.{15})\s+(?<LAST_NAME>.{12})\s+(?<ADDRESS>.{25})\s+(?<PHONE>.{10})

This works well for logs like;

1234567890 gregor          von klopp   1, high street            1-555-2345
345323     billy bob ben   joe         buckingham palace, london +44 123123

So as you can see, this approach requires you to have fixed length fields in your log file.


If your fields do NOT contain spaces, and you have a delimiter that is a single space, it could also be done (better) with a REPORT (which uses transforms.conf);

props.conf

[your_sourcetype]
REPORT-blah = extract_blah

transforms.conf

[extract_blah]
DELIMS = " "
FIELDS = ID, FIRST_NAME, LAST_NAME, ADDRESS, PHONE

Alternatively, you can do the same thing by changing the regex and use an ordinary EXTRACT that does not require fixed length fields;

props.conf

[your_sourcetype]
EXTRACT-bleh = ^(?P<ID>\S+)\s+(?<FIRST_NAME>\S+)\s+(?<LAST_NAME>\S+)\s+(?<ADDRESS>\S+)\s+(?<PHONE>\S+)

These two approaches work with events like;

1234123 gregor von_klopp 1,high_street 1-555-2345
23421 billy_bob_ben joe buckingham_palace,london +44123123

Makes sense?


UPDATE:

If you have truly fixed fields, and the field content is truncated if it's above a certain field length, you'd better use the first approach and remove the delimiters from the regex, i.e.

file looks like;

12345678  HM Queen ElizabN/A         Buckingham Palace, London+44123123
0987654321Arnold         Schwarzenegg25, Fifth avenue, NY     +1-555-123

props.conf

EXTRACT-blah = ^(?P<ID>.{10})(?<FIRST_NAME>.{15})(?<LAST_NAME>.{12})(?<ADDRESS>.{25})(?<PHONE>.{10})

View solution in original post

kristian_kolb
Ultra Champion

One problem with your REGEX above is that you state that there is a non-space character between your fields. Perhaps you want to use \s instead of \S? Also, if you want to use that regex, I suggest you change it a little and put it in a normal EXTRACT in props.conf only, like so;

props.conf

[your_sourcetype]
EXTRACT-blah = ^(?P<ID>.{10})\s+(?<FIRST_NAME>.{15})\s+(?<LAST_NAME>.{12})\s+(?<ADDRESS>.{25})\s+(?<PHONE>.{10})

This works well for logs like;

1234567890 gregor          von klopp   1, high street            1-555-2345
345323     billy bob ben   joe         buckingham palace, london +44 123123

So as you can see, this approach requires you to have fixed length fields in your log file.


If your fields do NOT contain spaces, and you have a delimiter that is a single space, it could also be done (better) with a REPORT (which uses transforms.conf);

props.conf

[your_sourcetype]
REPORT-blah = extract_blah

transforms.conf

[extract_blah]
DELIMS = " "
FIELDS = ID, FIRST_NAME, LAST_NAME, ADDRESS, PHONE

Alternatively, you can do the same thing by changing the regex and use an ordinary EXTRACT that does not require fixed length fields;

props.conf

[your_sourcetype]
EXTRACT-bleh = ^(?P<ID>\S+)\s+(?<FIRST_NAME>\S+)\s+(?<LAST_NAME>\S+)\s+(?<ADDRESS>\S+)\s+(?<PHONE>\S+)

These two approaches work with events like;

1234123 gregor von_klopp 1,high_street 1-555-2345
23421 billy_bob_ben joe buckingham_palace,london +44123123

Makes sense?


UPDATE:

If you have truly fixed fields, and the field content is truncated if it's above a certain field length, you'd better use the first approach and remove the delimiters from the regex, i.e.

file looks like;

12345678  HM Queen ElizabN/A         Buckingham Palace, London+44123123
0987654321Arnold         Schwarzenegg25, Fifth avenue, NY     +1-555-123

props.conf

EXTRACT-blah = ^(?P<ID>.{10})(?<FIRST_NAME>.{15})(?<LAST_NAME>.{12})(?<ADDRESS>.{25})(?<PHONE>.{10})
Get Updates on the Splunk Community!

Dashboards: Hiding charts while search is being executed and other uses for tokens

There are a couple of features of SimpleXML / Classic dashboards that can be used to enhance the user ...

Splunk Observability Cloud's AI Assistant in Action Series: Explaining Metrics and ...

This is the fourth post in the Splunk Observability Cloud’s AI Assistant in Action series that digs into how ...

Brains, Bytes, and Boston: Learn from the Best at .conf25

When you think of Boston, you might picture colonial charm, world-class universities, or even the crack of a ...