Splunk Search

Field extraction using REGEX

meenal901
Communicator

Hi,

I have a flat file of this format:

0229052320112MARGARET CHODKIEWICZ     APT 603-2100 SHEROBEE RD R164I00022B0A2013-01-022013-01-082013-01-0953N54 UNETCH 012013-01-08          9052320112                                                          5201  

I need to capture the first 3 digits as CUST-CODE.

I have written the below in config files:

PROPS.CONF

[source1]
TRANSFORMS-mysource= source1trans
NO_BINARY_CHECK = 1
SHOULD_LINEMERGE = false

TRANSFORMS.CONF

[source1trans]
DEST_KEY = MetaData:CUST-CD
REGEX = ^\d{3}
FORMAT = DF-SO-CUST-CD::$1

Still, the field CUST-CD is not captured at search time. I also tried IFX to extract the field, and the extraction was saved, but CUST-CD is not visible in the interesting fields.
Please suggest what I missed.

Thanks.

meenal901
Communicator

Just another question:

Is there a way to write the REGEX when we know the columns are fixed length?
For the below data:

0229052320112MARGARET CHODKIEWICZ APT 603-2100 SHEROBEE RD R164I00022B0A2013-01-022013-01-082013-01-0953N54 UNETCH 012013-01-08 9052320112

I know the data format, but it's difficult to catch patterns. Can we specify where each column begins and ends, and if so, how?

kristian_kolb
Ultra Champion

First, unless you are sure that you need to make this extraction at index time, you should not use TRANSFORMS. I also doubt that MetaData:CUST-CD is a valid DEST_KEY; at any rate, I don't think it is what you want.

Abandon that line of reasoning and instead do this in props.conf only, as a search-time extraction:

[source1]
EXTRACT-blah=(?m)^(?<CUST_ID>\d{3})

Hope this helps,

Kristian
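
To sanity-check the pattern outside Splunk, here is a minimal Python sketch. It assumes Python's `re` module is a reasonable stand-in for Splunk's regex engine for a pattern this simple, uses the `(?P<name>...)` spelling that Python requires, and works on a shortened copy of the sample record. It also shows that the `^\d{3}` pattern from the question defines no capture group, so a `$1` in FORMAT has nothing to refer to.

```python
import re

# Shortened copy of the sample record from the question.
record = "0229052320112MARGARET CHODKIEWICZ     APT 603-2100 SHEROBEE RD"

# Same pattern as the EXTRACT above; Python needs (?P<name>...) for named groups.
named = re.compile(r"(?m)^(?P<CUST_ID>\d{3})")
match = named.search(record)
cust_id = match.group("CUST_ID")
print(cust_id)

# The pattern from the question's transforms.conf matches the same three digits,
# but it contains no parentheses, so it defines zero capture groups.
unnamed = re.compile(r"^\d{3}")
group_count = unnamed.groups
print(group_count)
```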

kristian_kolb
Ultra Champion

I believe you can do this like

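EXTRACT-stuff=^(?<field1>\d{3})(?<field2>\d{10})(?<field3>\S+)\s+(?<field4>\S+)\s+ etc etc

It's just a matter of defining your regexes. The names field1, field2 and so on are placeholders; substitute your own field names.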
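
For truly fixed-width columns, one option is to count characters with `.{n}` instead of matching on content. The Python sketch below illustrates the idea under some assumptions: only the first two widths (3 and 10) come from this thread, the 25-character third column is a guess made for illustration, and the group names `cust_cd`, `col2` and `col3` are hypothetical.

```python
import re

# Hypothetical layout: widths 3 and 10 come from the thread;
# the 25-character third column is an assumption for illustration.
FIXED_WIDTH = re.compile(
    r"^(?P<cust_cd>.{3})"   # characters 1-3
    r"(?P<col2>.{10})"      # characters 4-13
    r"(?P<col3>.{25})"      # characters 14-38 (assumed width)
)

# Build a record with explicit padding so the column widths are unambiguous.
record = (
    "022"
    + "9052320112"
    + "MARGARET CHODKIEWICZ".ljust(25)
    + "APT 603-2100 SHEROBEE RD"
)

match = FIXED_WIDTH.match(record)
# Strip the padding that fixed-width formats add to short values.
fields = {key: value.strip() for key, value in match.groupdict().items()}
print(fields)
```

The same `.{n}` quantifiers should carry over to an EXTRACT in props.conf, written with whichever named-group spelling you already use there. Note that padding spaces stay in the extracted value unless the pattern excludes them.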

kristian_kolb
Ultra Champion

True, that is a bit silly, but that's the way it works. I also forgot to type it in the example above; fixed that now.

meenal901
Communicator

Thanks Kristian!

Earlier I had tried EXTRACT in props.conf, but there were no results, so I went ahead with TRANSFORMS.

The main problem was the field name, CUST-CD. It contains a hyphen, which props.conf does not accept in a field name. I changed it to CUST_CD in props.conf and it worked. This is what I did:

EXTRACT-CUST_CODE = (?i)(?P<CUST_CD>\d{3})
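
The hyphen problem can be reproduced outside Splunk. This is a small Python sketch, not Splunk's own validation: it assumes Python's `re` module as a stand-in, which, like PCRE, limits capture-group names to letters, digits and underscores. The helper `compiles` is a hypothetical name introduced for the example.

```python
import re


def compiles(pattern: str) -> bool:
    """Return True if the regex compiles, False if the engine rejects it."""
    try:
        re.compile(pattern)
        return True
    except re.error:
        return False


hyphen_ok = compiles(r"(?P<CUST-CD>\d{3})")      # hyphen in the group name
underscore_ok = compiles(r"(?P<CUST_CD>\d{3})")  # underscore instead
print(hyphen_ok, underscore_ok)
```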
