Splunk Search

Field extraction using REGEX

meenal901
Communicator

Hi,

I have a flat file of this format:

0229052320112MARGARET CHODKIEWICZ     APT 603-2100 SHEROBEE RD R164I00022B0A2013-01-022013-01-082013-01-0953N54 UNETCH 012013-01-08          9052320112                                                          5201  

I need to capture the first 3 digits as CUST-CODE.

I have written the following in the config files:

PROPS.CONF

[source1]
TRANSFORMS-mysource= source1trans
NO_BINARY_CHECK = 1
SHOULD_LINEMERGE = false

TRANSFORMS.CONF

[source1trans]
DEST_KEY = MetaData:CUST-CD
REGEX = ^\d{3}
FORMAT = DF-SO-CUST-CD::$1
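
FORMAT's $1 refers to the first capture group in REGEX, and ^\d{3} contains no parentheses, so there is no group 1 for it to pick up. Here is a minimal Python sketch of the difference. It assumes Python's re module behaves like Splunk's PCRE for a pattern this simple, and it uses the sample record above, trimmed:

```python
import re

# Sample record from the question, trimmed to its first fields.
record = "0229052320112MARGARET CHODKIEWICZ     APT 603-2100 SHEROBEE RD"

# The REGEX from transforms.conf: it matches, but defines no capture group,
# so a $1 reference has nothing to substitute.
no_group = re.match(r"^\d{3}", record)
print(no_group.group(0))  # whole match: 022
print(no_group.groups())  # (): no capture groups at all

# Wrapping the digits in parentheses creates capture group 1.
with_group = re.match(r"^(\d{3})", record)
print(with_group.group(1))  # 022
```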

Still, when I search, the field CUST-CD is not extracted. I also tried the Interactive Field Extractor (IFX) to extract the field, and the extraction was saved, but CUST-CD does not appear in the interesting fields.
Please suggest what I missed.

Thanks.

meenal901
Communicator

Just another question:

Is there a way to write a REGEX when we know the columns are fixed-length?
For example, for the data below:

0229052320112MARGARET CHODKIEWICZ APT 603-2100 SHEROBEE RD R164I00022B0A2013-01-022013-01-082013-01-0953N54 UNETCH 012013-01-08 9052320112

I know the data format, but it's difficult to find patterns in it. Can we specify where each column begins and ends, and if so, how?

kristian_kolb
Ultra Champion

First, unless you are sure that you need to make this extraction at index time, you should not use TRANSFORMS. Also, I'm not sure that MetaData:CUST-CD is ever a valid DEST_KEY; at least, I don't think it is what you want.

Abandon that approach and do this in props.conf only:

[source1]
EXTRACT-blah=(?m)^(?<CUST_ID>\d{3})
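
To see what that EXTRACT regex pulls out of an event, here is a minimal Python sketch. It assumes Python's re module behaves like Splunk's PCRE here. Python only understands the (?P<name>...) spelling of a named group, while PCRE accepts both (?<name>...) and (?P<name>...):

```python
import re

# The EXTRACT regex above, written with Python's (?P<name>...) group syntax.
# (?m) makes ^ match at the start of every line, not only the first.
CUST_ID_RE = re.compile(r"(?m)^(?P<CUST_ID>\d{3})")

event = "0229052320112MARGARET CHODKIEWICZ     APT 603-2100 SHEROBEE RD R164I00022B0A"

match = CUST_ID_RE.search(event)
# The group name becomes the field name, which is how EXTRACT names fields.
print(match.groupdict())  # {'CUST_ID': '022'}
```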

Hope this helps,

Kristian

kristian_kolb
Ultra Champion

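I believe you can do it like this:

EXTRACT-stuff=^(?<field1>\d{3})(?<field2>\d{10})(?<field3>\S+)\s+(?<field4>\S+)\s+ etc etc

It's just a matter of defining your regexes, with your own field names in place of field1, field2 and so on.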
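
For strictly fixed-length columns, you can also skip matching on \d and \S and match each column by its width with .{N}. Each column then starts where the previous one ends. The minimal Python sketch below builds such a regex from a list of (name, width) pairs. The layout is hypothetical: only the first two widths (3 and 10) are read off the sample record, and the name PHONE is made up. The generated string uses the (?P<name>...) syntax, which PCRE also accepts, so in principle it could go into an EXTRACT- line once the real widths are filled in.

```python
import re

# Hypothetical column layout: (field name, width in characters).
# Only the first two widths are inferred from the sample; extend the list
# with the real widths from your record specification.
COLUMNS = [
    ("CUST_CD", 3),
    ("PHONE", 10),
]


def fixed_width_regex(columns):
    """Build an anchored regex with one named group of exactly N chars per column."""
    parts = [f"(?P<{name}>.{{{width}}})" for name, width in columns]
    return "^" + "".join(parts)


pattern = fixed_width_regex(COLUMNS)
print(pattern)  # ^(?P<CUST_CD>.{3})(?P<PHONE>.{10})

record = "0229052320112MARGARET CHODKIEWICZ APT 603-2100 SHEROBEE RD"
print(re.match(pattern, record).groupdict())
# {'CUST_CD': '022', 'PHONE': '9052320112'}
```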

kristian_kolb
Ultra Champion

True, that is a bit silly, but that's the way it works. Also, I forgot to even type it in the example above; fixed that now.

meenal901
Communicator

Thanks Kristian!

Earlier I had tried EXTRACT in props.conf, but there were no results, so I went ahead with TRANSFORMS.

The main problem was the column name, CUST-CD: it contained a hyphen, which props.conf does not accept in a field name. I renamed it CUST_CD in props.conf and it worked. This is what I did:

EXTRACT-CUST_CODE = (?i)(?P<CUST_CD>\d{3})
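
Why the rename mattered: in PCRE, a named group's name may contain only letters, digits and underscores. A hyphenated name like CUST-CD therefore cannot be a capture-group name, and so it cannot be a field name either. The minimal Python sketch below shows the same rule in Python's re module. It assumes that module's naming rule mirrors PCRE's closely enough for this case:

```python
import re

# A hyphen is not allowed in a regex group name, so compiling fails.
try:
    re.compile(r"(?P<CUST-CD>\d{3})")
    hyphen_ok = True
except re.error as exc:
    hyphen_ok = False
    print("rejected:", exc)

# An underscore is fine, matching the renamed field above.
match = re.match(r"(?i)(?P<CUST_CD>\d{3})", "0229052320112MARGARET CHODKIEWICZ")
print(match.group("CUST_CD"))  # 022
```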
