Splunk Search

How to extract a field based on other defined fields?

mcm10285
Communicator

Does anyone have an idea how to define a new field based on previously defined fields? The log format is a bit tricky: the delimiters are not consistent (some are spaces, some are tabs).

Example Log:

field1(tab)field2(spaces)unextracted_data(space)field3

The objective is to extract "unextracted_data", meaning any data (excluding surrounding whitespace if possible) that sits between any two fields (in this example, field2 and field3).

kristian_kolb
Ultra Champion

It would help a LOT if you could post a few sample lines of data.

Anyway, assuming that you have a log format that looks like the following:

 login=<a username>
 tab
 eventcode=<event code>
 spaces
 some data that you wish to extract
 some more spaces
 status=<success/fail>

On single lines this would be something like:

login=JR    eventcode=3   kill all! get oil!  status=success
login=bobby eventcode=8  get nice haircut  status=fail
login=cliff eventcode=4    succeed in business  status=fail
login=sueellen    eventcode=6  have drink (again)  status=success

You can extract whatever is between the eventcode and status fields with the following rex statement:

sourcetype=your_sourcetype | rex "eventcode=\d+\s+(?<task>.*?)\s+status=\w+$"

You should now have a field called task containing the text between the previously extracted fields.
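To check the pattern outside Splunk, here is a minimal Python sketch that runs it against the four sample lines above. It is only an approximation of what rex does: Python's re module spells named groups as (?P<name>...), and the helper name extract_task is made up for this example. The sketch uses a lazy .*? in the capture group so that the spaces in front of status= are left out of task.

```python
import re

# Sample events copied from the post above.
events = [
    "login=JR    eventcode=3   kill all! get oil!  status=success",
    "login=bobby eventcode=8  get nice haircut  status=fail",
    "login=cliff eventcode=4    succeed in business  status=fail",
    "login=sueellen    eventcode=6  have drink (again)  status=success",
]

# Python writes named groups as (?P<name>...); rex uses (?<name>...).
# The lazy .*? stops at the first run of whitespace that is followed by
# "status=<word>" at the end of the line, so no trailing spaces are captured.
TASK_RE = re.compile(r"eventcode=\d+\s+(?P<task>.*?)\s+status=\w+$")


def extract_task(event):
    """Return the text between eventcode and status, or None if no match."""
    match = TASK_RE.search(event)
    return match.group("task") if match else None


tasks = [extract_task(e) for e in events]
for task in tasks:
    print(repr(task))
```

With a greedy .* in the same position, the capture would keep all but one of the spaces before status=, because the following \s+ only needs a single whitespace character.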


UPDATE:

Ok, we'll be making a few assumptions anyway:

after the date/time and some text there will always be parentheses around a few uppercase letters/words
followed by some space
followed by 2-5 uppercase letters
followed by some space
followed by the name we wish to extract. This name can contain uppercase letters, commas and spaces, but not numbers
followed by some space
followed by a 5-20 digit number
followed by some space
followed by a 5-20 digit number

If this is the case, the extraction of NAME would be:

rex "\s+\([A-Z ]+\)\s+[A-Z]{2,5}\s+(?<NAME>[^\d]+?)\s+\d{5,20}\s+\d{5,20}"
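
As a rough way to test these assumptions before putting the pattern into a search, the following Python sketch applies it to the beginning of the sample event posted in this thread. The helper name extract_name is hypothetical, the event is truncated for brevity, and Python needs the (?P<NAME>...) spelling for the named group. The name class is made lazy here ([^\d]+?) so that extra spaces before the first number would not end up in NAME.

```python
import re

# Start of the sample event from this thread (truncated for brevity).
event = (
    "1/31/2012 23:51 4 NCR (NEW AREAS) VB CRUZ, MODESTO FERDINAND "
    "27488123 9795322 PL TIMBER - NTF"
)

# Parenthesised uppercase words, a 2-5 letter code, the name (no digits),
# then two numbers of 5-20 digits each, all separated by whitespace.
NAME_RE = re.compile(
    r"\s+\([A-Z ]+\)\s+[A-Z]{2,5}\s+(?P<NAME>[^\d]+?)\s+\d{5,20}\s+\d{5,20}"
)


def extract_name(event):
    """Return the NAME capture, or None when the assumptions do not hold."""
    match = NAME_RE.search(event)
    return match.group("NAME") if match else None


name = extract_name(event)
print(name)
```

Events that break any of the listed assumptions (for example a number shorter than 5 digits) simply return None here, which makes it easy to spot lines that need a looser pattern.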

Hope this helps,

kristian

mcm10285
Communicator

Great! Worked like a charm! I have to practice my regex. Thanks!

kristian_kolb
Ultra Champion

Glad it helped - please mark the question as 'answered' and/or upvote.

/k

mcm10285
Communicator

Please see my answers to your questions below.

Q: first there is date and time. fine.
Q: then a three letter code (always present, always three, always letters?)
Q: then something in parentheses. (always present, always within parentheses?)
A: "4 NCR (NEW AREAS)" is one field.

Q: then a two letter code (always two, always letters, always present?)
A: It could be up to 5 letters.

Q: then a name(?) (always lastname, firstname(s)?)
A: It could be one name only; some contain a comma at the end.

Q: then a number (8 digits), always eight? always present?
A: It could be more than 8 digits; it is a reference number that increases.

Q: Where are the tabs?
A: They may not be tabs; it can be a number of spaces.

mcm10285
Communicator

Your idea is almost there; however, your example log is different from mine. By "previously defined fields" I meant fields that were extracted manually through the "Field Extraction" feature of the search head. Below is a sample of the logs.

1/31/2012 23:51 4 NCR (NEW AREAS) VB CRUZ, MODESTO FERDINAND 27488123 9795322 PL TIMBER - NTF MNL704-M MNL704-M FCR012 NO BROWSING COM 02/01/2012 21:33:45 FXR039 NO TROUBLE FOUND - VISITED FNR095 NO TROUBLE FOUND -VISITED COM-zptit609-1202-4098 VISATECH/COMBO/1PM/NTF FOR TERMINATION OF ACCOUNT/TALKED TO SUBS/CTC#09175704996/Uncontact Manila 2 VISATECH LUZON GMA 2

Those in bold were extracted using the "Field Extractor", and the one in italics (the name, CRUZ, MODESTO FERDINAND) is not yet defined. I hope this clarifies things and that you can help further. Thanks.

kristian_kolb
Ultra Champion

I can help you quite easily with this particular event, but in order for this to work for sure, I'd need to know more about the actual format.

first there is date and time. fine.
then a number (always present, always number?)
then a three letter code (always present, always three, always letters?)
then something in parentheses. (always present, always within parentheses?)
then a two letter code (always two, always letters, always present?)
then a name(?) (always lastname, firstname(s)?)
then a number (8 digits), always eight? always present?

Where are the tabs?

/k
