Splunk Search
Highlighted

Help with regex for a highly confusing field extraction

Contributor

Hello guys!
I know Splunk has a REGEX helper, but in this case, I have an amount of data wich is almost binary, take a look at one event row:
Foreach 38 substr I should have a new extracted field and it's also important to consider these two "blank spaces" when they found else consider numbers but always in a 38 sequence starting after XX01, in the original data I don't have the ** to mark the beggining of each field, just used it to show my problem...

2014-10-2210:13:19XX010003 000000650000006000000000000000000004 00000000000000000000000200000000**0005 000000000000000000000005000000000007010000001700000017000000000000000000080100000024000000230000000000000000000901000000060000000600000000000000000011010000001300000011000000010000000000120100000006000000060000000000000000001301000000060000000700000001000000000013 0000000100000001000000000000000000150100000061000000610000000000000000001511000000670000006700000000000000000015 00000149000001480000000100000000

Thanks in advance if anybody has a hint on this.
Bst Rgrds!

Tags (2)
Highlighted

Re: Help with regex for a highly confusing field extraction

SplunkTrust
SplunkTrust

Try this:

... | rex max_match=0 "[A-Z]{2}[0-9]{2}(?<sub38>([\d ]{38})" | ...

This will give you a multivalue variable with all of the matches.

---
If this reply helps you, an upvote would be appreciated.

View solution in original post

Highlighted

Re: Help with regex for a highly confusing field extraction

Contributor

Almost on the way I guess, shoud I use this rex in search tab or interactive field extractor?
There's just something else, these blocks, don't always start by XX01, they can vary for other letters AB, CD, DA, and so on...
If I use this rex match in search how will I extract it to a fixed field? Should use it inside field extractor right?
Thanks a lot @richgalloway

0 Karma
Highlighted

Re: Help with regex for a highly confusing field extraction

SplunkTrust
SplunkTrust

Use this command in the search tab and it will extract the data into a multivalue variable (sub38, in my example). You then can manipulate the variable using the various mv* commands.

I've modified my answer to accommodate eye-catchers other than 'XX'.

---
If this reply helps you, an upvote would be appreciated.
Highlighted

Re: Help with regex for a highly confusing field extraction

Contributor

Sorry bothering again, but just compared some fields and they also may not always use 01 after two letters, take a look at this other entry:

2014-10-2211:31:41UV020003  000001380000011800000014000000000004  000000000000000000000011000000010005  0000000000000000000000070000000000070100000041000000410000000000000000000801000000530000004300000002000000000009010000000400000004000000000000000000110100000011000000100000000000000000001201000000050000000500000000000000000013010000001300000014000000000000000000150100000144000001430000000100000000001511000000430000004200000001000000000015  00000309000003050000000400000000

Is there any table command I could use after this | rex expression just to confirm if selected fields are ok to extract in seach?
Thanks a lot @richgalloway

0 Karma
Highlighted

Re: Help with regex for a highly confusing field extraction

SplunkTrust
SplunkTrust

Replace '01' in the rex command with '[0-9]{2}'. I'll update my answer.

You should be able to see the extracted field with a table sub38 command.

---
If this reply helps you, an upvote would be appreciated.
Highlighted

Re: Help with regex for a highly confusing field extraction

Contributor

It's starting at the letters... I need it to start considering after two letters and two numbers, look:

 UV150003 00000192000001700000001000000000 - UV15 shoud not be considered.
 TF900013 00000000000000010000000000000000 - TF90 shoud not be considered.

But I need the blank spaces after 3 and before 0 to be considered in the extraction. Just one more question, will this work every 38 substr or just for first 38?

0 Karma
Highlighted

Re: Help with regex for a highly confusing field extraction

SplunkTrust
SplunkTrust

Excluding the 'UV15' is easy - just move that part of the regex outside the field. I'll update the answer.

The max_match=0 option tells rex to match all sets of 38.

---
If this reply helps you, an upvote would be appreciated.
Highlighted

Re: Help with regex for a highly confusing field extraction

Contributor

It worked! The regex is perfect now! The only problem, it's just gettin' the first 38 subsrt, this is a complete log:

2010-05-2215:04:16XX050003  000001310000012000000010000000010004  000000000000000000000004000000000005  0000000000000000000000080000000000070100000068000000670000000000000000000801000000730000006200000000000000000009010000001200000012000000000000000000110100000026000000210000000100000000001201000000080000000800000000000000000013010000001200000011000000020000000000150100000134000001340000000000000000001511000000380000003800000000000000000015  00000315000003090000000600000000

And when I use the rex and table, it worked fine, for the first 38

rex max_match=0 "[A-Z]{2}[0-9]{2}(?<sub38>([\d ]{38}))" | table sub38

Show as result:

0003 00000131000001200000001000000001

Shoud this table show the rest, starting from 0004 and on as I need to extract as one single field?
Thanks in advance @richgalloway

0 Karma
Highlighted

Re: Help with regex for a highly confusing field extraction

SplunkTrust
SplunkTrust

You can use the mvcount command to verify more than one substr is matched. For example, rex... | eval count=mvcount(sub38) | table count.

---
If this reply helps you, an upvote would be appreciated.
0 Karma