Hello guys!
I know Splunk has a REGEX helper, but in this case, I have an amount of data wich is almost binary, take a look at one event row:
Foreach 38 substr I should have a new extracted field and it's also important to consider these two "blank spaces" when they found else consider numbers but always in a 38 sequence starting after XX01, in the original data I don't have the ** to mark the beggining of each field, just used it to show my problem...
2014-10-2210:13:19XX01*0003 000000650000006000000000000000000004 00000000000000000000000200000000*0005 000000000000000000000005000000000007010000001700000017000000000000000000080100000024000000230000000000000000000901000000060000000600000000000000000011010000001300000011000000010000000000120100000006000000060000000000000000001301000000060000000700000001000000000013 0000000100000001000000000000000000150100000061000000610000000000000000001511000000670000006700000000000000000015 00000149000001480000000100000000
Thanks in advance if anybody has a hint on this.
Bst Rgrds!
Try this:
... | rex max_match=0 "[A-Z]{2}[0-9]{2}(?<sub38>([\d ]{38})" | ...
This will give you a multivalue variable with all of the matches.
Try this:
... | rex max_match=0 "[A-Z]{2}[0-9]{2}(?<sub38>([\d ]{38})" | ...
This will give you a multivalue variable with all of the matches.
Almost on the way I guess, shoud I use this rex in search tab or interactive field extractor?
There's just something else, these blocks, don't always start by XX01, they can vary for other letters AB, CD, DA, and so on...
If I use this rex match in search how will I extract it to a fixed field? Should use it inside field extractor right?
Thanks a lot @richgalloway
Use this command in the search tab and it will extract the data into a multivalue variable (sub38, in my example). You then can manipulate the variable using the various mv*
commands.
I've modified my answer to accommodate eye-catchers other than 'XX'.
Sorry bothering again, but just compared some fields and they also may not always use 01 after two letters, take a look at this other entry:
2014-10-2211:31:41UV020003 000001380000011800000014000000000004 000000000000000000000011000000010005 0000000000000000000000070000000000070100000041000000410000000000000000000801000000530000004300000002000000000009010000000400000004000000000000000000110100000011000000100000000000000000001201000000050000000500000000000000000013010000001300000014000000000000000000150100000144000001430000000100000000001511000000430000004200000001000000000015 00000309000003050000000400000000
Is there any table command I could use after this | rex expression just to confirm if selected fields are ok to extract in seach?
Thanks a lot @richgalloway
Replace '01' in the rex
command with '[0-9]{2}'. I'll update my answer.
You should be able to see the extracted field with a table sub38
command.
It's starting at the letters... I need it to start considering after two letters and two numbers, look:
UV150003 00000192000001700000001000000000 - UV15 shoud not be considered.
TF900013 00000000000000010000000000000000 - TF90 shoud not be considered.
But I need the blank spaces after 3 and before 0 to be considered in the extraction. Just one more question, will this work every 38 substr or just for first 38?
Excluding the 'UV15' is easy - just move that part of the regex outside the field. I'll update the answer.
The max_match=0
option tells rex to match all sets of 38.
It worked! The regex is perfect now! The only problem, it's just gettin' the first 38 subsrt, this is a complete log:
2010-05-2215:04:16XX050003 000001310000012000000010000000010004 000000000000000000000004000000000005 0000000000000000000000080000000000070100000068000000670000000000000000000801000000730000006200000000000000000009010000001200000012000000000000000000110100000026000000210000000100000000001201000000080000000800000000000000000013010000001200000011000000020000000000150100000134000001340000000000000000001511000000380000003800000000000000000015 00000315000003090000000600000000
And when I use the rex and table, it worked fine, for the first 38
rex max_match=0 "[A-Z]{2}[0-9]{2}(?<sub38>([\d ]{38}))" | table sub38
Show as result:
0003 00000131000001200000001000000001
Shoud this table show the rest, starting from 0004 and on as I need to extract as one single field?
Thanks in advance @richgalloway
You can use the mvcount
command to verify more than one substr is matched. For example, rex... | eval count=mvcount(sub38) | table count
.
count
1
My count is 1, refeers to 0003 00000131000001200000001000000001 only first substr...
I think I see what is happening. There is only one instance per event of 'XX01' followed by 38 characters. This is trickier than I thought. Perhaps we can do this in two separate operations. Use
rex "[A-Z]{2}[0-9]{2}(?<subs>[\d ]+)"
to extract everything after 'XX01' then use
rex field=subs max_match=0 "(?<sub38>[\d ]{38})"
to break the result into 38-character chunks.
Got it, working perfectly now @richgalloway...
I just changed the beggining to [A-Za-z]
But there's something else... I'm now working with two rex, I saved the first one into an extraction field and now I've calling this field in the second rex...
I have some subfields I have to create for each 38 chunk, I did it like this:
| eval A = substr(sub38,1,4)
| eval B = substr(sub38,5,2)
| eval C = substr(sub38,7,8)
| eval D = substr(sub38,15,8)
| eval E = substr(sub38,16,8)
| eval F = substr(sub38,24,8)
Consdering that substr function starts at first argument and moves to second argument position number, I need to extract for the first field 4 values of the chunk, for the second field 2 values of the chunk and the rest 8 values for fours fields subsequent.
But if I do that using eval substr I only get the first 38 chunk as a result and I needed for all event's chunk, any suggestion on how I could possibly do that?
Thanks in advance @richgalloway !
I've about reached the limit of my experience and understanding of multi-value fields. I suggest you open a new question.
Okay bro, thanks in advance for the hints...