I'm trying to do a field extraction for an Avaya call log. In this particular log event, every character, including spaces, is significant. Take this entry:
#000#000#000031419 102900107 8*902 17145550104 71234 88888 0 0244 71247 0 0000101 #015
The characters (17145550104 71234) are in positions 33-54 and indicate that a call came in from 1-714-555-0104 and went to the exchange 7-1234. That is 11 characters for the originating number and 10 characters for the destination number; in this case the first five characters of the destination number are spaces.
A similar entry:
#000#000#000031419 132500329 #060 713312155555331 50042424 25331 0 0000316 #015
In this case the field 713312155555331 contains the originating number, 7-1331 prepended by 6 spaces, followed by the destination number 215-555-5331, and is again in positions 33-54.
Any suggestions/tips on how to do this sort of field extraction would be greatly appreciated.
Thanks,
Mike
Hi @dahlberg
I started with this. If you know the field names, maybe you could post them and I'll update my answer; for now I have just called them a, b, c, etc.
https://regex101.com/r/a9Tm6c/1
^(?P<a>[^\s]+)\s+(?P<b>[^\s]+)\s+(?P<c>[^\s]+)\s+(?P<d>[^\s]+)\s+(?P<e>[^\s]+)\s+(?P<f>[^\s]+)?\s+(?P<g>[^\s]+)?\s+(?P<h>[^\s]+)?\s+(?P<i>[^\s]+)\s+(?P<j>[^\s]+)\s+(?P<k>[^\s]+)\s+(?P<l>[^\s]+)
Obviously, not all fields may be present in all logs, so you may need to fiddle with the optional flag (?) on the groups.
See this
Yea, you see my problem! This regex correctly returns both the origination number and the destination in the first event. However, with the second event, the regex incorrectly separates the fields because there is no space between the two numbers.
Mike
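To make the failure concrete, here is a small Python sketch (plain Python, not Splunk syntax) that splits both sample events on whitespace, which is effectively what the [^\s]+ groups do. The lines are copied as they appear in this thread, so the spacing may not match the raw log exactly.

```python
# The two sample events as pasted in this thread (raw spacing may differ).
event1 = "#000#000#000031419 102900107 8*902 17145550104 71234 88888 0 0244 71247 0 0000101 #015"
event2 = "#000#000#000031419 132500329 #060 713312155555331 50042424 25331 0 0000316 #015"

# Splitting on runs of whitespace mirrors a delimiter-based regex.
tokens1 = event1.split()
tokens2 = event2.split()

# Event 1: the two numbers land in separate tokens.
print(len(tokens1), tokens1[3], tokens1[4])
# Event 2: both numbers are fused into a single token.
print(len(tokens2), tokens2[3])
```

The token counts differ between the two events, so any pattern that relies on whitespace as a delimiter will assign the numbers to different fields depending on the call direction.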
I think the character indexes you provided are off, considering your data. Do you have clear numbers for the position at which the call data starts, the exact length of each field (including spaces), etc.? If you have that, your field extraction would look like this:
^.{N1}(?<originating>.{N2})(?<destination>.{N3})
Where N1 is the number of characters before the originating value appears in your raw data, N2 is the length of the originating number (e.g. 11), and N3 is the total length of the destination number.
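As a rough illustration of the same idea outside Splunk, here is a Python sketch of that fixed-width pattern. The values N1=32, N2=11 and N3=10 are assumptions taken from the numbers mentioned in the question, and the sample lines are synthetic (a dummy 32-character prefix), not real Avaya data. Note that Python spells named groups as (?P<name>...).

```python
import re

def build_pattern(n1, n2, n3):
    """Skip n1 characters, then capture two fixed-width fields."""
    return re.compile(
        rf"^.{{{n1}}}(?P<originating>.{{{n2}}})(?P<destination>.{{{n3}}})"
    )

# Assumed widths; adjust once the real offsets are confirmed.
pattern = build_pattern(32, 11, 10)

# Synthetic lines: a dummy 32-character prefix plus the two padded numbers.
prefix = "X" * 32
inbound = prefix + "17145550104" + "     71234" + " rest"
outbound = prefix + "      71331" + "2155555331" + " rest"

for line in (inbound, outbound):
    m = pattern.match(line)
    # strip() removes the space padding used for short extensions.
    print(m.group("originating").strip(), m.group("destination").strip())
```

Because the pattern counts characters instead of looking for delimiters, it gives the same two fields whether or not there is a space between the numbers.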
Thanks for taking a look at this.
The date starts at position 13 and is 6 chars, followed by:
spaces - 6 chars
time - 4 chars
duration - 4 chars
condition code - 1 char
code-dial - 4 chars
code used - 4 chars
dialed number - 15 chars
calling number - 10 chars
Therefore at position 23 from the start of the event, the originating number is returned. If it is an exchange then it is prepended with spaces. At position 38 the calling number starts, and if it is an exchange it is prepended with spaces. As a result, if a call originates from an exchange and goes to an outside number, it will look like one field. If a call comes in from an outside number and goes to an exchange, it will look like two fields, since the second number is prepended by spaces.
Thanks.
Mike
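For anyone following along, the layout listed above can be sketched as a width table that is walked from a starting offset. This is only a Python illustration under stated assumptions: the field names are made up for readability, the widths and the starting position 13 are taken from the list in this post and not checked against a real log, and the sample record is synthetic.

```python
# (name, width) pairs in the order given in the post; names are hypothetical.
LAYOUT = [
    ("date", 6),
    ("filler", 6),
    ("time", 4),
    ("duration", 4),
    ("condition_code", 1),
    ("code_dial", 4),
    ("code_used", 4),
    ("dialed_number", 15),
    ("calling_number", 10),
]
START = 12  # 0-based index for 1-based position 13

def parse_record(line, start=START, layout=LAYOUT):
    """Slice a fixed-width line into a dict of raw (unstripped) fields."""
    fields = {}
    pos = start
    for name, width in layout:
        fields[name] = line[pos:pos + width]
        pos += width
    return fields

def build_record(values, start=START, layout=LAYOUT):
    """Build a synthetic line; short values are left-padded with spaces."""
    body = "".join(values.get(name, "").rjust(width) for name, width in layout)
    return "#" * start + body

sample = build_record({
    "date": "031419", "time": "1029", "duration": "0010",
    "condition_code": "7", "dialed_number": "71234",
    "calling_number": "7145550104",
})
parsed = parse_record(sample)
print(repr(parsed["dialed_number"]), repr(parsed["calling_number"]))
```

Keeping the raw slices unstripped preserves the padding, so the caller can decide whether a value is an exchange (mostly spaces) or a full outside number.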