I am trying to break a field into multiple values based on a regex. Apparently this can be done with the tokenizer option of the makemv command. However, I can't find an example of how to use it, and I keep getting the following error when I try: "Error in 'makemv' command: The tokenizer regular expression is invalid".
Basically, I am trying to break on commas (,) that are not followed by a blank space.
End goal: "4,Something" would result in a new value, but "4, Something" would not.
Example :
| gentimes start=-1
| eval john="1 something,2 something else,3 something, with a comma,4 wibble"
| table john
| makemv tokenizer="(.+?)(?=,\S|$),?" john
What is this? "(.+?)(?=,\S|$),?"
For the tokenizer to work you need a capture group; the text captured by the group becomes one value.
What we're saying here is:
(.+?) lazily grab one or more characters - this is the capture group
(?=,\S|$) until the next thing is a comma followed by a non-whitespace character, or the end of the line
,? if there's a comma right after the captured text, eat it so it doesn't start the next value
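Result: john is split into four values, and the comma followed by a space stays inside the third one:
1 something
2 something else
3 something, with a comma
4 wibble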
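If you want to check the pattern outside Splunk, here is a minimal Python sketch. It assumes that makemv takes the first capture group of each successive match as one value, and that Python's re module treats this particular pattern the same way Splunk's regex engine does. The names TOKENIZER and tokenize are made up for illustration, they are not part of Splunk.

```python
import re

# The tokenizer pattern from the answer above, unchanged.
TOKENIZER = re.compile(r"(.+?)(?=,\S|$),?")

def tokenize(value):
    """Return the first capture group of every successive match (hypothetical helper)."""
    return [m.group(1) for m in TOKENIZER.finditer(value)]

john = "1 something,2 something else,3 something, with a comma,4 wibble"
values = tokenize(john)
for v in values:
    print(v)
```

Under those assumptions, "4,Something" comes out as two values and "4, Something" stays as one, which is the end goal stated in the question.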
A simple replace would do this job.
| replace "," with ", " in john
PS: This is based on my understanding of the requirement.
This probably works for you:
tokenizer="([^,]*)(,(\s[^,]*,?)*)?"
The tokenizer first captures a value:
([^,]*)
and then gobbles up everything that's not a field:
(,(\s[^,]*,?)*)?
PS: As per jonuwz's answer I may have treated ", " badly 🙂
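To see what this tokenizer keeps and what it throws away, here is a rough Python sketch. It assumes the first capture group of each match becomes one value, and it drops the empty match that Python's finditer reports at the end of the string; how Splunk itself handles that empty match is not something I have verified. ALT_TOKENIZER and tokenize_alt are illustrative names only.

```python
import re

# The tokenizer pattern from this answer, unchanged.
ALT_TOKENIZER = re.compile(r"([^,]*)(,(\s[^,]*,?)*)?")

def tokenize_alt(value):
    """Return the non-empty first capture groups of successive matches (hypothetical helper)."""
    # Assumption: empty captures are not wanted as values, so they are filtered out.
    return [m.group(1) for m in ALT_TOKENIZER.finditer(value) if m.group(1)]

john = "1 something,2 something else,3 something, with a comma,4 wibble"
alt_values = tokenize_alt(john)
for v in alt_values:
    print(v)
```

In this sketch the third value comes out as "3 something": the ", with a comma" part is matched by the second group and discarded rather than kept, which is the ", " handling the PS refers to.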