I was wondering if you can assign a search-time extracted field one value and then later, in a stanza that will be processed second, overwrite that field with a new value. For example, I have a situation like this:
Let's say I have 2 lines of text that read:
Player Name: Earvin Johnson - Game 5: 10 Assists
Player Name: Magic Johnson - Game 6: 15 Assists
And I have these stanzas to generate Search-Time Field Extractions.
[stanza_name_foo]
REGEX = Name: (\w+) (\w+)
FORMAT = FIRST_NAME::$1 LAST_NAME::$2
[stanza_name_foo_nickname]
REGEX = Name: Magic
FORMAT = FIRST_NAME::Earvin
So what I'm trying to do is assign the first name and last name of the player. But if the first name happens to be "Magic", I know that's actually Earvin Johnson, so I want to change it. For the second sample line, the first stanza will extract "Magic" as the value of FIRST_NAME. But in the second stanza, if the word "Magic" appears after "Name: ", I want to change the FIRST_NAME field to "Earvin".
Will this work? Can I "overwrite" the search-time extracted field with a new value after it's already been defined once? Or, once it's defined, can I not change it? If not, I'll have to do it backwards and run the second stanza first and the first stanza second.
Thanks, guys.
So apparently, the answer is no: you cannot overwrite a search-time extracted field once it has been defined. That just means I can accomplish what I want by doing things backwards. In other words, what I had above won't work, but this will:
[stanza_name_foo_01_magic]
REGEX = Name: Magic
FORMAT = FIRST_NAME::Earvin
[stanza_name_foo_99_main]
REGEX = Name: (\w+) (\w+)
FORMAT = FIRST_NAME::$1 LAST_NAME::$2
The reason this works is that now, in ASCII order, "[stanza_name_foo_01_magic]" triggers first. If it finds "Magic", it sets FIRST_NAME to Earvin. Then the "[stanza_name_foo_99_main]" stanza reads "Magic" and tries to set FIRST_NAME to Magic, but since the field has already been assigned, it can't be overwritten. LAST_NAME hasn't been assigned yet, so the main stanza still fills it in.
So I can do what I want, just in the reverse order I had it first listed here.
Of course, I could also use the "priority" field instead of relying on alphabetical stanza names, which would make naming the stanzas easier.
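The "first assignment wins" behavior described in this answer can be sketched in plain Python. This is only a toy model of what I observed, not how Splunk is actually implemented: the `TRANSFORMS` list and the `extract_fields` helper are made-up names for illustration, and the stanzas are simply applied in sorted name order.

```python
import re

# Hypothetical stand-ins for the two stanzas above:
# (stanza name, regex, {field name: replacement template}).
TRANSFORMS = [
    ("stanza_name_foo_99_main", re.compile(r"Name: (\w+) (\w+)"),
     {"FIRST_NAME": r"\1", "LAST_NAME": r"\2"}),
    ("stanza_name_foo_01_magic", re.compile(r"Name: Magic"),
     {"FIRST_NAME": "Earvin"}),
]


def extract_fields(line, transforms=TRANSFORMS):
    """Apply the stanzas in sorted name order; an already-set field is never overwritten."""
    fields = {}
    for _name, regex, fmt in sorted(transforms, key=lambda t: t[0]):
        match = regex.search(line)
        if match is None:
            continue
        for field, template in fmt.items():
            # setdefault keeps the first value assigned to a field.
            fields.setdefault(field, match.expand(template))
    return fields


if __name__ == "__main__":
    print(extract_fields("Player Name: Earvin Johnson - Game 5: 10 Assists"))
    print(extract_fields("Player Name: Magic Johnson - Game 6: 15 Assists"))
```

Under this model, both sample lines end up with FIRST_NAME set to Earvin and LAST_NAME set to Johnson, because the "magic" stanza sorts first and the main stanza can only fill in fields that are still empty.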
I've not tested this, but a possible option may be a lookup table. Say you have the whole name "Magic Johnson" extracted into a field called player_name; then you can define a lookup table similar to:
player_name,real_name
Magic Johnson,Earvin Johnson
Kareem Abdul-Jabbar,Lew Alcindor
You can use this in your search similar to:
| lookup player_name_lookup player_name OUTPUT real_name AS player_name
You will, of course, have to define player_name_lookup in transforms.conf, similar to:
[player_name_lookup]
filename=player_name_lookup.csv
The part I'm not sure about is whether lookup will let you overwrite player_name in the search results. You may need to post-process further with eval.
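To make the intended result concrete, here is a rough Python sketch of the mapping I have in mind: replace player_name with the real name when the table has a row for it, and keep the original value otherwise. This is plain Python rather than SPL, and `load_lookup` and `resolve_name` are made-up helper names for illustration; it says nothing about how the lookup command itself treats rows with no match.

```python
import csv
import io

# Same content as the example lookup table above.
LOOKUP_CSV = """player_name,real_name
Magic Johnson,Earvin Johnson
Kareem Abdul-Jabbar,Lew Alcindor
"""


def load_lookup(text):
    """Read the CSV text into a dict mapping player_name to real_name."""
    reader = csv.DictReader(io.StringIO(text))
    return {row["player_name"]: row["real_name"] for row in reader}


def resolve_name(player_name, table):
    """Return the real name if the table has one, else the original value."""
    return table.get(player_name, player_name)


if __name__ == "__main__":
    table = load_lookup(LOOKUP_CSV)
    for name in ("Magic Johnson", "Kareem Abdul-Jabbar", "Larry Bird"):
        print(name, "->", resolve_name(name, table))
```

The fallback to the original value is the part that the eval post-processing step would need to cover if the lookup alone does not behave this way.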
Also, I love the sample you gave, putting Kareem there. ^_^ High Five!
Hey, dwaddle! Thanks for the suggestion, though to be honest I was looking for a way to accomplish this without needing to use a lookup file. And I did test it out myself, actually, and I figured out the way it all behaves. You can see my answer above.