I need to change the value of one field at indexing time, based on the value of another. This is a .csv file with historical data I'm pulling in. Example:
If GIFT_DESC field contains the words "fruitcake" or "fruit cake", I want to change the GIFT_TYPE field to "Bad gift". What's the best way to go about that?
If GIFT_DESC field contains the words "fruitcake" or "fruit cake", I want to
change the GIFT_TYPE field to "Bad gift". What's the best way to go about that?
To do this at index-time you could have props.conf
apply a TRANSFORMS-check = transform-detect-bad-gifts
. In transforms.conf
, [transform-detect-bad-gifts]
would have SOURCE_KEY = GIFT_DESC
, REGEX = fruit ?cake
, DEST_KEY = GIFT_TYPE
, FORMAT = Bad gift
.
Thanks for the suggestions, everyone. This is only a training/development exercise, so it's not anything important. Well, except to me, that is. 🙂 What I'm working with is a dataset listing recent aviation incidents from the US FAA. A couple sample entries from the file:
"No","30-JUN-16","29-JUN-16","12:40:00Z","FREDERICK","Maryland","","AIRCRAFT ON LANDING, BOUNCED, FREDERICK, MD","Accident","FAA Baltimore FSDO-07","N2473G","","","CESSNA","172","","Substantial","","LANDING (LDG)","","None","","1","","","","","","","","","","","","","","","","","","",""
"No","30-JUN-16","29-JUN-16","16:52:00Z","HOUSTON","Texas","","AIRCRAFT DURING FLIGHT, SUSTAINED A BIRDSTRIKE INTO THE WINDSHIELD, RETURNED AND LANDED WITHOUT INCIDENT, 15 MILES FROM HOUSTON, TX","Accident","FAA Houston FSDO-09","N106AF","","","CESSNA","172","","Substantial","Instruction","UNKNOWN (UNK)","","Minor","","","2","","","","","","","","","","","","","","","","","",""
Here are the column headings (field names):
"UPDATED","ENTRY_DATE","EVENT_LCL_DATE","EVENT_LCL_TIME","LOC_CITY_NAME","LOC_STATE_NAME","LOC_CNTRY_NAME","RMK_TEXT","EVENT_TYPE_DESC","FSDO_DESC","REGIST_NBR","FLT_NBR","ACFT_OPRTR","ACFT_MAKE_NAME","ACFT_MODEL_NAME","ACFT_MISSING_FLAG","ACFT_DMG_DESC","FLT_ACTIVITY","FLT_PHASE","FAR_PART","MAX_INJ_LVL","FATAL_FLAG","FLT_CRW_INJ_NONE","FLT_CRW_INJ_MINOR","FLT_CRW_INJ_SERIOUS","FLT_CRW_INJ_FATAL","FLT_CRW_INJ_UNK","CBN_CRW_INJ_NONE","CBN_CRW_INJ_MINOR","CBN_CRW_INJ_SERIOUS","CBN_CRW_INJ_FATAL","CBN_CRW_INJ_UNK","PAX_INJ_NONE","PAX_INJ_MINOR","PAX_INJ_SERIOUS","PAX_INJ_FATAL","PAX_INJ_UNK","GRND_INJ_NONE","GRND_INJ_MINOR","GRND_INJ_SERIOUS","GRND_INJ_FATAL","GRND_INJ_UNK"
The field EVENT_TYPE_DESC contains either "Accident" or "Incident" depending on the type of incident. What I wanted to do was examine the contents of RMK_TEXT (remarks field), and change the EVENT_TYPE_DESC field to "Birdstrike" if the words "birdstrike" or "bird strike" appeared anywhere in the remarks.
After tossing this back and forth with a colleague, he came up with some suggestions, and after some back and forth, here's my props.conf and transforms.conf files:
props.conf:
[faa_events]
DATETIME_CONFIG =
INDEXED_EXTRACTIONS = csv
KV_MODE = none
NO_BINARY_CHECK = true
SHOULD_LINEMERGE = false
category = Structured
description = Comma-separated value format. Set header and other settings in "De
limited Settings"
disabled = false
pulldown_type = true
TRANSFORMS-birdstrike_edit = birdstrike_edit
TIMESTAMP_FIELDS = EVENT_LCL_DATE, EVENT_LCL_TIME
transforms.conf:
[birdstrike_edit]
REGEX=^(?<part1>.*BIRDSTRIKE.+?",")(?<incident>.+?)(?<part3>",".+)$
DEST_KEY=_raw
FORMAT=$1Birdstrike$3
Now, here's the odd part: it actually works. When I upload the .csv file and do a search for "birdstrike", the two entries that have "birdstrike" in the remarks show up, and the detail display of the two records show that "Accident" has been replaced with "Birdstrike". But the list of fields on the left still shows there are only two values for EVENT_TYPE_DESC; "Accident" and "Incident". This tells me that Splunk indexes the field names before it applies the transforms.conf files, which to me seems a bit weird.
Please forgive my long-windedness!
I don't know how it works or exactly in what order, but INDEXED_EXTRACTIONS is an input-time setting. I don't know if that means the fields are created at input or just that the headers are gathered? Not sure. But I wonder if you can add another stanza to transforms where source key is the remarks field, regex matches birdstrike, dest key is the type field and format is just birdstrike? Maybe then both the raw data and indexed fields would be updated appropriately?
Also, as I'm sure you know, you could replace the raw data like you're doing and use a REPORT stanza to extract the fields at search time, instead of using indexed extractions to do at index time
I think it depends on the format of your data and how consistent that format is. But I was able to do something like this with a couple sedcmd entries in props.conf.
Here is my test data
blah,meh,fruitcake,huh,good gift
meh,huh,fruit cake,blah,good gift
meh,huh,drone,blah,good gift
blah,huh,socks,me,bad gift
blah,meh,fruitcake,huh,ok gift
And here are the sedcmd entries I'm using in props
SEDCMD-test1 = s/^([^,]+,){2}(fruitcake|fruit cake),([^,]+,){1}/&replace me,/g
SEDCMD-test2 = s/replace me,[^,]+/bad gift/g
So I'm using sed twice. The first time, I grab everything up to my gift_type field if it includes fruitcake and replace that with the exact same string (the ampersand) but add another field called "replace me". If fruitcake isn't there, then nothing gets replaced.
The second sedcmd finds replace me and then gift_type and replaces that all with just "bad gift". If the "replace me" string isn't there, nothing gets replaced. With this simple data set it seems to work. Where expected, the gift type was set to "bad gift". The other fields were left alone.
I'm not sure if this will affect anything else that you do at parse time. And I'm not sure if this will work for your data set in general. And there may be a better way to do it. But maybe this will at least point you in the right direction.
Don't worry about doing this at index time. Just do it using the search language and eval to create a new field based on the existing field.
index=foo | eval GIFT_TYPE=if(match(GIFT_DESC, "fruit\s?cake"), "Bad gift", "Good gift")
Get familiar with all the possible functions for the eval command. It is a must know search command.
Yup, I'm a big fan of the eval command and doing things like this at search time. But what if I really, really want to do this at indexing time because I'm loading a ton of old data and need to conditionally change fields in this way so that old events are consistent with how new events are recorded?
Can you post some sample events (where you want to make change)?
I don't think/aware of any native method to achieve that. You should either look at creating a field at search time (field extraction/calculated fields OR lookup) OR use scripted input to manipulate the data and send to Splunk.