Thanks for the suggestions, everyone. This is only a training/development exercise, so it's not anything important. Well, except to me, that is. 🙂 What I'm working with is a dataset listing recent aviation incidents from the US FAA. A couple sample entries from the file:
"No","30-JUN-16","29-JUN-16","12:40:00Z","FREDERICK","Maryland","","AIRCRAFT ON LANDING, BOUNCED, FREDERICK, MD","Accident","FAA Baltimore FSDO-07","N2473G","","","CESSNA","172","","Substantial","","LANDING (LDG)","","None","","1","","","","","","","","","","","","","","","","","","",""
"No","30-JUN-16","29-JUN-16","16:52:00Z","HOUSTON","Texas","","AIRCRAFT DURING FLIGHT, SUSTAINED A BIRDSTRIKE INTO THE WINDSHIELD, RETURNED AND LANDED WITHOUT INCIDENT, 15 MILES FROM HOUSTON, TX","Accident","FAA Houston FSDO-09","N106AF","","","CESSNA","172","","Substantial","Instruction","UNKNOWN (UNK)","","Minor","","","2","","","","","","","","","","","","","","","","","",""
Here are the column headings (field names):
"UPDATED","ENTRY_DATE","EVENT_LCL_DATE","EVENT_LCL_TIME","LOC_CITY_NAME","LOC_STATE_NAME","LOC_CNTRY_NAME","RMK_TEXT","EVENT_TYPE_DESC","FSDO_DESC","REGIST_NBR","FLT_NBR","ACFT_OPRTR","ACFT_MAKE_NAME","ACFT_MODEL_NAME","ACFT_MISSING_FLAG","ACFT_DMG_DESC","FLT_ACTIVITY","FLT_PHASE","FAR_PART","MAX_INJ_LVL","FATAL_FLAG","FLT_CRW_INJ_NONE","FLT_CRW_INJ_MINOR","FLT_CRW_INJ_SERIOUS","FLT_CRW_INJ_FATAL","FLT_CRW_INJ_UNK","CBN_CRW_INJ_NONE","CBN_CRW_INJ_MINOR","CBN_CRW_INJ_SERIOUS","CBN_CRW_INJ_FATAL","CBN_CRW_INJ_UNK","PAX_INJ_NONE","PAX_INJ_MINOR","PAX_INJ_SERIOUS","PAX_INJ_FATAL","PAX_INJ_UNK","GRND_INJ_NONE","GRND_INJ_MINOR","GRND_INJ_SERIOUS","GRND_INJ_FATAL","GRND_INJ_UNK"
The field EVENT_TYPE_DESC contains either "Accident" or "Incident" depending on the type of incident. What I wanted to do was examine the contents of RMK_TEXT (remarks field), and change the EVENT_TYPE_DESC field to "Birdstrike" if the words "birdstrike" or "bird strike" appeared anywhere in the remarks.
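To make the goal concrete, here is a rough sketch of the same relabeling done outside Splunk, in plain Python. It is only an illustration of the intent, not how Splunk does it. The sample data is trimmed to three of the real columns, and the names `SAMPLE`, `BIRD` and `relabel` are made up for this example.

```python
import csv
import io
import re

# Trimmed-down sample: only three of the real columns, with shortened remarks.
SAMPLE = '''"LOC_CITY_NAME","RMK_TEXT","EVENT_TYPE_DESC"
"FREDERICK","AIRCRAFT ON LANDING, BOUNCED, FREDERICK, MD","Accident"
"HOUSTON","AIRCRAFT DURING FLIGHT, SUSTAINED A BIRDSTRIKE INTO THE WINDSHIELD","Accident"
'''

# Matches "birdstrike" or "bird strike", in any letter case.
BIRD = re.compile(r"bird ?strike", re.IGNORECASE)


def relabel(rows):
    """Return copies of the rows, with EVENT_TYPE_DESC set to "Birdstrike"
    whenever the remarks mention a bird strike."""
    out = []
    for row in rows:
        if BIRD.search(row["RMK_TEXT"]):
            row = dict(row, EVENT_TYPE_DESC="Birdstrike")
        out.append(row)
    return out


rows = relabel(csv.DictReader(io.StringIO(SAMPLE)))
for row in rows:
    print(row["LOC_CITY_NAME"], row["EVENT_TYPE_DESC"])
```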
After kicking this around with a colleague, who came up with some suggestions, here are my props.conf and transforms.conf files:
props.conf:
[faa_events]
DATETIME_CONFIG =
INDEXED_EXTRACTIONS = csv
KV_MODE = none
NO_BINARY_CHECK = true
SHOULD_LINEMERGE = false
category = Structured
description = Comma-separated value format. Set header and other settings in "Delimited Settings"
disabled = false
pulldown_type = true
TRANSFORMS-birdstrike_edit = birdstrike_edit
TIMESTAMP_FIELDS = EVENT_LCL_DATE, EVENT_LCL_TIME
transforms.conf:
[birdstrike_edit]
REGEX=^(?<part1>.*BIRDSTRIKE.+?",")(?<incident>.+?)(?<part3>",".+)$
DEST_KEY=_raw
FORMAT=$1Birdstrike$3
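For anyone who wants to see what that REGEX/FORMAT pair does to a raw line, here is a small Python sketch that mimics it. This is an approximation, not Splunk itself: Python spells named groups as `(?P<name>...)` instead of `(?<name>...)`, the `FORMAT` string is emulated with `re.sub`, and the names `PATTERN`, `rewrite` and `RAW` are made up for the example. The raw line is the Houston entry cut off after the registration number.

```python
import re

# Same pattern as the transforms.conf REGEX, in Python's named-group syntax.
PATTERN = re.compile(
    r'^(?P<part1>.*BIRDSTRIKE.+?",")(?P<incident>.+?)(?P<part3>",".+)$'
)


def rewrite(raw):
    """Emulate FORMAT=$1Birdstrike$3: keep part1 and part3, and replace the
    field that follows the remarks with "Birdstrike"."""
    return PATTERN.sub(r"\g<part1>Birdstrike\g<part3>", raw)


# Houston sample entry, truncated after REGIST_NBR for readability.
RAW = (
    '"No","30-JUN-16","29-JUN-16","16:52:00Z","HOUSTON","Texas","",'
    '"AIRCRAFT DURING FLIGHT, SUSTAINED A BIRDSTRIKE INTO THE WINDSHIELD, '
    'RETURNED AND LANDED WITHOUT INCIDENT, 15 MILES FROM HOUSTON, TX",'
    '"Accident","FAA Houston FSDO-09","N106AF"'
)

print(rewrite(RAW))
```

One thing this makes visible: as written, the pattern only matches the upper-case, one-word spelling "BIRDSTRIKE". A remark containing "BIRD STRIKE" with a space would pass through unchanged.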
Now, here's the odd part: it actually works. When I upload the .csv file and do a search for "birdstrike", the two entries that have "birdstrike" in the remarks show up, and the detail display of the two records shows that "Accident" has been replaced with "Birdstrike". But the list of fields on the left still shows only two values for EVENT_TYPE_DESC: "Accident" and "Incident". This tells me that Splunk indexes the fields before it applies the transforms.conf rewrite, which to me seems a bit weird.
Please forgive my long-windedness!