I figured out a way to do what I was trying to do. I used a REGEX to grab the analyst-specified sourcetype from the source file name. Because underscores already separate the fields in the file name, we had to use dashes instead of underscores as separators inside the sourcetype field. To replace those dashes with underscores in the sourcetype at index time, I used props and transforms to run repeated passes over the value extracted from the source file name, with each pass replacing one dash. There may be a better way; if anyone has a suggestion, please chime in. This method currently supports sourcetypes specified with up to eight dashes. I would love to see something in transforms like "REPLACE = s/-/_/g".
inputs.conf - Ingest any CSV file generated by an analyst that follows the naming convention
[batch:///opt/splunk_input/input/*_*_*_*_*_analyst_*_*_*.csv]
sourcetype = analyst
move_policy = sinkhole
crcSalt = <SOURCE>
disabled = 0
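Before dropping a file into the batch directory, it can help to check that its name splits into the expected fields. The following is a minimal Python sketch, not part of the Splunk configuration; the function name `parse_analyst_filename` and the sample file name are hypothetical, and the field names follow the convention client_collection_system_tag_sensor_analyst_type_timestamp_seqnum.csv. It anchors the match to the whole base name, which is stricter than the wildcard in the stanza above.

```python
import re
from pathlib import PurePosixPath

# Field names in file-name order; the literal "analyst" sits between sensor and type.
FIELD_NAMES = ("client", "collection", "system", "tag", "sensor",
               "type", "timestamp", "seqnum")

# Eight underscore-free fields, with the literal "_analyst_" after the fifth.
NAME_RE = re.compile(
    r"([^_]+)_([^_]+)_([^_]+)_([^_]+)_([^_]+)_analyst_([^_]+)_([^_]+)_([^_]+)\.csv"
)


def parse_analyst_filename(path):
    """Return a dict of file-name fields, or None if the name does not fit."""
    name = PurePosixPath(path).name  # drop the directory part
    match = NAME_RE.fullmatch(name)
    if match is None:
        return None
    return dict(zip(FIELD_NAMES, match.groups()))


if __name__ == "__main__":
    sample = ("/opt/splunk_input/input/"
              "acme_coll1_sys1_tag1_sensor01_analyst_bro-conn-log_20200101_001.csv")
    print(parse_analyst_filename(sample))
```

Note that the type field keeps its dashes at this stage; converting them to underscores is what the props and transforms settings are for.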
props.conf - Parse the analyst-generated file using the required timestamp field, extract the sourcetype from the part of the source file name that follows "analyst", change up to eight (8) dashes in it to underscores, and add the prefix "analyst_". The replacement chain always runs eight (8) times. When a pass finds no match, the keys I need are simply not overwritten, so the extra passes are harmless.
[analyst]
TRUNCATE = 0
SHOULD_LINEMERGE = false
DATETIME_CONFIG =
MAX_TIMESTAMP_LOOKAHEAD = 4096
INDEXED_EXTRACTIONS = CSV
TIMESTAMP_FIELDS = ts, _time, time
NO_BINARY_CHECK = false
category = Structured
pulldown_type = 1
TRANSFORMS-auto_analyst_set_fields = set_analyst_fields
TRANSFORMS-auto_analyst_set_host = set_analyst_host_to_sensor
TRANSFORMS-auto_analyst_set_index = set_index_for_analyst_sensor
TRANSFORMS-auto_analyst_set_sourcetype = set_var01_to_type, \
var01_dash_to_var02_underscore, \
var02_to_var01, \
var01_dash_to_var02_underscore, \
var02_to_var01, \
var01_dash_to_var02_underscore, \
var02_to_var01, \
var01_dash_to_var02_underscore, \
var02_to_var01, \
var01_dash_to_var02_underscore, \
var02_to_var01, \
var01_dash_to_var02_underscore, \
var02_to_var01, \
var01_dash_to_var02_underscore, \
var02_to_var01, \
var01_dash_to_var02_underscore, \
var02_to_var01, \
var01_to_sourcetype
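Since the sourcetype chain is just the same pair of transforms repeated once per dash to be handled, the setting can be generated rather than typed by hand if the limit ever needs to change. This is a small Python sketch under that assumption; `build_sourcetype_chain` is a hypothetical helper, and the transform names are the ones used in the stanza above.

```python
def build_sourcetype_chain(passes=8):
    """Build the TRANSFORMS-auto_analyst_set_sourcetype setting for N passes.

    Each pass replaces one dash, so `passes` is the maximum number of
    dashes the resulting chain can convert.
    """
    steps = ["set_var01_to_type"]
    for _ in range(passes):
        steps.append("var01_dash_to_var02_underscore")
        steps.append("var02_to_var01")
    steps.append("var01_to_sourcetype")
    # Join with ", \" plus a newline, matching the continuation style above.
    return "TRANSFORMS-auto_analyst_set_sourcetype = " + ", \\\n".join(steps)


if __name__ == "__main__":
    print(build_sourcetype_chain(8))
```

With `passes=8` this prints the same eighteen-line setting shown above; a larger value simply lengthens the chain.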
transforms.conf
#
# Analyst
#
# File Name Fields: client_collection_system_tag_sensor_analyst_type_timestamp_seqnum.csv
#
# REGEX: ([^\/]+)_([^_]+)_([^_]+)_([^_]+)_([^_]+)_analyst_([^_]+)_([^_]+)_([^_]+)\.csv
#
# Match Groups: <$1>_<$2>_<$3>_<$4>_<$5>_analyst_<$6>_<$7>_<$8>.csv
#
#
[accepted_keys]
var01_key = _var01
var02_key = _var02
#
# Referenced in props.conf [analyst]
#
[set_analyst_fields]
SOURCE_KEY = MetaData:Source
REGEX = ([^\/]+)_([^_]+)_([^_]+)_([^_]+)_([^_]+)_analyst_([^_]+)_([^_]+)_([^_]+)\.[c][s][v]
FORMAT = analyst_client::$1 analyst_collection::$2 analyst_system::$3 analyst_tag::$4
WRITE_META = true
[set_analyst_host_to_sensor]
SOURCE_KEY = MetaData:Source
DEST_KEY = MetaData:Host
REGEX = ([^\/]+)_([^_]+)_([^_]+)_([^_]+)_([^_]+)_analyst_([^_]+)_([^_]+)_([^_]+)\.[c][s][v]
FORMAT = host::$5
DEFAULT_VALUE = unknown_analyst_host
[set_index_for_analyst_sensor]
SOURCE_KEY = MetaData:Source
DEST_KEY = _MetaData:Index
REGEX = ([^\/]+)_([^_]+)_([^_]+)_([^_]+)_([^_]+)_analyst_([^_]+)_([^_]+)_([^_]+)\.[c][s][v]
FORMAT = idx_$5
DEFAULT_VALUE = unknown_analyst_index
[set_var01_to_type]
SOURCE_KEY = MetaData:Source
DEST_KEY = _var01
REGEX = ([^\/]+)_([^_]+)_([^_]+)_([^_]+)_([^_]+)_analyst_([^_]+)_([^_]+)_([^_]+)\.[c][s][v]
FORMAT = _$6
[var01_dash_to_var02_underscore]
SOURCE_KEY = _var01
DEST_KEY = _var02
REGEX = _([^-]+)-([^.]+)
FORMAT = _$1_$2
[var02_to_var01]
SOURCE_KEY = _var02
DEST_KEY = _var01
REGEX = ([^.]+)
FORMAT = $1
[var01_to_sourcetype]
SOURCE_KEY = _var01
DEST_KEY = MetaData:Sourcetype
REGEX = _([^.]+)
FORMAT = sourcetype::analyst_$1
DEFAULT_VALUE = unknown_analyst_sourcetype
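To see why the chain handles up to eight dashes and why the extra passes do no harm, the transforms can be imitated outside Splunk. The following Python sketch is only a rough model, assuming that a transform whose REGEX does not match leaves its DEST_KEY untouched and that DEFAULT_VALUE is used when the final REGEX fails; `simulate_sourcetype` and the sample path are hypothetical, and Python's `re` stands in for Splunk's regex engine.

```python
import re

# Same patterns as the transforms above.
SOURCE_RE = re.compile(
    r"([^\/]+)_([^_]+)_([^_]+)_([^_]+)_([^_]+)_analyst_([^_]+)_([^_]+)_([^_]+)\.[c][s][v]"
)
DASH_RE = re.compile(r"_([^-]+)-([^.]+)")   # var01_dash_to_var02_underscore
COPY_RE = re.compile(r"([^.]+)")            # var02_to_var01
FINAL_RE = re.compile(r"_([^.]+)")          # var01_to_sourcetype


def simulate_sourcetype(source, passes=8):
    """Model the sourcetype chain; a failed match leaves keys unchanged."""
    keys = {}
    match = SOURCE_RE.search(source)
    if match:
        keys["_var01"] = "_" + match.group(6)          # FORMAT = _$6
    for _ in range(passes):
        match = DASH_RE.search(keys.get("_var01", ""))
        if match:                                      # replaces the first dash only
            keys["_var02"] = "_%s_%s" % (match.group(1), match.group(2))
        match = COPY_RE.search(keys.get("_var02", ""))
        if match:
            keys["_var01"] = match.group(1)
    match = FINAL_RE.search(keys.get("_var01", ""))
    if match:
        return "analyst_" + match.group(1)
    return "unknown_analyst_sourcetype"                # stands in for DEFAULT_VALUE


if __name__ == "__main__":
    path = ("/opt/splunk_input/input/"
            "acme_coll1_sys1_tag1_sensor01_analyst_bro-conn-log_20200101_001.csv")
    print(simulate_sourcetype(path))
```

In this model each pass converts only the leftmost remaining dash, so a type with nine dashes would come out with its last dash intact, which matches the eight-dash limit described above.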
fields.conf
[analyst_client]
INDEXED = false
[analyst_collection]
INDEXED = false
[analyst_system]
INDEXED = false
[analyst_tag]
INDEXED = true