How does Splunk extract the search-time fields shown under "interesting fields"?
Which props.conf setting does Splunk use to extract interesting fields from the _raw field?
I am trying to use the collect command to copy _raw data from one index into another, but the copied events do not get interesting fields extracted. If I specify sourcetype=splunkd, the interesting fields are extracted. I understand that using a sourcetype other than stash counts against license usage, so I should be able to create a custom field extraction for the stash source paths without consuming any license.
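For example, I was thinking of adding something like this to a local props.conf on the search head (just a sketch, not tested; it reuses the splunkd EXTRACT-fields regex shown in the btool output below and assumes a [stash] stanza is even honoured for the collected events):
[stash]
# search-time extraction borrowed from the splunkd sourcetype, illustrative only
EXTRACT-fields = (?i)^(?:[^ ]* ){2}(?:[+\-]\d+ )?(?P<log_level>[^ ]*)\s+(?P<component>[^ ]+) - (?P<event_message>.+)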
I ran ./splunk btool props list splunkd and this is what it shows:
[splunkd]
ADD_EXTRA_TIME_FIELDS = True
ANNOTATE_PUNCT = True
AUTO_KV_JSON = true
BREAK_ONLY_BEFORE =
BREAK_ONLY_BEFORE_DATE = True
CHARSET = UTF-8
DATETIME_CONFIG = /etc/datetime.xml
DEPTH_LIMIT = 1000
DETERMINE_TIMESTAMP_DATE_WITH_SYSTEM_TIME = false
EXTRACT-fields = (?i)^(?:[^ ]* ){2}(?:[+\-]\d+ )?(?P<log_level>[^ ]*)\s+(?P<component>[^ ]+) - (?P<event_message>.+)
HEADER_MODE =
LB_CHUNK_BREAKER_TRUNCATE = 2000000
LEARN_MODEL = true
LEARN_SOURCETYPE = true
LINE_BREAKER_LOOKBEHIND = 100
MATCH_LIMIT = 100000
MAX_DAYS_AGO = 2000
MAX_DAYS_HENCE = 2
MAX_DIFF_SECS_AGO = 3600
MAX_DIFF_SECS_HENCE = 604800
MAX_EVENTS = 256
MAX_TIMESTAMP_LOOKAHEAD = 40
MUST_BREAK_AFTER =
MUST_NOT_BREAK_AFTER =
MUST_NOT_BREAK_BEFORE =
SEGMENTATION = indexing
SEGMENTATION-all = full
SEGMENTATION-inner = inner
SEGMENTATION-outer = outer
SEGMENTATION-raw = none
SEGMENTATION-standard = standard
SHOULD_LINEMERGE = false
TIME_FORMAT = %m-%d-%Y %H:%M:%S.%l %z
TRANSFORMS =
TRUNCATE = 20000
detect_trailing_nulls = false
maxDist = 100
priority =
sourcetype =
termFrequencyWeightedDist = false
For the [default] stanza, it shows:
[default]
ADD_EXTRA_TIME_FIELDS = True
ANNOTATE_PUNCT = True
AUTO_KV_JSON = true
BREAK_ONLY_BEFORE =
BREAK_ONLY_BEFORE_DATE = True
CHARSET = UTF-8
DATETIME_CONFIG = /etc/datetime.xml
DEPTH_LIMIT = 1000
DETERMINE_TIMESTAMP_DATE_WITH_SYSTEM_TIME = false
HEADER_MODE =
LB_CHUNK_BREAKER_TRUNCATE = 2000000
LEARN_MODEL = true
LEARN_SOURCETYPE = true
LINE_BREAKER_LOOKBEHIND = 100
MATCH_LIMIT = 100000
MAX_DAYS_AGO = 2000
MAX_DAYS_HENCE = 2
MAX_DIFF_SECS_AGO = 3600
MAX_DIFF_SECS_HENCE = 604800
MAX_EVENTS = 256
MAX_TIMESTAMP_LOOKAHEAD = 128
MUST_BREAK_AFTER =
MUST_NOT_BREAK_AFTER =
MUST_NOT_BREAK_BEFORE =
SEGMENTATION = indexing
SEGMENTATION-all = full
SEGMENTATION-inner = inner
SEGMENTATION-outer = outer
SEGMENTATION-raw = none
SEGMENTATION-standard = standard
SHOULD_LINEMERGE = True
TRANSFORMS =
TRUNCATE = 10000
detect_trailing_nulls = false
maxDist = 100
priority =
sourcetype =
termFrequencyWeightedDist = false
I verified the data and it is not in JSON format, so AUTO_KV_JSON would not apply to it.
The only thing I could find in transforms.conf and props.conf that separates fields based on "=" is:
[ad-kv]
CAN_OPTIMIZE = True
CLEAN_KEYS = True
DEFAULT_VALUE =
DEPTH_LIMIT = 1000
DEST_KEY =
FORMAT =
KEEP_EMPTY_VALS = False
LOOKAHEAD = 4096
MATCH_LIMIT = 100000
MV_ADD = true
REGEX = (?<_KEY_1>[\w-]+)=(?<_VAL_1>[^\r\n]*)
SOURCE_KEY = _raw
WRITE_META = False
which is invoked by:
[ActiveDirectory]
SHOULD_LINEMERGE = false
LINE_BREAKER = ([\r\n]+---splunk-admon-end-of-event---\r\n[\r\n]*)
EXTRACT-GUID = (?i)(?!=\w)(?:objectguid|guid)\s*=\s*(?<guid_lookup>[\w\-]+)
EXTRACT-SID = objectSid\s*=\s*(?<sid_lookup>\S+)
REPORT-MESSAGE = ad-kv
# some schema AD events may be very long
MAX_EVENTS = 10000
TRUNCATE = 100000
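If I wanted to reuse that key=value pattern for the collected data, I imagine something like this in a local props.conf and transforms.conf would do it (purely a sketch; the stanza and attribute names are ones I made up, and I have not tested whether Splunk applies a REPORT like this to the stash sourcetype):
props.conf:
[stash]
REPORT-kv-copy = stash-kv
transforms.conf:
[stash-kv]
# same style of key=value extraction as ad-kv above
REGEX = (?<_KEY_1>[\w-]+)=(?<_VAL_1>[^\r\n]*)
SOURCE_KEY = _raw
MV_ADD = true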


That's a lot of work to create a backup of an index. Splunk has a document describing how to back up indexed data. See https://docs.splunk.com/Documentation/Splunk/8.2.1/Indexer/Backupindexeddata
Another way to protect your data is via replication done by an indexer cluster. See https://docs.splunk.com/Documentation/Splunk/8.2.1/Indexer/Aboutclusters
If this reply helps you, Karma would be appreciated.


What regex did you use in your rex command? I would use the expression in the EXTRACT-fields attribute from props.conf, then add more rex commands to extract additional fields.
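For example, something along these lines, reusing the splunkd expression from the btool output above (the index name is a placeholder, and you would add further rex commands for any other fields you need):
index=your_destination_index sourcetype=stash
| rex "(?i)^(?:[^ ]* ){2}(?:[+\-]\d+ )?(?P<log_level>[^ ]*)\s+(?P<component>[^ ]+) - (?P<event_message>.+)"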
Stepping back, what problem are you trying to solve by copying data between indexes?
If this reply helps you, Karma would be appreciated.
It was a simple regex; the regex was not the issue, as the same regex worked with other sourcetypes but not with stash. This command worked for me. I am still figuring out how to retain the original host, source, and sourcetype:
| extract auto=t
I wanted to merge data from one index into another for a use case. My understanding is that the collect command does that work. It is also documented here:
https://docs.splunk.com/Documentation/Splunk/latest/SearchReference/Collect
Copying events to a different index
You can use the collect command to copy search results to another index. Construct a search that returns the data you want to copy, and pipe the results to the collect command. For example:
index=foo | ... | collect index=bar
This search writes the results into the bar index. The sourcetype is changed to stash.
You can specify a sourcetype with the collect command. However, specifying a sourcetype counts against your license, as if you indexed the data again.
We can probably keep the original sourcetype, host, and source values too, but license usage will become an issue since the amount of data is in the TBs. I think I saw a thread about how you can append the source, sourcetype, and host to _raw.
I am still looking for it.
| eval _raw=_raw." orig_host=".host." orig_source=".source
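Put together, the collecting search I have in mind would be something like this (only a sketch; the index names are placeholders and the orig_* key names are ones I am making up):
index=sourceindex
| eval _raw=_raw." orig_host=".host." orig_source=".source." orig_sourcetype=".sourcetype
| collect index=destinationindex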
Once that is done, I can search the destination index like:
index=destinationindex | eval host=orig_host ... | extract auto=t
and I will have a backup of the index data without consuming more license usage.
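Spelling that restore search out, it would probably need the extract to run before the eval, since the orig_* fields only exist after extraction (again just a sketch, with made-up field names):
index=destinationindex
| extract auto=t
| eval host=orig_host, source=orig_source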
The documentation also says where the collected data is stored:
The file that is written to the var/spool/splunk path ends in .stash_hec instead of .stash.
while the saved results from normal searches are stored under
var/run/splunk/dispatch.
So, Splunk should not replicate the artifacts from var/spool/splunk to other search head cluster members. I can test it out, though. That should rule out the results getting replicated across search peers and producing duplicate events, right?
If I schedule a search with the collect command for this use case, should it run in fast mode or verbose mode? Or does it not matter, since scheduled searches most likely always run in fast mode? And where will the results from a scheduled search with collect get stored, under dispatch or spool?
What fields does the collect command collect from the source index?
thanks
I gave rex a try on sourcetype=stash and it is not working; even a basic regex is not working. It seems like I will have to change the sourcetype in order to get the interesting fields?
@splunk @richgalloway - would you have any idea? - thanks
