<?xml version="1.0" encoding="UTF-8"?>
<rss xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#" xmlns:taxo="http://purl.org/rss/1.0/modules/taxonomy/" version="2.0">
  <channel>
    <title>topic Re: How does Splunk extract search time fields in &quot;interesting fields&quot;? in Splunk Search</title>
    <link>https://community.splunk.com/t5/Splunk-Search/How-does-splunk-extract-search-time-fields-in-quot-interesting/m-p/558340#M158594</link>
    <description>&lt;P&gt;That's a lot of work to create a backup of an index.&amp;nbsp; Splunk has a document describing how to back up indexed data.&amp;nbsp; See &lt;A href="https://docs.splunk.com/Documentation/Splunk/8.2.1/Indexer/Backupindexeddata" target="_blank"&gt;https://docs.splunk.com/Documentation/Splunk/8.2.1/Indexer/Backupindexeddata&lt;/A&gt;&lt;/P&gt;&lt;P&gt;Another way to protect your data is via replication done by an indexer cluster.&amp;nbsp; See &lt;A href="https://docs.splunk.com/Documentation/Splunk/8.2.1/Indexer/Aboutclusters" target="_blank"&gt;https://docs.splunk.com/Documentation/Splunk/8.2.1/Indexer/Aboutclusters&lt;/A&gt;&lt;/P&gt;</description>
    <pubDate>Tue, 06 Jul 2021 00:25:00 GMT</pubDate>
    <dc:creator>richgalloway</dc:creator>
    <dc:date>2021-07-06T00:25:00Z</dc:date>
    <item>
      <title>How does Splunk extract search time fields in "interesting fields"?</title>
      <link>https://community.splunk.com/t5/Splunk-Search/How-does-splunk-extract-search-time-fields-in-quot-interesting/m-p/558248#M158569</link>
      <description>&lt;P&gt;Which props.conf setting does Splunk use to extract interesting fields from the _raw field?&lt;/P&gt;&lt;P&gt;I am trying to use the collect command to copy _raw data from one index into another. However, it does not extract interesting fields. If I set sourcetype=splunkd, it does extract interesting fields. I understand that using a sourcetype other than stash will count against license usage, so I should be able to create a custom field extraction for the stash source file paths without consuming any license.&lt;/P&gt;&lt;P&gt;I ran&amp;nbsp;./splunk btool props list splunkd and this is what it shows:&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;LI-CODE lang="markup"&gt;[splunkd]
ADD_EXTRA_TIME_FIELDS = True
ANNOTATE_PUNCT = True
AUTO_KV_JSON = true
BREAK_ONLY_BEFORE = 
BREAK_ONLY_BEFORE_DATE = True
CHARSET = UTF-8
DATETIME_CONFIG = /etc/datetime.xml
DEPTH_LIMIT = 1000
DETERMINE_TIMESTAMP_DATE_WITH_SYSTEM_TIME = false
EXTRACT-fields = (?i)^(?:[^ ]* ){2}(?:[+\-]\d+ )?(?P&amp;lt;log_level&amp;gt;[^ ]*)\s+(?P&amp;lt;component&amp;gt;[^ ]+) - (?P&amp;lt;event_message&amp;gt;.+)
HEADER_MODE = 
LB_CHUNK_BREAKER_TRUNCATE = 2000000
LEARN_MODEL = true
LEARN_SOURCETYPE = true
LINE_BREAKER_LOOKBEHIND = 100
MATCH_LIMIT = 100000
MAX_DAYS_AGO = 2000
MAX_DAYS_HENCE = 2
MAX_DIFF_SECS_AGO = 3600
MAX_DIFF_SECS_HENCE = 604800
MAX_EVENTS = 256
MAX_TIMESTAMP_LOOKAHEAD = 40
MUST_BREAK_AFTER = 
MUST_NOT_BREAK_AFTER = 
MUST_NOT_BREAK_BEFORE = 
SEGMENTATION = indexing
SEGMENTATION-all = full
SEGMENTATION-inner = inner
SEGMENTATION-outer = outer
SEGMENTATION-raw = none
SEGMENTATION-standard = standard
SHOULD_LINEMERGE = false
TIME_FORMAT = %m-%d-%Y %H:%M:%S.%l %z
TRANSFORMS = 
TRUNCATE = 20000
detect_trailing_nulls = false
maxDist = 100
priority = 
sourcetype = 
termFrequencyWeightedDist = false&lt;/LI-CODE&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;for default stanza, it shows :&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;LI-CODE lang="markup"&gt;[default]
ADD_EXTRA_TIME_FIELDS = True
ANNOTATE_PUNCT = True
AUTO_KV_JSON = true
BREAK_ONLY_BEFORE = 
BREAK_ONLY_BEFORE_DATE = True
CHARSET = UTF-8
DATETIME_CONFIG = /etc/datetime.xml
DEPTH_LIMIT = 1000
DETERMINE_TIMESTAMP_DATE_WITH_SYSTEM_TIME = false
HEADER_MODE = 
LB_CHUNK_BREAKER_TRUNCATE = 2000000
LEARN_MODEL = true
LEARN_SOURCETYPE = true
LINE_BREAKER_LOOKBEHIND = 100
MATCH_LIMIT = 100000
MAX_DAYS_AGO = 2000
MAX_DAYS_HENCE = 2
MAX_DIFF_SECS_AGO = 3600
MAX_DIFF_SECS_HENCE = 604800
MAX_EVENTS = 256
MAX_TIMESTAMP_LOOKAHEAD = 128
MUST_BREAK_AFTER = 
MUST_NOT_BREAK_AFTER = 
MUST_NOT_BREAK_BEFORE = 
SEGMENTATION = indexing
SEGMENTATION-all = full
SEGMENTATION-inner = inner
SEGMENTATION-outer = outer
SEGMENTATION-raw = none
SEGMENTATION-standard = standard
SHOULD_LINEMERGE = True
TRANSFORMS = 
TRUNCATE = 10000
detect_trailing_nulls = false
maxDist = 100
priority = 
sourcetype = 
termFrequencyWeightedDist = false&lt;/LI-CODE&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;I verified the data and it is not in JSON format, so AUTO_KV_JSON would not apply to it.&lt;/P&gt;&lt;P&gt;The only thing I could find in transforms.conf and props.conf that separates fields based on "=" is:&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;LI-CODE lang="markup"&gt;[ad-kv]
CAN_OPTIMIZE = True
CLEAN_KEYS = True
DEFAULT_VALUE = 
DEPTH_LIMIT = 1000
DEST_KEY = 
FORMAT = 
KEEP_EMPTY_VALS = False
LOOKAHEAD = 4096
MATCH_LIMIT = 100000
MV_ADD = true
REGEX = (?&amp;lt;_KEY_1&amp;gt;[\w-]+)=(?&amp;lt;_VAL_1&amp;gt;[^\r\n]*)
SOURCE_KEY = _raw
WRITE_META = False&lt;/LI-CODE&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;&lt;SPAN&gt;which is being called by&amp;nbsp;&lt;/SPAN&gt;&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;LI-CODE lang="markup"&gt;[ActiveDirectory]
SHOULD_LINEMERGE = false
LINE_BREAKER = ([\r\n]+---splunk-admon-end-of-event---\r\n[\r\n]*)
EXTRACT-GUID = (?i)(?!=\w)(?:objectguid|guid)\s*=\s*(?&amp;lt;guid_lookup&amp;gt;[\w\-]+)
EXTRACT-SID = objectSid\s*=\s*(?&amp;lt;sid_lookup&amp;gt;\S+)
REPORT-MESSAGE = ad-kv
# some schema AD events may be very long
MAX_EVENTS = 10000
TRUNCATE = 100000&lt;/LI-CODE&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;</description>
      <pubDate>Sun, 04 Jul 2021 06:58:56 GMT</pubDate>
      <guid>https://community.splunk.com/t5/Splunk-Search/How-does-splunk-extract-search-time-fields-in-quot-interesting/m-p/558248#M158569</guid>
      <dc:creator>goelt2000</dc:creator>
      <dc:date>2021-07-04T06:58:56Z</dc:date>
    </item>
    <item>
      <title>Re: How does Splunk extract search time fields in "interesting fields"?</title>
      <link>https://community.splunk.com/t5/Splunk-Search/How-does-splunk-extract-search-time-fields-in-quot-interesting/m-p/558253#M158573</link>
      <description>&lt;P&gt;I tried using rex on sourcetype=stash, and it is not working; even a basic regex fails. It seems I will have to change the sourcetype in order to get the interesting fields?&lt;/P&gt;&lt;P&gt;&lt;a href="https://community.splunk.com/t5/user/viewprofilepage/user-id/149"&gt;@splunk&lt;/a&gt;&amp;nbsp;&lt;a href="https://community.splunk.com/t5/user/viewprofilepage/user-id/213957"&gt;@richgalloway&lt;/a&gt;&amp;nbsp;- would you have any idea? Thanks.&lt;/P&gt;</description>
      <pubDate>Sun, 04 Jul 2021 09:04:39 GMT</pubDate>
      <guid>https://community.splunk.com/t5/Splunk-Search/How-does-splunk-extract-search-time-fields-in-quot-interesting/m-p/558253#M158573</guid>
      <dc:creator>goelt2000</dc:creator>
      <dc:date>2021-07-04T09:04:39Z</dc:date>
    </item>
    <item>
      <title>Re: How does Splunk extract search time fields in "interesting fields"?</title>
      <link>https://community.splunk.com/t5/Splunk-Search/How-does-splunk-extract-search-time-fields-in-quot-interesting/m-p/558258#M158575</link>
      <description>&lt;P&gt;What regex did you use in your &lt;FONT face="courier new,courier"&gt;rex&lt;/FONT&gt; command?&amp;nbsp; I would use the expression from the EXTRACT-fields attribute in props.conf, then add more &lt;FONT face="courier new,courier"&gt;rex&lt;/FONT&gt; commands to extract more fields.&lt;/P&gt;&lt;P&gt;Stepping back, what problem are you trying to solve by copying data between indexes?&lt;/P&gt;</description>
      <pubDate>Sun, 04 Jul 2021 14:16:05 GMT</pubDate>
      <guid>https://community.splunk.com/t5/Splunk-Search/How-does-splunk-extract-search-time-fields-in-quot-interesting/m-p/558258#M158575</guid>
      <dc:creator>richgalloway</dc:creator>
      <dc:date>2021-07-04T14:16:05Z</dc:date>
    </item>
    <item>
      <title>Re: How does Splunk extract search time fields in "interesting fields"?</title>
      <link>https://community.splunk.com/t5/Splunk-Search/How-does-splunk-extract-search-time-fields-in-quot-interesting/m-p/558266#M158578</link>
      <description>&lt;P&gt;It was a simple regex; the regex was not the issue, as the same regex worked with other sourcetypes but not with stash. This command worked for me, though I am still figuring out how to retain the original host, source, and sourcetype:&lt;/P&gt;&lt;P&gt;| extract auto=t&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;I wanted to merge data from one index into another for a use case. My understanding is that the collect command does this. It is also documented here:&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;&lt;A href="https://docs.splunk.com/Documentation/Splunk/latest/SearchReference/Collect" target="_blank"&gt;https://docs.splunk.com/Documentation/Splunk/latest/SearchReference/Collect&lt;/A&gt;&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;LI-CODE lang="markup"&gt;Copying events to a different index
You can use the collect command to copy search results to another index. Construct a search that returns the data you want to copy, and pipe the results to the collect command. For example:

index=foo | ... | collect index=bar

This search writes the results into the bar index. The sourcetype is changed to stash.

You can specify a sourcetype with the collect command. However, specifying a sourcetype counts against your license, as if you indexed the data again.&lt;/LI-CODE&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;We can probably keep the original sourcetype, host, and source values too, but license usage will become an issue since the amount of data is in TBs. I think I saw a thread about how you can append source, sourcetype, and host to _raw; I am still looking for it. Something like:&lt;/P&gt;&lt;P&gt;| eval _raw=_raw." orig_host=".host." orig_source=".source&lt;/P&gt;&lt;P&gt;Once that is done, I can use the destination index like:&lt;/P&gt;&lt;P&gt;index=destinationindex | eval host=orig_host ... | extract auto=t&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;and I will have a backup of the index data without consuming more license.&lt;/P&gt;&lt;P&gt;The documentation also says the data is stored under:&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;LI-CODE lang="markup"&gt;The file that is written to the var/spool/splunk path ends in .stash_hec instead of .stash.&lt;/LI-CODE&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;while the saved results from normal searches are stored under var/run/splunk/dispatch.&lt;/P&gt;&lt;P&gt;So Splunk should not replicate the artifacts from spool/splunk to the other search head cluster members, though I can test that. That should rule out the results being replicated across search peers and creating duplicate events?&lt;/P&gt;&lt;P&gt;If I schedule a search with the collect command for this use case, should it run in fast mode or verbose mode? Or does it not matter, since scheduled searches most likely always run in fast mode? And where will the results from a scheduled search with a collect command be stored, under dispatch or spool?&lt;/P&gt;&lt;P&gt;What fields does the collect command collect from the source index?&lt;/P&gt;&lt;P&gt;Thanks&lt;/P&gt;</description>
      <pubDate>Mon, 05 Jul 2021 03:43:37 GMT</pubDate>
      <guid>https://community.splunk.com/t5/Splunk-Search/How-does-splunk-extract-search-time-fields-in-quot-interesting/m-p/558266#M158578</guid>
      <dc:creator>goelt2000</dc:creator>
      <dc:date>2021-07-05T03:43:37Z</dc:date>
    </item>
    <item>
      <title>Re: How does Splunk extract search time fields in "interesting fields"?</title>
      <link>https://community.splunk.com/t5/Splunk-Search/How-does-splunk-extract-search-time-fields-in-quot-interesting/m-p/558340#M158594</link>
      <description>&lt;P&gt;That's a lot of work to create a backup of an index.&amp;nbsp; Splunk has a document describing how to back up indexed data.&amp;nbsp; See &lt;A href="https://docs.splunk.com/Documentation/Splunk/8.2.1/Indexer/Backupindexeddata" target="_blank"&gt;https://docs.splunk.com/Documentation/Splunk/8.2.1/Indexer/Backupindexeddata&lt;/A&gt;&lt;/P&gt;&lt;P&gt;Another way to protect your data is via replication done by an indexer cluster.&amp;nbsp; See &lt;A href="https://docs.splunk.com/Documentation/Splunk/8.2.1/Indexer/Aboutclusters" target="_blank"&gt;https://docs.splunk.com/Documentation/Splunk/8.2.1/Indexer/Aboutclusters&lt;/A&gt;&lt;/P&gt;</description>
      <pubDate>Tue, 06 Jul 2021 00:25:00 GMT</pubDate>
      <guid>https://community.splunk.com/t5/Splunk-Search/How-does-splunk-extract-search-time-fields-in-quot-interesting/m-p/558340#M158594</guid>
      <dc:creator>richgalloway</dc:creator>
      <dc:date>2021-07-06T00:25:00Z</dc:date>
    </item>
  </channel>
</rss>

