This question originates from suggestions in this thread: Is it possible to preserve original order of events? It was suggested that we use _cd (bucket id + arrival address) to order events by their arrival address, and thus by indexing time. This does not seem to work, which suggests that _cd cannot be used as a regular field.
The only internal fields documented are _raw and _time.
To keep it simple, consider this set of sample events, which all occur in the same second:
2010-09-13 09:45:00 red 1
2010-09-13 09:45:00 blue 2
2010-09-13 09:45:00 red 3
2010-09-13 09:45:00 blue 4
2010-09-13 09:45:00 red 5
2010-09-13 09:45:00 blue 6
Neither of these searches has any effect on the ordering:
sourcetype="colors" | sort - _cd
sourcetype="colors" | sort + _cd
Shouldn't one of the two work? I also tried this sorting on append operations, with the same outcome: no effect on ordering. Is there a special flag that needs to be set when using internal fields? The end goal is to be able to perform set/append operations and use search commands while still keeping the results sorted in indexed order.
This also makes me curious about other useful internal fields. Are there any?
Your sort doesn't work here because the _cd field looks like <bucketid>:<address>, so it doesn't sort meaningfully as either a number or a string. You'll have to break it apart, for example with
... | rex field=_cd "(?<bucket>\d+):(?<address>\d+)"
You can then use
| sort + _indextime address
to get the data in arrival order. You can't use the bucket here: with multiple hot buckets possible, the bucket id doesn't help much.
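To see why the raw _cd value sorts badly and why splitting it helps, here is a small Python sketch that mimics the rex and sort steps outside Splunk. The event values are made up, and the helper names (split_cd, arrival_order) are hypothetical. Note that Python spells a named group (?P<name>...) rather than (?<name>...).

```python
import re

# Same pattern as the rex above; Python spells named groups (?P<name>...).
CD_PATTERN = re.compile(r"(?P<bucket>\d+):(?P<address>\d+)")


def split_cd(event):
    """Return a copy of the event with integer 'bucket' and 'address' parsed from _cd."""
    m = CD_PATTERN.fullmatch(event["_cd"])
    if m is None:
        raise ValueError(f"unexpected _cd value: {event['_cd']!r}")
    return {**event, "bucket": int(m["bucket"]), "address": int(m["address"])}


def arrival_order(events):
    """Sort by _indextime first, then by numeric address to break ties."""
    return sorted((split_cd(e) for e in events),
                  key=lambda e: (e["_indextime"], e["address"]))


# Hypothetical events that all arrived in the same second (values are made up).
events = [
    {"_indextime": 1284371100, "_cd": "12:100", "_raw": "red 3"},
    {"_indextime": 1284371100, "_cd": "12:7", "_raw": "red 1"},
    {"_indextime": 1284371100, "_cd": "12:99", "_raw": "blue 2"},
]

# Sorting the raw _cd strings compares character by character: "1" < "7" < "9".
naive = sorted(e["_cd"] for e in events)
print(naive)  # ['12:100', '12:7', '12:99']

ordered = [e["_raw"] for e in arrival_order(events)]
print(ordered)  # ['red 1', 'blue 2', 'red 3']
```

Comparing the strings character by character puts address 100 ahead of 7, which is why the address has to be pulled out and compared as a number.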
Splunk has plenty of other internal fields. Most are used to make sure that enough information is preserved in the search for the UI to work as expected with events. Very few internal fields survive a transforming command (a notable one is _span in the results of the timeline command).
This works beautifully. Thank you, Dr. Sorkin! One more question, if I may: is _indextime required in the sort operation? I tried the search without it, and it appears to produce the same result. Is that field required for distributed search?
_indextime is the time of arrival and is the primary sort key. We break ties by the place where the event was written. When we roll from one bucket to the next, the address drops back to zero and increases from there, so you may actually want to use "| sort _indextime bucket address" to handle the bucket roll better.
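The effect of the bucket roll can be sketched in the same way. In this hypothetical Python example, all four events share one _indextime, bucket 12 ends at a high address, and writing continues in bucket 13 from zero. It assumes, as the suggested sort implies, that the newer bucket has the larger id; all values are invented for illustration.

```python
# Hypothetical events that share one _indextime and straddle a bucket roll:
# bucket 12 filled up around address 5000, then writing restarted at 0 in bucket 13.
# Assumes the newer bucket has the larger id, as the suggested sort implies.
events = [
    {"_indextime": 1284371100, "bucket": 13, "address": 0, "_raw": "red 3"},
    {"_indextime": 1284371100, "bucket": 12, "address": 4990, "_raw": "red 1"},
    {"_indextime": 1284371100, "bucket": 13, "address": 10, "_raw": "blue 4"},
    {"_indextime": 1284371100, "bucket": 12, "address": 5000, "_raw": "blue 2"},
]


def raw_in_order(events, key):
    """Return the _raw values after sorting the events with the given key."""
    return [e["_raw"] for e in sorted(events, key=key)]


# Mirrors "| sort _indextime address": the reset addresses in bucket 13 jump ahead.
by_address = raw_in_order(events, key=lambda e: (e["_indextime"], e["address"]))

# Mirrors "| sort _indextime bucket address": bucket 12 stays first.
by_bucket = raw_in_order(
    events, key=lambda e: (e["_indextime"], e["bucket"], e["address"])
)

print(by_address)  # ['red 3', 'blue 4', 'red 1', 'blue 2']
print(by_bucket)   # ['red 1', 'blue 2', 'red 3', 'blue 4']
```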
If you're interested, you can look at the internal search result files in the dispatch directory, $SPLUNK_HOME/var/run/splunk/dispatch/<search id>/results.csv.gz, and examine the internal fields for a non-transforming search. As you noted, these aren't documented, but you might be able to guess what some of them are.
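If you want to list those fields programmatically rather than by eye, a minimal Python sketch could read just the header row of the file. It assumes results.csv.gz is a gzip-compressed CSV whose first row holds the field names, and that the internal fields are the ones whose names start with an underscore; internal_fields is a hypothetical helper name.

```python
import csv
import gzip
import sys


def internal_fields(results_path):
    """Return the column names in a results.csv.gz that start with an underscore.

    Assumes the file is gzip-compressed CSV with the field names in its first row.
    """
    with gzip.open(results_path, mode="rt", newline="") as f:
        header = next(csv.reader(f), [])
    return [name for name in header if name.startswith("_")]


if __name__ == "__main__" and len(sys.argv) > 1:
    # Example: python internal_fields.py \
    #   "$SPLUNK_HOME/var/run/splunk/dispatch/<search id>/results.csv.gz"
    for name in internal_fields(sys.argv[1]):
        print(name)
```

Reading only the header keeps this fast even for large result sets, since the rows themselves are never parsed.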