I need to provide feedback on ways logging formats could be improved.
To that end, I'm trying to create a search that ends with:
| stats values(source) values(_raw) by index sourcetype
so I get some examples of logs, but I only want to see a maximum of 5 values in the source and _raw columns.
I tried using foreach with append, but append isn't streaming, so I manually created 204 lines like this:
index=index1 sourcetype=sourcetype1 | head 5
| append [ search index=index1 sourcetype=sourcetype2 | head 5 ]
| append [ search index=index2 sourcetype=sourcetype1 | head 5 ]
...
It took a long time in "Parsing job...", but eventually produced the results I wanted.
What are some different ways of getting this result?
Ouch. So many subsearches. No wonder it takes forever to run (and might produce wrong/incomplete results).
This is actually one of the relatively few legitimate uses of the dedup command:
index IN (index1, index2, ...) sourcetype IN (sourcetype1, sourcetype2,...)
| dedup 5 index sourcetype
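To get back to the summary table described in the question, the original stats can be appended after the dedup. This is only a sketch, assuming the same placeholder index and sourcetype names as above; the field aliases sources and sample_events are made up for readability.

```spl
index IN (index1, index2) sourcetype IN (sourcetype1, sourcetype2)
| dedup 5 index sourcetype
| stats values(source) as sources values(_raw) as sample_events by index sourcetype
```

Note that values() returns only distinct values, so a column may show fewer than 5 entries if some events are identical. If you want to keep every sampled event in the order it was found, list(_raw) could be used instead of values(_raw).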
This works great, but across many indexes over a span of hours (some are infrequent log sources) it takes a long time, so I shortened the time range to 15 minutes and it ran in a few minutes. Thank you!
That's true. Dedup works on the results, but first Splunk has to retrieve those results, so over a long time span it will be a relatively "heavy" command. If you can safely assume that all the results you need fall within a certain time range before the latest event, you could cheat a little by generating the search criteria dynamically.
Normally you could do something like this:
| tstats max(_time) as latest where index IN (...) sourcetype IN (...) by index sourcetype
to find the latest event time for each index/sourcetype combination.
Now, if you can safely assume that all interesting events are within a certain range (let's say within 5 minutes of the latest event), you could use this as a subsearch to narrow down your initial search criteria. Be aware of the subsearch limitations, though: in some cases it might return incomplete results.
[ | tstats max(_time) as latest where index IN (...) sourcetype IN (...) by index sourcetype
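| eval earliest=latest-300, latest=latest+1 ]
| dedup 5 index sourcetype
This trick makes Splunk look only at the 5 minutes leading up to the latest event for each index/sourcetype combination. The latest time bound is exclusive, hence the +1 so that the newest event itself is still matched.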
And, as far as I remember, this only works with a normal search. It does not work if you want the outer search to be tstats.
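To illustrate why this works: the rows returned by a subsearch are turned into search terms, and fields named earliest and latest are treated as time modifiers rather than ordinary field filters. Schematically, assuming two index/sourcetype pairs, the outer search ends up looking roughly like the following. The angle-bracket placeholders stand for the epoch values computed per row, and the exact ordering and quoting of the terms may differ.

```spl
( ( index="index1" sourcetype="sourcetype1" earliest=<earliest for this pair> latest=<latest for this pair> )
  OR
  ( index="index2" sourcetype="sourcetype1" earliest=<earliest for this pair> latest=<latest for this pair> ) )
| dedup 5 index sourcetype
```

Each OR branch carries its own time window, which is what lets infrequent sources be sampled without scanning hours of data for the busy ones.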