Any ideas on how to pull a random sample of events for a logging application that spans the full month and does not specify sources or source types? We’re trying to make this generic enough that it can be applied to any system that starts logging, so we can scan samples of whatever raw data it has logged. The query that has been used historically only pulls the 25 most recent events:
index=co_lob co_id=app1 co_env=prod | head 25 | stats latest(_time) as latestinput, latest(source) as source, latest(_raw) as latestraw, count by host, index, co_id, sourcetype, co_env | convert timeformat="%Y-%m-%d %H:%M:%S" ctime(latestinput) AS latestinput | eval application="app1" | table application, count, host, index, latestinput, latestraw, source, sourcetype, co_id, co_env
I found the information on random() and tried:
index=co_lob co_id=app1 co_env=prod | eval rand=random() % 50 | head 50
and was going to go from there to extract the results into the right table format for the scanning, but even when run over just the week to date, it times out. I’m trying to get a random 50 or 100 events from across an entire month. Event Sampling doesn’t work either: even at a ratio of 1 : 100,000,000, some of these applications log millions of transactions an hour, so it still causes performance issues and returns too much to review.
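For reference, here is a minimal sketch of what a truly random pick would look like in SPL. As far as I can tell, `head 50` just keeps the first 50 events returned (the most recent ones), so the `rand` field in my attempt never influences which events are kept. Sorting on the random value and keeping the first 50 would fix that part. The `earliest`/`latest` values are my assumption for "the previous full calendar month", and the `table` columns are only illustrative. This still has to read every matching event, so I would not expect it to solve the timeout on its own.

```
index=co_lob co_id=app1 co_env=prod earliest=-1mon@mon latest=@mon
| eval rand=random()
| sort 50 rand
| table _time, host, index, source, sourcetype, co_id, co_env, _raw
```

Here `random()` assigns each event a pseudo-random integer, and `sort 50 rand` returns the 50 events with the smallest values, which amounts to a uniform random sample of the matching events.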
Thank you in advance for any guidance 🙂