I'm hoping to get a single summary-index query that I can then use to pull data in different ways. I'd prefer to roll the data up daily, but there are about 150 million events per day. Normally that wouldn't be an issue, but I also want to group the data by lots of different fields, like this:
index=cif
| fields ApplicationName, DataCenter, Environment, ServerType, host, ErrorCode, MessageText, _time
| eval dateOnly = strftime(_time, "%Y-%m-%d")
| fields dateOnly, ApplicationName, DataCenter, Environment, ServerType, host, ErrorCode, MessageText
| fillnull value=""
| stats count as messageCount by dateOnly, ApplicationName, DataCenter, Environment, ServerType, host, ErrorCode, MessageText
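For context, the plan is to schedule that search daily and write the results out with collect, roughly like this (assuming it runs shortly after midnight over the previous day; the source name matches what I query below):
index=cif earliest=-1d@d latest=@d
| eval dateOnly = strftime(_time, "%Y-%m-%d")
| fillnull value=""
| stats count as messageCount by dateOnly, ApplicationName, DataCenter, Environment, ServerType, host, ErrorCode, MessageText
| collect index=summary source=mySource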
The goal is to count the number of times a particular message occurs. On the retrieval side, once the summary exists, the user would pull the data back like this:
index=summary source=mySource ApplicationName=foo DataCenter=foo Environment=bar ServerType=bar host=*
| stats sum(messageCount) as messageCount by dateOnly
At retrieval time the user knows the values of the filter fields, so the result set is much smaller. If I group by those filter fields when building the summary index, I can use them to filter later. I like that this gets me a single summary-index job, but the query takes about 2.5 hours to complete.
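The same summary should also support other rollups just by changing the split fields, which is the appeal of a single job. For example (reusing the placeholder filter values from above, with ErrorCode as one of the stored group-by fields):
index=summary source=mySource Environment=bar
| stats sum(messageCount) as messageCount by dateOnly, ErrorCode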
Am I better off running more summary jobs and filtering up front? That would mean more summary-index sources and more jobs, which is annoying but maybe necessary (see the sketch below for the kind of split I mean). Thanks.
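Something like one scheduled search per data center, with the filter pushed into the base search and one source per job (the DataCenter value and the source name here are hypothetical):
index=cif DataCenter=east earliest=-1d@d latest=@d
| eval dateOnly = strftime(_time, "%Y-%m-%d")
| fillnull value=""
| stats count as messageCount by dateOnly, ApplicationName, Environment, ServerType, host, ErrorCode, MessageText
| collect index=summary source=mySource_east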