Hi,
I would like to query all data over the past year and then use "stats count by some fields" to calculate the counts.
However, the data is very large (at least a few million events), and Splunk truncates data when querying, so the counts are inaccurate.
Does anyone know a good way to fix it?
P.S. I tried 'sistats' and set up a report to run every hour, querying data from the previous year.
Ideally, I would like the report to collect data accurately over a smaller time interval, and then aggregate the results.
However, on every hourly run, the report queried the whole previous year of data (still inaccurately) and then added all the counts together as the result.
I think you have several options.
Option 1 is the easiest approach, option 2 is a faster approach, and option 3 is necessary if you need to correlate data from more than one really large data set.
Are you referring to the number of rows getting truncated?
If so, I had a similar problem a while back where anything over 50,000 rows was truncated, which led to inaccurate results. Luckily, this is a simple fix in limits.conf:
maxresultrows = <integer>
* Configures the maximum number of events are generated by search commands which
grow the size of your result set (such as multikv) or that create events. Other search commands are explicitly
controlled in specific stanzas below.
* This limit should not exceed 50000. Setting this limit higher than 50000 causes instability.
* Defaults to 50000.
http://docs.splunk.com/Documentation/Splunk/6.2.1/Admin/Limitsconf
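For reference, a local override would normally go in $SPLUNK_HOME/etc/system/local/limits.conf (or an app's local directory) under the [searchresults] stanza, which is where the spec text quoted above comes from. This is a minimal sketch that assumes that stanza name. Check the spec for your version. The value 100000 is purely illustrative and not a recommendation, because the spec itself warns against going above 50000.

```ini
# $SPLUNK_HOME/etc/system/local/limits.conf
# Illustrative value only: the spec says values above 50000 can cause instability.
[searchresults]
maxresultrows = 100000
```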
Thank you, skoelpin!
This is one possible solution for me. Since increasing the limit might cause some instability, do you happen to know of any other methods?
Is the number of rows getting truncated after 50k? If so, then this may be your only solution.
I've increased the limit before and haven't seen any instability issues. I would contact support and get their opinion before trying this in production.
The limit is there to protect your browser from locking up (among other reasons... or at least that's what I believe). When you load that much into memory, things can get funny. "Unstable", even!
Can you provide the original query that ended up being truncated, as well as the query you're using to try summary indexing? Replace any sensitive information. This will help the community answer your question more accurately.
Hi, thanks for the reminder. Here is the code.
The code to create the summary indexing report:
sourcetype=my_source event_id=*
| sistats count by event_id field1 field2
The name of the report is "my_report_name".
The code to retrieve the result:
index=summary search_name="my_report_name"
| stats count by event_id field1 field2
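The goal is for each hourly run to count only its own slice of time. One way to do that is to pin the scheduled search's dispatch window to the previous full hour. Consecutive runs then never overlap, and a stats over the summary index adds each slice exactly once. Below is a minimal savedsearches.conf sketch. It assumes the report name and search from the post above. The cron minute (5) and the target index name "summary" are assumptions, and the same settings can also be made through the report's schedule and summary indexing options in Splunk Web.

```ini
# savedsearches.conf (e.g. in the app's local directory)
[my_report_name]
search = sourcetype=my_source event_id=* | sistats count by event_id field1 field2
# Run hourly, a few minutes past the hour (minute 5 is an arbitrary choice)
enableSched = 1
cron_schedule = 5 * * * *
# Only look at the previous full hour, snapped to the hour boundary,
# so each run covers a distinct, non-overlapping slice
dispatch.earliest_time = -1h@h
dispatch.latest_time = @h
# Write the sistats output to the summary index
action.summary_index = 1
action.summary_index._name = summary
```

With non-overlapping windows, the existing retrieval search (index=summary search_name="my_report_name", piped to stats count by event_id field1 field2) should sum the hourly partial counts rather than double-counting them. Data from before the schedule started would still need to be backfilled separately, for example with Splunk's fill_summary_index.py script.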