I need to get historical logs from Splunk between two dates. When I do not provide a filter explicitly in my UI, an appropriate filter needs to be added to my backend API query so that it fetches all logs between those dates. Currently I use the filter index=_*. This works fine for real-time log ingestion, but for historical data ingestion my logs always show the line: "500000 logs cumulated". Is there a default limit of 500,000? Is this even the correct filter? When I tried the filter _* instead, it showed a different number. Also, even if there is a limit of 500,000, when I cross-checked the dashboard for a particular time range it had fewer than 500,000 events, yet my logs still showed "500000 logs cumulated". It would be really helpful if you could provide an appropriate answer to my query.
Cheers!
It would help to know which API call you're using for the query. A quick search of the docs shows that the saved/searches endpoint has a dispatch.max_count setting that defaults to 500,000, but that seems an unlikely cause in this context.
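That said, if you are dispatching the query as a saved search, you can check (and, if genuinely needed, raise) that setting over REST. A rough sketch in Python, assuming a placeholder host, placeholder credentials, and a hypothetical saved search named historical_logs:

    import requests

    SPLUNK = "https://splunk.example.com:8089"  # placeholder management host/port
    AUTH = ("admin", "changeme")                # placeholder credentials
    SAVED = "historical_logs"                   # hypothetical saved search name

    # Read the current dispatch.max_count (defaults to 500000)
    r = requests.get(
        f"{SPLUNK}/servicesNS/nobody/search/saved/searches/{SAVED}",
        auth=AUTH, verify=False, params={"output_mode": "json"},
    )
    print(r.json()["entry"][0]["content"]["dispatch.max_count"])

    # Raise the cap only if the historical window really needs more events
    requests.post(
        f"{SPLUNK}/servicesNS/nobody/search/saved/searches/{SAVED}",
        auth=AUTH, verify=False, data={"dispatch.max_count": 1000000},
    )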
Why are you including index=_* in the query? That searches all of Splunk's internal logs rather than the data you have submitted to Splunk. For faster performance, wildcard filters, like index=_* and index=*, should be avoided in favor of specific index names, like index=_internal or index=main.
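For instance, a one-off historical search over a specific index with an explicit date range could be submitted to the export endpoint roughly like this (a sketch, again assuming Python, a placeholder host and credentials, and an index called main):

    import json
    import requests

    SPLUNK = "https://splunk.example.com:8089"  # placeholder management host/port
    AUTH = ("admin", "changeme")                # placeholder credentials

    # One-off export search: a specific index plus an explicit date range,
    # passed as dispatch-time parameters instead of a wildcard index filter.
    resp = requests.post(
        f"{SPLUNK}/services/search/jobs/export",
        auth=AUTH,
        verify=False,
        stream=True,
        data={
            "search": "search index=main",
            "earliest_time": "2024-01-01T00:00:00",
            "latest_time": "2024-01-31T23:59:59",
            "output_mode": "json",
        },
    )
    for line in resp.iter_lines():
        if not line:
            continue
        row = json.loads(line)   # newline-delimited JSON, one object per line
        # row.get("result") holds the event fields; process/store them here

Passing the dates as earliest_time/latest_time keeps the time window out of the SPL string itself, which is usually easier to parameterize from a backend.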
Further, real-time searches should be avoided whenever possible (and it's almost always possible). A real-time search ties up a CPU on the search head and on each indexer, preventing other searches from using them.
Can you explain why you need to use index=_* instead of simply specifying a search interval? How exactly are you using index=_*? I assume that by "backend API query" you mean a query submitted to Splunk's REST API. Is this correct?
Yes, by API query I meant the REST API. Initially the query did not have a filter, and in that case the real-time logs were not being ingested, so I had to add the filter index=_*, which resulted in successful ingestion of the real-time logs.
OK. It's all very confusing. Let's take a step back.
1. What, exactly, are you doing? You mention REST, but are you using it to search for data or to ingest data into Splunk? Which endpoint are you calling, and with what arguments? (Anonymized if necessary.)
2. What is the expected result and what is the actual result?