Hello,
I am running a query to analyse 1 year of data and find the number of users that used the application per day. But the query below times out and is terminated with the error "unexpected error".
index=myIndex | dedup user_id _time | timechart span=1d dc(user_id) as Users | outputlookup users.csv
Could you please help with optimizing the above query?
I see you are running the dedup command on a large data set over a long time range. This directly hurts your search performance and is why the query fails. Never run dedup directly on the raw output of a base search. When dedup runs this way, the text of every event is retained in memory, which impacts your search performance.
If you run this search over a short time range, it might work and produce results. But if you run the same search over a larger time range such as 1 year, it has to keep that text in memory for much longer, and eventually the search fails to complete.
This is the nature of the dedup command rather than an error in it. The dedup command is a streaming command or a dataset processing command, depending on which arguments are specified with the command.
To fix this, you have to modify your search so that only a limited data set is pulled out. There are multiple ways to modify your search, depending on your data, to make it faster.
You don't need that dedup command in there.
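To illustrate: dc(user_id) in the timechart already counts each user only once per day, so the dedup does not change the result. A minimal sketch of the search without it is below. The user_id=* filter and the fields step are additions for illustration and assume user_id is a search-time field on the events you care about:

```spl
index=myIndex user_id=*
| fields _time user_id
| timechart span=1d dc(user_id) as Users
```

The outputlookup step from the original search can be appended to the end unchanged.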
@akasthi
Does your search work fine with a different time range, like last 7 days, last 30 days, last 2 months, etc.?
Yes, it works for shorter time ranges, say 30 days, 7 days, etc.
Can you please inspect the job for that?
Run the search.
From the Job menu, select Inspect Job.
https://docs.splunk.com/Documentation/Splunk/7.2.5/Search/ViewsearchjobpropertieswiththeJobInspector