We need to know when the first occurrence of a certain value was, and show a list of the items that first appeared last week.
Our approach is this: build a list of those values (plus some related fields), get min(_time) for every Id, and filter out the ones older than a week.
This is the search:
index=SOMETHING earliest=-12month "Message with the required value"
| rex "SOMETHING: (?<Id>.*) SOMETHING"
| rex "SOMETHING: (?<RelatedValue>.*) SOMETHING"
| stats min(_time) as minTime max(RelatedValue) as RelatedValue by Id
| eval diffTime=(now()-minTime)/60/60/24
| where diffTime<7
| convert ctime(minTime) AS c_time
| table c_time, RelatedValue, Id
| sort c_time
But we have a problem: rows show up whose real min(_time) is older than a week, because the results are limited to 10000, and once that row count is reached Splunk seems to stop analyzing. The search keeps running (the flashtimeline keeps updating) but the results are frozen at 10000.
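One thing worth ruling out (an assumption on my part, since I can't see your environment): the sort command itself truncates its output to 10000 results by default. Passing 0 as the first argument removes that limit, so the last stage of the search would become:

```
... | sort 0 c_time
```

If the frozen count is exactly 10000, this default is a likely suspect before any limits.conf change.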
I'll add an example:
| c_time     | RelatedValue | Id |
| 2011/05/13 | Value1       | 13 |
| 2011/05/14 | Value1       | 14 |
| 2011/05/14 | Value2       | 07 |
Every one of those Ids can also appear on more recent dates, but I need the first occurrence: the min(_time) Splunk can find for that specific Id.
You can increase certain reporting/search limits by editing the limits.conf file. This could well be your problem if you are indeed hitting a hard limit.
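For example, a stanza like the following in $SPLUNK_HOME/etc/system/local/limits.conf raises the result-row cap. Treat this as a sketch: the exact stanza and value depend on which limit you are actually hitting.

```
[searchresults]
maxresultrows = 50000
```

A restart (or at least a new search) is needed for the change to take effect.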
You can get the first value of a field by using the first() stats function.
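A minimal sketch of first()/last() (assuming your events carry an Id field): first() takes the value from the first result returned, which in a normal search is the most recent event, while last() takes the value from the oldest.

```
index=SOMETHING
| stats first(_time) as newestSeen last(_time) as oldestSeen by Id
```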
Thanks for your time Simeon. Check the example I added.
I'll try to explain the use case. We have log entries with some information about requests; the sender is identified by Id, and there is some related data (let's say a color). If we search for Id and Color we will find a huge list of pairs, with repeated rows because of the different requests. I need to know when the first occurrence of every Id was, so I use min(_time) for that. Once I know when the first request was, I filter the list to keep only the newest Ids (last week's).
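A sketch of that logic using the earliest()/latest() stats functions in place of min(_time)/max() (field names assumed from the example above, and the filter written with relative_time() instead of the manual day arithmetic):

```
index=SOMETHING earliest=-12month "Message with the required value"
| stats earliest(_time) as firstSeen latest(Color) as Color by Id
| where firstSeen >= relative_time(now(), "-7d")
| convert ctime(firstSeen) as c_time
| table c_time, Color, Id
| sort 0 c_time
```

earliest() and latest() pick values by event timestamp rather than result order, which is exactly the "first occurrence" semantics wanted here.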
Sounds like you should create a summary index that stores the occurrences. From there, you can search the summary index over the time period you want while filtering on certain values and pulling the _time field. Maybe you can be more specific about the use case, as that will give me a better picture of your final goal.
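A sketch of the summary-index approach Simeon describes (the index name summary_first_seen and the schedule are assumptions): a scheduled search computes each Id's first occurrence and writes it out with the collect command.

```
index=SOMETHING "Message with the required value"
| stats min(_time) as firstSeen by Id
| collect index=summary_first_seen
```

The weekly report then becomes a cheap search over the summary, e.g. index=summary_first_seen earliest=-7d | table _time, Id, avoiding the 12-month scan entirely.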
Hi Simeon. In my example, first() will return the most recent occurrence of each Id (and last() would return the oldest). But I need the time (and related values) of the first occurrence of every Id. The list I want would look like:
|2011/05/15 - 12 - RelatedValue1
|2011/05/16 - 15 - RelatedValue2