Splunk Enterprise Security

In Splunk Enterprise Security, why does the Incident Review dashboard intermittently return no events?

Splunk Employee

On the Splunk Incident Review dashboard, when the customer clicks the Submit button, the event count appears at the top. But instead of listing the events, Splunk shows "Search did not return any events". The issue is intermittent.
Please check the following screenshot:

[Screenshot: Incident Review dashboard]

Checklist:

Scenario 1:
1. To check whether this is a Splunk issue or a browser issue, we tried different browsers and reproduced the same issue in each.
2. When one user hit the issue, we asked other users to log in to Splunk and check its behavior. All users saw the same issue at the same time.
3. This confirmed that the issue is not related to the browser, cache, or client machine.

Scenario 2:
4. We were able to reproduce the issue over the last 24 hours without applying any extra filter (that is, with the default settings of the Incident Review panel). 124 events were present, but the output showed the "Search did not return any events" message.
5. Using the Job Inspector, we found the search running behind the Incident Review panel. Running that SPL in the Search & Reporting app returned 124 events.
6. We checked again about 10 minutes after the issue occurred. The time range was different by then, and the issue did not occur.

1 Solution

Splunk Employee

This issue is reported as a bug in Splunk (SPL-153621). The cause of the issue is as follows:

Here is the search that reproduces this scenario. Note that the earliest and latest times span a huge range (2016 to 2999, a span of 983 years):

06-13-2018 23:35:12.267 INFO SearchParser - PARSING: litsearch (sourcetype="timeliner" _time>=1467743400 _time<=32503573800) | fields keepcolorder=t "*" "_bkt" "_cd" "_si" "host" "index" "linecount" "source" "sourcetype" "splunk_server"
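To see where the 2016 and 2999 figures come from, the two `_time` bounds in the search above can be decoded from epoch seconds. This is a minimal sketch in Python, not anything Splunk runs; the helper name `to_utc` is made up for illustration, and the values are interpreted as UTC.

```python
from datetime import datetime, timedelta, timezone

# Build dates by plain arithmetic so far-future values do not depend on
# the platform's C time functions.
EPOCH = datetime(1970, 1, 1, tzinfo=timezone.utc)

def to_utc(epoch_seconds):
    """Convert epoch seconds to a timezone-aware UTC datetime (illustrative helper)."""
    return EPOCH + timedelta(seconds=epoch_seconds)

# Bounds copied from the litsearch line above.
earliest = to_utc(1467743400)
latest = to_utc(32503573800)
span_years = latest.year - earliest.year

print(earliest.isoformat())  # 2016-07-05T18:30:00+00:00
print(latest.isoformat())    # 2999-12-30T18:30:00+00:00
print(span_years)            # 983
```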

These are the exact earliest/latest time values the Timeliner receives when the search starts:

06-13-2018 23:35:12.231 DEBUG TimelineCreator - Creating timeline with info._tz=default dir=/Users/ABCD/test/splunk/var/run/splunk/dispatch/1528957677.2 bucket=300 max_per_bucket=1000 tl_lt=32503573801.000000 tl_et=1467743400.000000 events_colorder={} isEventsSorted=1 info._realtime=0 info._timeline_events_preview=0

Because of the huge span between the supplied earliest and latest times, the Timeliner chooses a bucket span of 1 year. It then creates 1-year buckets in descending order of time (starting from the year 2999) and compares them with the event results, which are also in descending order, for efficient searching. However, the number of buckets it can create is limited (300) by the "status_buckets" value passed down to the search. This is evident from the request.csv file:

rf,"auto_cancel","status_buckets",.........
"*",30,300,..............

Hence, once all the buckets are created, the earliest time is reset to 300 years (300 buckets) before the latest time (the year 2999). As a result, the earliest event time that can be processed falls in the year 2699, which is far later than the actual event times. Hence we see the Timeliner error:

06-13-2018 23:35:17.320 ERROR Timeliner - Unexpected error, possibly due to mismatch in event ordering expectations. Returning count:0
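The arithmetic behind the bucket cap can be sketched as follows. This is only a simplified model of the behavior described above, not Splunk's actual Timeliner code; `clamp_earliest_year` and the variable names are hypothetical, and only the numeric inputs come from the logs and request.csv.

```python
from datetime import datetime, timedelta, timezone

EPOCH = datetime(1970, 1, 1, tzinfo=timezone.utc)

def year_of(epoch_seconds):
    """Calendar year (UTC) of an epoch-seconds value."""
    return (EPOCH + timedelta(seconds=epoch_seconds)).year

def clamp_earliest_year(earliest_year, latest_year, max_buckets):
    """Hypothetical model: with 1-year buckets built newest-first, the
    timeline can reach back at most max_buckets years from latest_year."""
    return max(earliest_year, latest_year - max_buckets)

search_et = 1467743400    # earliest time from the litsearch line
search_lt = 32503573800   # latest time from the litsearch line
status_buckets = 300      # from request.csv

event_year = year_of(search_et)
timeline_floor = clamp_earliest_year(event_year, year_of(search_lt), status_buckets)
event_is_covered = event_year >= timeline_floor

print(timeline_floor, event_year, event_is_covered)  # 2699 2016 False
```

In this model the oldest year the timeline reaches is 2699, so events from 2016 never fall inside any bucket, which matches the count of 0 in the error.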

Here is what the timeline.csv file looks like, with 301 buckets:
et,lt,count,available,complete
23005065600.000,23036601600.000,0,0,1
23036601600.000,23068137600.000,0,0,1
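As a quick check, the et/lt columns of these rows can be decoded to confirm that the buckets start in 2699 and each span one year. This is a small illustrative sketch; `decode_buckets` and `SAMPLE` are made-up names, and the rows are the two shown above, interpreted as UTC epoch seconds.

```python
import csv
import io
from datetime import datetime, timedelta, timezone

EPOCH = datetime(1970, 1, 1, tzinfo=timezone.utc)

# The header and the two rows quoted from timeline.csv above.
SAMPLE = """et,lt,count,available,complete
23005065600.000,23036601600.000,0,0,1
23036601600.000,23068137600.000,0,0,1
"""

def decode_buckets(text):
    """Return (start, end, count) tuples with et/lt decoded to UTC datetimes."""
    rows = []
    for row in csv.DictReader(io.StringIO(text)):
        start = EPOCH + timedelta(seconds=float(row["et"]))
        end = EPOCH + timedelta(seconds=float(row["lt"]))
        rows.append((start, end, int(row["count"])))
    return rows

buckets = decode_buckets(SAMPLE)
for start, end, count in buckets:
    # Prints the bucket start date, end date, length in days, and event count.
    print(start.date(), end.date(), (end - start).days, count)
```

The first bucket starts on 2699-01-01 and both buckets are 365 days long with a count of 0.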

Comparing the working and non-working dispatch folders, the main difference is between the et/lt values of the search and those of the Timeliner.

[Screenshot: et/lt values of the search vs. the Timeliner in the working and non-working dispatch folders]

Note: In the non-working case, the search et and lt values are higher than the Timeliner et/lt values by a 30-minute round-off. This appears to be why the Timeliner error occurs and the UI fails to list the events.

Workaround:
Set the time zone in the user preferences to "-- Default System Timezone --".
Bug Fix: Splunk V6.6.12
