I have two indexes from which I want to create a summary every hour.
index1
request_type, request_guid, request_timestamp, meta_field1, meta_field2, ...
index1 contains log entries from each processing step in each service request. Each service request is assigned a unique request_guid, and all ~10 logs for the processing of a request share that id. The time the request was made is stored in request_timestamp and also remains the same through all logs for a request.
index2
request_guid, meta_fieldA, meta_fieldB, ...
index2 contains more data for the logs, but it lives in a separate index so that it can be secured differently from index1. The request_guid holds the same value as in index1.
I want to summarize by collecting stats for each request type by hour.
The approach I have taken is to select all the logs from index1 where the request_timestamp falls in the hour. I cannot use the log time directly, as a request's logs might span into the next hour (as in started at 9:59:59 and ended at 10:00:01).
index=index1 earliest=0
| addinfo | eval timemillis=strftime(strptime(request_timestamp,"%Y-%m-%dT%H:%M:%S.%3N%z"),"%s")
| where timemillis>=info_min_time AND timemillis<info_max_time
This gives me a list of all the logs from index1 that I want to summarize. I then join in the logs from index2:
| join usetime=false type=left overwrite=false request_guid max=0
[search index=index2]
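One way to rule out row multiplication from the join side (a sketch, not my original search, assuming each request_guid needs only one combined index2 row) would be to collapse the subsearch to one row per guid before joining:

| join usetime=false type=left overwrite=false request_guid max=0
    [search index=index2
    | stats first(meta_fieldA) as meta_fieldA first(meta_fieldB) as meta_fieldB by request_guid]

With at most one subsearch row per request_guid, the left join can no longer fan out the index1 events.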
I then create the summary with the following:
| sort _time desc
| transaction request_guid maxspan=10m mvlist=true
| stats
count as requests_in_period
by request_type
The result is elevated counts per period. They are all multiples of 2, which tells me that a cross product seems to be happening. I think the problem is in the join command: there are multiple entries selected from index1 with duplicate values for request_guid that I think are over-selecting from index2, yet I want all the matching entries from index2 for each request_guid.
When I view the events right after the join, all the logs are from index1 and none are from index2.
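A quick diagnostic (a sketch, assuming the suspicion above) would confirm whether index2 really carries more than one row per guid, which would explain the doubled counts:

index=index2
| stats count by request_guid
| where count>1

Any rows returned here identify the guids whose index2 duplicates are multiplying the joined index1 events.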
Background
I started with a subsearch of the request_guids that should be summarized, but a subsearch is limited to 10500 entries, which will not be enough for an hour of transaction logs.
Try something like this. It would run every 30 minutes, at x:15 and x:45. We're summarizing at the 15-minute level, and doing so for two periods: -45m to -30m and -30m to -15m.
earliest=-45m@m latest=@m index=index1 OR index=index2
| rename COMMENT as "get rid of all unneeded fields"
| fields index request_guid request_timestamp request_type
| rename COMMENT as "calculate desired timestamp from the index1 record, then roll it and the request type onto every record for each guid"
| eval mytime=strptime(request_timestamp,"%Y-%m-%dT%H:%M:%S.%3N%z")
| eventstats min(mytime) as mytime min(request_type) as mytype by request_guid
| rename COMMENT as "get rid of unneeded index2 records, and records that are too old or too young to have settled"
| addinfo
| where (mytime>=info_min_time) AND (mytime<info_max_time-900) AND (index="index1")
| rename COMMENT as "chunk up the records into 15 minute chunks, stats together all records for each guid, then find out how many unique guids there were in each period."
| eval _time=900*floor(mytime/900)
| stats min(mytype) as request_type by _time request_guid
| stats dc(request_guid) as requests_in_period by _time request_type
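As a side note, the floor arithmetic above can also be written with bin, which performs the same 900-second bucketing (an equivalent sketch, not a change in behavior):

| bin span=900 mytime as _time

Either form snaps each request's earliest timestamp to the start of its 15-minute chunk before the two stats passes.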
Thanks for the response. You inspired me to look into using stats instead of a join, and that proved very effective and fast.