I have dense sensor data (~75k events in a 3 week period) from multiple sensors that I would like to correlate to a sparse set of data (~100 events in the same time period). The data span several different sensors.
Sensor data events have these fields:
_time, SensorName=SensorA, Data=394
Sparse data events have these fields:
_time, ImportantProperty=583
Sensor data logs at a specific interval when nothing interesting is happening, or records a value if there is a significant change in the Data value.
What I would like to do is have Splunk tell me what the Data value was for each SensorName at the time that an ImportantProperty event occurred.
In the past, when I was looking at only one sensor, I have used transaction:
(source=densedata AND SensorName=SensorA) OR source=sparsedata | transaction maxspan=10m maxevents=-1 | search ImportantProperty=* | stats avg(Data) avg(ImportantProperty) by _time
However, this doesn't work when multiple SensorNames are involved.
It seems like what I would want to do is subsearch for the sensor data around the time the ImportantProperty events occur, but I can't find any documentation on how I would do that for multiple time ranges (i.e. each of the 100 sparse events) and find all the sensor data events at those times.
Help!
I am constrained by my position and can't show my actual data, only generalized searches like the above.
You need the map
command; it allows you to run an inner subsearch once for every event contained in the outer search and allows you to pass fields from each outer search event into the inner search:
http://docs.splunk.com/Documentation/Splunk/6.3.1/SearchReference/map
This is a job for streamstats ...
source=densedata OR source=sparsedata
| rename COMMENT as "optionally, if there is no SensorName on the sparsedata, then do this"
| eval SensorName=coalesce(SensorName,"sensor1 sensor2 sensor3 sensoretc")
| makemv SensorName
| mvexpand SensorName
| rename COMMENT as "Now we keep a running average of the last ten minutes Data on all sensors"
| sort 0 SensorName _time
| streamstats avg(Data) as avgData time_window=10m by SensorName
| rename COMMENT as "And keep only the sparse events, which now have the running average for Data"
| where source="sparsedata"
| fillnull value="((none))" avgData
You need the map
command; it allows you to run an inner subsearch once for every event contained in the outer search and allows you to pass fields from each outer search event into the inner search:
http://docs.splunk.com/Documentation/Splunk/6.3.1/SearchReference/map
Thanks for the help! The doc is pretty sparse on examples of map. Is it possible to return matching events or more than one row of a table?
My inner search is:
search="search earliest=early_t latest=late_t Sensor=Sensor* | stats avg(Data) by Sensor"
which returns a table like:
Sensor Data
SensorA 39
SensorB 393
SensorC 392
SensorD 494
The doc says that the search is a literal search. Where can I find more info on what that means and any restrictions that might come with it?
Here is how got the output I was looking for:
source=sparsedata | eval earliest_t=_time-300 | eval latest_t=_time+300 | eval midtime=_time+0 | map maxsearches=999 search="search source=densedata earliest=$earliest_t$ latest=$latest_t$ | eval mid_t=$midtime$ | eval ImportantProperty=$ImportantProperty$ | stats avg(Data), values(ImportantProperty), values(mid_t) by Sensor" | table Sensor avg(Data) values(ImportantProperty) values(mid_t)
The important bit I was missing when I posted my first reply was the final table command to get the output in the format I wanted.