Splunk Search

How to correlate Dense Sensor Data with a Sparse Data Set?

Communicator

I have dense sensor data (~75k events in a 3 week period) from multiple sensors that I would like to correlate to a sparse set of data (~100 events in the same time period). The data span several different sensors.

Sensor data events have these fields:
_time, SensorName=SensorA, Data=394

Sparse data events have these fields:
_time, ImportantProperty=583

Sensor data logs at a specific interval when nothing interesting is happening, or records a value if there is a significant change in the Data value.

What I would like to do is have Splunk tell me what the Data value was for each SensorName at the time that an ImportantProperty event occurred.

In the past, when I was looking at only one sensor, I have used transaction:

(source=densedata AND SensorName=SensorA) OR source=sparsedata | transaction maxspan=10m maxevents=-1 | search ImportantProperty=* | stats avg(Data) avg(ImportantProperty) by _time

However, this doesn't work when multiple SensorNames are involved.

It seems like what I would want to do is subsearch for the sensor data around the time the ImportantProperty events occur, but I can't find any documentation on how I would do that for multiple time ranges (i.e. each of the 100 sparse events) and find all the sensor data events at those times.

Help!

I am constrained by my position and can't show my actual data, only generalized searches like the above.

0 Karma
1 Solution

Esteemed Legend

You need the map command; it allows you to run an inner subsearch once for every event contained in the outer search and allows you to pass fields from each outer search event into the inner search:

http://docs.splunk.com/Documentation/Splunk/6.3.1/SearchReference/map

View solution in original post

SplunkTrust
SplunkTrust

This is a job for streamstats ...

source=densedata OR source=sparsedata 

| rename COMMENT as "optionally, if there is no SensorName on the sparsedata, then do this"
| eval SensorName=coalesce(SensorName,"sensor1 sensor2 sensor3 sensoretc")
| makemv SensorName    
| mvexpand SensorName

| rename COMMENT as "Now we keep a running average of the last ten minutes Data on all sensors"
| sort 0 SensorName _time
| streamstats avg(Data) as avgData time_window=10m by SensorName

| rename COMMENT as "And keep only the sparse events, which now have the running average for Data"
| where source="sparsedata"
| fillnull value="((none))" avgData
0 Karma

Esteemed Legend

You need the map command; it allows you to run an inner subsearch once for every event contained in the outer search and allows you to pass fields from each outer search event into the inner search:

http://docs.splunk.com/Documentation/Splunk/6.3.1/SearchReference/map

View solution in original post

Communicator

Thanks for the help! The doc is pretty sparse on examples of map. Is it possible to return matching events or more than one row of a table?

My inner search is:

search="search earliest=early_t latest=late_t Sensor=Sensor* | stats avg(Data) by Sensor"

which returns a table like:

Sensor      Data 
SensorA   39
SensorB   393
SensorC   392
SensorD  494 

The doc says that the search is a literal search. Where can I find more info on what that means and any restrictions that might come with it?

0 Karma

Communicator

Here is how got the output I was looking for:

source=sparsedata | eval earliest_t=_time-300 | eval latest_t=_time+300 | eval midtime=_time+0 | map maxsearches=999 search="search source=densedata earliest=$earliest_t$ latest=$latest_t$ | eval mid_t=$midtime$ | eval ImportantProperty=$ImportantProperty$ | stats avg(Data), values(ImportantProperty), values(mid_t) by Sensor" | table Sensor avg(Data) values(ImportantProperty) values(mid_t) 

The important bit I was missing when I posted my first reply was the final table command to get the output in the format I wanted.

0 Karma
Don’t Miss Global Splunk
User Groups Week!

Free LIVE events worldwide 2/8-2/12
Connect, learn, and collect rad prizes and swag!