Splunk Search

Any way to do this without map?

BG_Splunk
Explorer

Nightly, my organization puts a bunch of pieces of equipment into "maintenance mode" to do repairs and such on them. I've got a data source which records the state of a device (as either on or off) which looks similar to this:

| makeresults
| eval SEED_DATA = 
"4th Floor,Server,A123,ON|".
"4th Floor,Server,A123,OFF|".
"4th Floor,Server,A123,ON|".
"5th Floor,Computer,C234,ON|".
"6th Floor,Printer,M345,OFF|".
"6th Floor,Printer,M345,ON"
| eval SEED_DATA = split(SEED_DATA,"|")
| mvexpand SEED_DATA
| rex field=SEED_DATA "^(?<FLOOR>[^,]+),(?<DEVICE_TYPE>[^,]+),(?<DEVICE>[^,]+),(?<STATE>[^,]+)$"
| fields - SEED_DATA
| table _time FLOOR DEVICE_TYPE DEVICE STATE

 

I'm trying to write a macro that I can apply to any arbitrary log of device data, where the macro appends a TRUE/FALSE value to each event indicating whether that event occurred during a maintenance period or not. For example, let's assume this is the data set I want to run my macro on:

| makeresults
| fields - _time
| eval SEED_DATA = 
"A123,".round(relative_time(now(),"-6h"),0)."|".
"C234,".round(relative_time(now(),"-8h"),0)."|".
"M345,".round(relative_time(now(),"-10h"),0)."|".
"S456,".round(relative_time(now(),"-12h"),0)."|".
"R567,".round(relative_time(now(),"-14h"),0)."|".
"W678,".round(relative_time(now(),"-16h"),0)
| eval SEED_DATA = split(SEED_DATA,"|")
| mvexpand SEED_DATA
| rex field=SEED_DATA "^(?<ZONE_DATA>[^,]+),(?<LATEST>.*)$"
| fields - SEED_DATA
| eval ZONE = substr(ZONE_DATA,1,1)
| eval EARLIEST = LATEST - 86400
| eval EARLIEST = tostring(strftime(EARLIEST, "%m/%d/%Y:%H:%M:%S")), LATEST = tostring(strftime(LATEST, "%m/%d/%Y:%H:%M:%S"))
| table ZONE_DATA ZONE EARLIEST LATEST

 

And, let's assume this is more or less what the macro looks like and does:

| multireport 
    [table *]
    [map search="search index=ZONE_DATA sourcetype=MAINTENANCE_INFO DEVICE_TYPE=computer DEVICE=\"$zone$*\" earliest=\"$earliest$\" latest=\"$latest$\"
    | fields _time DEVICE STATE
    | table _time DEVICE STATE 
    | sort 0 _time
    | eval COMBINE = _time.\"|\".STATE
    | table DEVICE COMBINE 
    | mvcombine COMBINE 
    | streamstats last(COMBINE) as LAST_COMBINE 
    | rex field=LAST_COMBINE \"^(?<LATEST_TIMESTAMP>[^|]+)\|(?<LATEST_STATUS>[^|]+)$\" 
    | eval STATUS_VALUE = if(LATEST_STATUS==\"ON\",1,0) 
    | eventstats avg(STATUS_VALUE) as PERCENTAGE_ON
    | stats values(PERCENTAGE_ON) as PERCENTAGE_ON
    | eval MAINTENANCE = if(PERCENTAGE_ON < 0.5, \"TRUE\", \"FALSE\")"]
| stats list(*) AS *

 

We can also assume this is more or less what the output looks like:

ZONE_DATA  ZONE  EARLIEST             LATEST               PERCENTAGE_ON       MAINTENANCE
A123       A     01/22/2026:10:15:09  01/23/2026:10:15:09  0.9743589743589743  FALSE
C234       C     01/22/2026:08:15:09  01/23/2026:08:15:09  1                   FALSE
M345       M     01/22/2026:06:15:09  01/23/2026:06:15:09  1                   FALSE
S456       S     01/22/2026:04:15:09  01/23/2026:04:15:09  0.967741935483871   FALSE
R567       R     01/22/2026:02:15:09  01/23/2026:02:15:09  0                   TRUE
W678       W     01/22/2026:00:15:09  01/23/2026:00:15:09  0.9047619047619048  FALSE

 

My question, essentially, is whether I'm going to end up with problems trying to scale out this solution. I'm aware that the map function is pretty resource intensive, but I'm not sure that I have another way of doing this without doing some very complicated work combining the two data streams and trying to figure out some other workaround.

 

I've considered making a summary index of the state of the various zones over time, so that every 5 mins I write a log statement to a summary index which states whether the zone is in maintenance mode or not. I'd still need to use a map command to check what the state of that zone was at the time I'm checking for, but at least the source of data for that map command would be smaller and less resource intensive to check. This might just be pushing the problem down the line, though.
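For reference, the scheduled search populating that summary index might look roughly like this. This is an untested sketch: the index name maintenance_summary, the substr-based zone extraction, and the assumption that "in maintenance" means fewer than half the zone's devices are powered on are all my own placeholders, not anything from an existing config:

index=ZONE_DATA sourcetype=MAINTENANCE_INFO earliest=-5m
| eval ZONE = substr(DEVICE,1,1)
| eval STATUS_VALUE = if(STATE=="ON",1,0)
| stats avg(STATUS_VALUE) as PERCENTAGE_ON by ZONE
| eval MAINTENANCE = if(PERCENTAGE_ON < 0.5, "TRUE", "FALSE")
| collect index=maintenance_summary

Run every 5 minutes, this would give one row per zone per interval, which is much cheaper to probe than the raw device logs.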

Looking for any other thoughts regarding another way to do this, or some unlikely reassurance that map isn't going to break my environment if used this way.


ITWhisperer
SplunkTrust

To be honest, I found it difficult to understand what you are working with (e.g. what do sample events from your data sources look like?), what you are trying to achieve (e.g. the percentage of events not in "maintenance" periods?), and whether you are just after advice on whether your environment is big enough to handle the type of search you want to run (e.g. using map).

Please provide some clarification, and some anonymised sample events.


BG_Splunk
Explorer

The first makeresults query in my post is an example of the data in my data source (the logs which state whether any individual device is currently powered on or off) -- we'll call that data source 1. The second makeresults query in my post is an example of a separate data source which I want to run my macro on, which we'll call data source 2. The macro's purpose is to check whether or not each event in the second data source is occurring during a maintenance period, by calculating whether more than 50% of the devices in a given "zone" are powered on or off (using data source 1) at the time of the event from data source 2.

 

Basically, I'm trying to infer the state of my system at any given time by comparing two different data sources. One data source tells me the state of the various devices in each zone. My second data source could be any arbitrary data source recording any other types of transactions, events, or other information throughout the system.

 

My first attempt to solve this involved a lookup table that stored the current state of devices every 15 mins or so, but that only works for queries which need to check the state of events at the current moment. I want to be able to look into historical data and determine, on a per-line basis, whether the system was in a maintenance period or not. The only way I've been able to think of to do that is the map command, which I'm worried could be a problem if I try to search over too many lines of data at once (or if anyone else in my organization, without realizing the full extent of what they're doing, tries to do so).
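One feature that might rescue the lookup approach for the historical case: Splunk supports time-based lookups, where the lookup definition designates a time field and each event is matched against the lookup row valid at that event's _time, not just the current state. If the 15-minute snapshot kept a timestamp column, the enrichment could be a plain lookup instead of map. A rough sketch, with made-up names (zone_state_history would need "time-based lookup" enabled in its definition, keyed on the snapshot timestamp):

index=my_arbitrary_source
| eval ZONE = substr(DEVICE,1,1)
| lookup zone_state_history ZONE OUTPUTNEW MAINTENANCE

The match accuracy is bounded by the snapshot interval, so a 15-minute cadence means events can be labeled with state up to 15 minutes stale.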


PickleRick
SplunkTrust

Well, you might get away with either foreach or some clever mvexpand (mvexpand can be heavy on memory with bigger data sets), but the overall complexity of your problem is X*Y.


BG_Splunk
Explorer

That's pretty much what I was afraid of. I think my problem inherently isn't going to get better, no matter how I try to solve it. The best I can do is try to minimize the size of X and Y individually, but the problem is still always going to be X*Y. That might slow down the impending issue somewhat, but it will always kind of be there. 😕


PickleRick
SplunkTrust

The main factor affecting your ability to optimize will be what you want to achieve across your initial "conditions". Unfortunately, your example search includes stuff like streamstats, so you can't just generate one big data set and throw foreach at it.

So maybe your specific problem has to be either reformulated or rethought to achieve a similar result another way. But it obviously depends on the specific problem, not a generalized idea.


BG_Splunk
Explorer

I'll see if I can get a foreach command to work, but I've never seen a foreach command where I can pipe in data from other indexes and sourcetypes. I'll search around and see what I can find/come up with, though. Thanks for the idea!


PickleRick
SplunkTrust

No, you can't. Foreach is pretty limited in what it can do. That's why I said that it will probably not be _the_ solution to your problem. If you can somehow do that streamstats over whole big data set and then use some filtering, foreach, lookup, whatever... that's different.
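As a very rough sketch of what I mean by doing the streamstats over one big data set (untested, and every index, sourcetype, and field name here is a placeholder; the usual append/subsearch result limits also apply): pre-aggregate the zone state per time bucket, append those rows to the target events, sort everything by time, and let streamstats carry the most recent per-zone state forward onto each target event.

index=my_target_index sourcetype=my_target_sourcetype
| eval ZONE = substr(DEVICE,1,1), ROW_TYPE = "target"
| append
    [ search index=ZONE_DATA sourcetype=MAINTENANCE_INFO
    | bin _time span=5m
    | eval ZONE = substr(DEVICE,1,1), STATUS_VALUE = if(STATE=="ON",1,0)
    | stats avg(STATUS_VALUE) as PERCENTAGE_ON by _time ZONE
    | eval MAINTENANCE = if(PERCENTAGE_ON < 0.5, "TRUE", "FALSE"), ROW_TYPE = "state" ]
| sort 0 _time
| streamstats last(MAINTENANCE) as MAINTENANCE by ZONE
| search ROW_TYPE="target"

This runs the expensive scan of the state data once instead of once per map iteration, at the cost of a full sort of the combined set.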
