Splunk Search

How to search and plot the number of computers that changed from one value to another each hour?

Motivator

Here is an interesting question. I want to plot the number of computers that changed from one value to another each hour. The data may look like the following simplified example:

_time, hostname, state
hour1, computer1, stateA
hour1, computer2, stateA
hour2, computer1, stateA
hour2, computer2, stateB
hour3, computer1, stateA
hour3, computer2, stateA

The timechart would show:

hour1, 2 computers did not change state (no previous hour)
hour2, 1 computer did not change state, and 1 computer changed from A to B
hour3, 1 computer did not change state, and 1 computer changed from B to A

Is there an elegant and efficient approach to this question? I am sure that there are many ways to do this, and that some ways are better than others.


SplunkTrust

Try something like this (assuming each computer logs once an hour)

your base search | bucket span=1h _time | stats latest(state) as state by hostname _time | streamstats window=1 current=f values(state) as prevstate by hostname | eval Status=if(isnull(prevstate) OR state=prevstate,"No State Change","State Changed") | timechart span=1h count by Status

Motivator

How about this one?:

| loadjob ####.#### | sort 0 _time | streamstats current=f last(state) AS state_last by hostname | eval state_chng=state_last."->".state | timechart dc(hostname) AS hostname_dc by state_chng

This actually captures the direction of the change. I use "last" in streamstats instead of window=1 and "values". Is either more accurate or efficient in your opinion? With a sort on _time to put earlier events first, I believe that "last" grabs the most recent event at each point in the streamstats stream. Does that sound right to you?
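To make the intended logic concrete outside of SPL, here is a minimal Python sketch of what the sort + streamstats + timechart pipeline above is meant to compute. The sample rows and the function name `transitions_per_hour` are hypothetical and only mirror the simplified example from the question. It illustrates the idea, not how Splunk implements it.

```python
from collections import defaultdict

# Hypothetical sample rows mirroring the simplified example: (hour, hostname, state).
# They are deliberately out of time order, like results arriving unsorted.
events = [
    (2, "computer2", "stateB"),
    (1, "computer1", "stateA"),
    (3, "computer1", "stateA"),
    (1, "computer2", "stateA"),
    (2, "computer1", "stateA"),
    (3, "computer2", "stateA"),
]

def transitions_per_hour(rows):
    """Count distinct hosts per (hour, 'previous->current') transition."""
    prev_state = {}            # hostname -> state of that host's previous event
    hosts = defaultdict(set)   # (hour, transition) -> hostnames seen making it
    # Sort by time first, analogous to `sort 0 _time` before streamstats.
    for hour, host, state in sorted(rows, key=lambda r: r[0]):
        if host in prev_state:  # the first event per host has no previous state
            hosts[(hour, f"{prev_state[host]}->{state}")].add(host)
        prev_state[host] = state
    # Distinct count of hosts, analogous to dc(hostname).
    return {key: len(names) for key, names in hosts.items()}

result = transitions_per_hour(events)
for key in sorted(result):
    print(key, result[key])
```

The transition label keeps the direction of the change (for example `stateA->stateB` versus `stateB->stateA`), which is what the `state_chng` field is for.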


SplunkTrust

You can drop the sort on _time and use latest in streamstats instead of last (both do the same thing if your events have the field _time). If there are only 2 events per host, then your streamstats will work the same way with or without window=1. But if you have many repeated events for each hostname, it is advisable to use window=1 to reduce the number of rows that streamstats processes for each row; otherwise it takes all the previous events into consideration.


Motivator

Using latest merely takes the latest event in the stream up to the point of the calculation. It does not consider events that have an earlier timestamp than the current event but arrive after it in the stream.
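A small Python sketch of the distinction being described, under the assumption that "last" means the previous event in stream order and "latest" means the highest-timestamp event seen so far. The stream and the helper `running_last_and_latest` are hypothetical, and this is a simplification rather than a description of Splunk internals.

```python
# Hypothetical out-of-order stream of (time, state) events for a single host.
stream = [(3, "stateB"), (1, "stateA"), (2, "stateA"), (4, "stateA")]

def running_last_and_latest(events):
    """For each event, report (time, last, latest) over the events before it.

    last   = state of the event immediately before it in stream order
    latest = state of the event with the highest timestamp seen before it
    """
    out = []
    seen = []  # events already passed in the stream, in arrival order
    for time, state in events:
        last = seen[-1][1] if seen else None
        latest = max(seen, key=lambda e: e[0])[1] if seen else None
        out.append((time, last, latest))
        seen.append((time, state))
    return out

rows = running_last_and_latest(stream)
for row in rows:
    print(row)
```

On an unsorted stream neither value is reliably the chronological predecessor: for the event at time 2, the true previous event (time 1) arrived in the stream, but "latest" still reports the state from time 3, which is in that event's future.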


Motivator

The results coming into streamstats are not in time order, so sorting appears unavoidable prior to streamstats in order to know which event comes before which. There are a great many more than two events per hostname. The goal is to know when a change takes place and in which direction it occurs, and to get the number of computers moving in that direction each hour.


SplunkTrust

In that case you can keep the sorting. Could you try adding window=1 and see whether your results change? My guess is that they would not.


Motivator

Adding window=1 messes up the timechart that follows, so that many results are not shown. The table before the timechart looks exactly the same both with and without "window=1", so I don't understand why timechart does not show those results with "window=1".
