Archive
Highlighted

How to create a normalized histogram?

Explorer

The data: The simplest example is signal strength per station attached to wireless access point. Every 5 minutes, signal strength from all attached stations is recorded. Use 3db buckets.

I can create a histogram of ALL signal strengths:

index=* sourcetype=my-csv | chart count over Signal span=3

AND I can create a histogram that breaks up the data by StationName (each bin has one bar per StationName):

index=* sourcetype=my-csv |chart count over Signal span=3 by StationName

Both of these work fine, but what I need is to NORMALIZE the data for each station.

Right now, the bar height is the total count of signal strengths that fall into that bucket. That is fine, except some stations have more connect time than others and the ABSOLUTE count is actually unimportant, it is the RELATIVE count I am after.

What I need to do is divide each of the "count" metrics for a given station by the total of all counts for that station (NOT the total count overall, just the total for that station). But I have not been able to figure out how to do this.

Example data:

Name         Signal    count      Name      Signal   count
station1       -28       2      station2       -28  3
station1       -29     9        station2       -29  8
station1       -30    33        station2       -30  8
station1       -31    73        station2       -31  11
station1       -32   189        station2       -32  7
station1       -33   604        station2       -33  15
station1       -34   504        station2       -34  16
station1       -35    23        station2       -35  3
station1       -36    10        station2       -36  1
station1       -37    13        station2       -37  1
station1       -38     3        station2       -38  1
station1       -39      4       station2       -39  2
station1       -40     2        station2       -40  1
station1       -41     1        station2       -41  1
station1       -42     1        station2       -42  5

sum of "count" (total measurements) for station1=1471
sum of "count" (total measurements) for station2=83

What I want in the histogram is something like this where each "count" above is divided by the total for its station:

Name          Signal %ofTotal        Name   Signal  %ofTotal
station1       -28  0.1%        station2       -28  3.6%
station1       -29  0.6%        station2       -29  9.6%
station1       -30  2.2%        station2       -30  9.6%
station1       -31  5.0%        station2       -31  13.3%
station1       -32  12.8%      station2    -32  8.4%
station1       -33  41.1%      station2    -33  18.1%
station1       -34  34.3%      station2    -34  19.3%
station1       -35  1.6%        station2       -35  3.6%
station1       -36  0.7%        station2       -36  1.2%
station1       -37  0.9%        station2       -37  1.2%
station1       -38  0.2%        station2       -38  1.2%
station1       -39  0.3%        station2       -39  2.4%
station1       -40  0.1%        station2       -40  1.2%
station1       -41  0.1%        station2       -41  1.2%
station1       -42  0.1%        station2       -42  6.0%

I am relatively new to Splunk....I have searched but will feel duly humbled if someone posts a link with a "did you see this example of exactly what you are trying to do?" 🙂

Tags (2)
0 Karma
Highlighted

Re: How to create a normalized histogram?

Contributor

Eventstats is useful for this scenario

Your query would be something like

<base query here> | eventstats sum(count) as SumofCount by Signal | eval PercofCount = (count/SumofCount)*100
0 Karma
Highlighted

Re: How to create a normalized histogram?

Influencer

index=* sourcetype=my-csv | chart count over Signal | eventstats sum(count) as TotalStationCount by Name | eval PercofCount = (count/TotalStationCount)*100

View solution in original post

Highlighted

Re: How to create a normalized histogram?

Explorer

I am going to accept this answer as it got me closer to my goal, but I had to do some major reworking to get it to work.

The query above does not apper to give me any different results that what I get without the eventstats commands.

However, if I do the eventstats BEFORE the chart command and chart the correct value, I get what I was looking for.

Here is my final query:

index=* sourcetype=my-csv  | 
stats count(Signal) as bySignal by StationName,Signal | 
eventstats sum(bySignal) as SumofCount by StationName |  
eval PercofCount = (bySignal/SumofCount)*100 | 
chart sum(PercofCount) over Signal span=3 by StationName

If you do a "table StationName,Signal,SumofCount,PercofCount" instead of the chart command you will see what is happening.

One note: "Signal" is not continuous, but is a range of whole integers from 0 to -101.

Basically, eventstats added the "SumofCount" metric to each line of the base table (where it is count per Signal value) just as advertised. Then you can do the eval statement to get the percentage for each Signal value for each StationName, and finally chart the SUM of the PercofCount for Signal binned in 3db bins.

Thanks!