Splunk Search

Loop through lookuptable to produce averages

mikev
Path Finder

Hello Splunkers,

I have static values (user groups) that I need to loop through to produce the results for each of the static values. We then use that to produce averages for the users within those groups and alert on any that are above the average by a small percent. I may be making this more difficult than necessary. While I can hard code the group info, I would prefer to not have 20 identical searches other than the group identifier. My averages are also not returning the desired results, it seems that the average is not carried through all of the data.

Input.csv:

Group,
grp1,
grp2,
grp3,
grp4,
etc...

Hardcoded results:

User       #access      total group      groupavg
Usera          20                45            10
Userb          15                                        
Userc           5
Userd           5

In this case using the variable groupavg it would only produce the result that usera was above the average not usera and userb. Now if you hardcode the value of 10 it does show both usera and userb.

The search:

index="foo" [|inputlookup  input.csv |rename Group as query |fields query] Classification=query
|dedup fooKey
| eval Date=strftime(_time, "%D")
| stats count by User
|appendcols [search index="foo"  Classification=query
|dedup fooKey
| eval Date=strftime(_time, "%D")
| stats count as total dc(user) as totuser
|eval groupavg=round(total/totuser,0)]
 |where count > (groupavg*1.5)
|fields - total, totuser

So to see if I was way off base, I tried a simple search using a lookup file against _internal

Inputs.csv:

Log,
INFO,
WARNING,
WARN,
ERROR,
error,

Search:

index="_internal" [|inputlookup testfile.csv | rename Log AS query | fields query] Log_Level=query |stats count by query

This returned nothing as far as stats, but it did return events:

12/7/15
2:19:41.176 PM  
12-07-2015 14:19:41.176 -0700 INFO  StreamedSearch - Streamed search connection terminated: search_id=remote_indexmaster.localdomain_1449523206.47, server=indexmaster.localdomain, active_searches=0, elapsedTime=0.273, search='litsearch index="_internal" ( ( INFO ) OR ( WARNING) OR ( WARN ) OR ( ERROR ) OR ( error ) ) Log_Level=query | addinfo  type=count label=prereport_events | fields  keepcolorder=t "*" "_bkt" "_cd" "_si" "host" "index" "linecount" "prestats_reserved_*" "psrsvd_*" "query" "source" "sourcetype" "splunk_server"  | remotetl  nb=300 et=1449508800.000000 lt=1449523206.000000 max_count=1000 max_prefetch=100 | prestats  count by query', savedsearch_name=""

So it is returning the fields from the input.csv, but in the Boolean format ( INFO ) OR ( WARNING) OR ( WARN ) OR ( ERROR ) OR ( error )

I need to be able to extract each field run in a search against those results as stated above.

So what am I doing wrong?... Should I not use an input lookup to loop through the results?

Ideally we would like to see that Usera who is part of grpd is 1.5 times higher than the rest of the users in grpd.

Thank you in advance for any assistance you can provide.

0 Karma
1 Solution

Runals
Motivator

LIke HiroshiSatoh posted though the key with subsearches is to have any output match fields that are going to be in the data from your parent search. One thing you can do (if you don't already know) is add | format to the end of a search that will be come a subsearch. It will show you what will be passed to the parent.

Additionally I think you will want to be introduced to the eventstats command.

index="foo" [|inputlookup  input.csv |rename Group as Classification |fields Classification] 
 |dedup fooKey
 | stats count by User
 | eventstats sum(count) as total dc(User) as totuser
 |eval groupavg=round(total/totuser,0)]

I've left off your eval as because it is before the stats command and isn't referenced in the stats command it will be dropped. I've also not included the where and fields command as you can always add those back in once you've seen the outcome. The big challenge (to me) is there isn't enough information in what you've provided for us to know if this query will return the results you are looking for. For example - comparing your hardcoded results to your query there isn't a field for #access, total, and group let alone what those numbers mean. What is fooKey and does it need to be de-duplicated? Perhaps I've not had enough caffeine 😃

View solution in original post

0 Karma

Runals
Motivator

LIke HiroshiSatoh posted though the key with subsearches is to have any output match fields that are going to be in the data from your parent search. One thing you can do (if you don't already know) is add | format to the end of a search that will be come a subsearch. It will show you what will be passed to the parent.

Additionally I think you will want to be introduced to the eventstats command.

index="foo" [|inputlookup  input.csv |rename Group as Classification |fields Classification] 
 |dedup fooKey
 | stats count by User
 | eventstats sum(count) as total dc(User) as totuser
 |eval groupavg=round(total/totuser,0)]

I've left off your eval as because it is before the stats command and isn't referenced in the stats command it will be dropped. I've also not included the where and fields command as you can always add those back in once you've seen the outcome. The big challenge (to me) is there isn't enough information in what you've provided for us to know if this query will return the results you are looking for. For example - comparing your hardcoded results to your query there isn't a field for #access, total, and group let alone what those numbers mean. What is fooKey and does it need to be de-duplicated? Perhaps I've not had enough caffeine 😃

0 Karma

mikev
Path Finder

OK, easy answer first, the dedup is a key value that the customer uses, not important for this issue 😉

Combining HiroshiSatoh and yours I have this (working off of internal for now):
index="_internal" [|inputlookup testfile.csv | rename Log AS log_level| fields log_level]
|stats count by log_level
|eventstats sum(count) as totlog dc(log_level) as totgrp
|eval grpavg=round(totlog/totgrp,0)

This returns the following:

log_level count grpavg totgrp totlog
ERROR 54 10758 3 32275
INFO 31841 10758 3 32275
WARN 380 10758 3 32275

Ok so this returns the information that we can work with, when adding in the where clause:
index="_internal" [|inputlookup testfile.csv | rename Log AS log_level| fields log_level] |stats count by log_level |eventstats sum(count) as totlog dc(log_level) as totgrp|eval grpavg=round(totlog/totgrp,0)|where count > (grpavg*1.5)

INFO is the only returned result - yeah!!! Now to see if this works with the customer data, I'll let you know and accept if so

Got word back from the customer, while it is returning results, it is combining all of the classifications into a group and doing the average on the total for every classification.

The goal is to read the file for classification "A" gather all information from that classification and if a user in that classification is above the average for the group then we want to know about that user in that group, then move to the next classification and do it all over again.

Here is the actual search we ran:

index="usrindex" [|inputlookup classfile.csv | rename class AS Classification| fields Classification]
|dedup UserKey
| eval Date=strftime(_time, "%D")
| stats count by UserID
| eventstats sum(count) as total dc(UserID) as totuser
|eval groupavg=round(total/totuser,0)
|where count > (groupavg*1.5)

So how would I ensure I am only working within each classification NOT all classifications for my averages

0 Karma

mikev
Path Finder

Any other suggestions on how to have it loop and print only each classification not all classifications?

0 Karma

Runals
Motivator

Try adding by Classification to your eventstats. Eventstats works the same as stats in terms of what functions you can do. The main difference is the only fields available after the stats commands are the ones created as part of the functions or included in the 'by'. For example your Date field isn't available after the stats command.

0 Karma

Runals
Motivator

oh but make sure Classification is part of your stats command as well.

0 Karma

HiroshiSatoh
Champion

query is a special field name.

|inputlookup testfile.csv | rename Log AS query | fields query OR ( WARNING) OR ( WARN ) OR ( ERROR ) OR ( error )

|inputlookup testfile.csv | rename Log AS Log_Level| fields Log_Level OR ( Log_Level=WARNING) OR ( Log_Level=WARN ) OR ( Log_Level=ERROR ) OR ( Log_Level=error )

Is this okay.
index="_internal" [|inputlookup testfile.csv | rename Log AS query | fields query] Log_Level=query |stats count by query
⇒ index="_internal" [|inputlookup testfile.csv | rename Log AS Log_Level| fields Log_Level] |stats count by Log_Level

0 Karma
Get Updates on the Splunk Community!

Developer Spotlight with Paul Stout

Welcome to our very first developer spotlight release series where we'll feature some awesome Splunk ...

State of Splunk Careers 2024: Maximizing Career Outcomes and the Continued Value of ...

For the past four years, Splunk has partnered with Enterprise Strategy Group to conduct a survey that gauges ...

Data-Driven Success: Splunk & Financial Services

Splunk streamlines the process of extracting insights from large volumes of data. In this fast-paced world, ...