Hello Splunkers,
I have static values (user groups) that I need to loop through to produce the results for each of the static values. We then use that to produce averages for the users within those groups and alert on any that are above the average by a small percent. I may be making this more difficult than necessary. While I can hard code the group info, I would prefer to not have 20 identical searches other than the group identifier. My averages are also not returning the desired results, it seems that the average is not carried through all of the data.
Input.csv:
Group,
grp1,
grp2,
grp3,
grp4,
etc...
Hardcoded results:
User #access total group groupavg
Usera 20 45 10
Userb 15
Userc 5
Userd 5
In this case using the variable groupavg
it would only produce the result that usera
was above the average not usera
and userb
. Now if you hardcode the value of 10
it does show both usera
and userb
.
The search:
index="foo" [|inputlookup input.csv |rename Group as query |fields query] Classification=query
|dedup fooKey
| eval Date=strftime(_time, "%D")
| stats count by User
|appendcols [search index="foo" Classification=query
|dedup fooKey
| eval Date=strftime(_time, "%D")
| stats count as total dc(user) as totuser
|eval groupavg=round(total/totuser,0)]
|where count > (groupavg*1.5)
|fields - total, totuser
So to see if I was way off base, I tried a simple search using a lookup file against _internal
Inputs.csv:
Log,
INFO,
WARNING,
WARN,
ERROR,
error,
Search:
index="_internal" [|inputlookup testfile.csv | rename Log AS query | fields query] Log_Level=query |stats count by query
This returned nothing as far as stats
, but it did return events:
12/7/15
2:19:41.176 PM
12-07-2015 14:19:41.176 -0700 INFO StreamedSearch - Streamed search connection terminated: search_id=remote_indexmaster.localdomain_1449523206.47, server=indexmaster.localdomain, active_searches=0, elapsedTime=0.273, search='litsearch index="_internal" ( ( INFO ) OR ( WARNING) OR ( WARN ) OR ( ERROR ) OR ( error ) ) Log_Level=query | addinfo type=count label=prereport_events | fields keepcolorder=t "*" "_bkt" "_cd" "_si" "host" "index" "linecount" "prestats_reserved_*" "psrsvd_*" "query" "source" "sourcetype" "splunk_server" | remotetl nb=300 et=1449508800.000000 lt=1449523206.000000 max_count=1000 max_prefetch=100 | prestats count by query', savedsearch_name=""
So it is returning the fields from the input.csv
, but in the Boolean format ( INFO ) OR ( WARNING) OR ( WARN ) OR ( ERROR ) OR ( error )
I need to be able to extract each field run in a search against those results as stated above.
So what am I doing wrong?... Should I not use an input lookup to loop through the results?
Ideally we would like to see that Usera
who is part of grpd
is 1.5 times higher than the rest of the users in grpd
.
Thank you in advance for any assistance you can provide.
LIke HiroshiSatoh posted though the key with subsearches is to have any output match fields that are going to be in the data from your parent search. One thing you can do (if you don't already know) is add | format
to the end of a search that will be come a subsearch. It will show you what will be passed to the parent.
Additionally I think you will want to be introduced to the eventstats command.
index="foo" [|inputlookup input.csv |rename Group as Classification |fields Classification]
|dedup fooKey
| stats count by User
| eventstats sum(count) as total dc(User) as totuser
|eval groupavg=round(total/totuser,0)]
I've left off your eval as because it is before the stats command and isn't referenced in the stats command it will be dropped. I've also not included the where and fields command as you can always add those back in once you've seen the outcome. The big challenge (to me) is there isn't enough information in what you've provided for us to know if this query will return the results you are looking for. For example - comparing your hardcoded results to your query there isn't a field for #access
, total
, and group
let alone what those numbers mean. What is fooKey
and does it need to be de-duplicated? Perhaps I've not had enough caffeine 😃
LIke HiroshiSatoh posted though the key with subsearches is to have any output match fields that are going to be in the data from your parent search. One thing you can do (if you don't already know) is add | format
to the end of a search that will be come a subsearch. It will show you what will be passed to the parent.
Additionally I think you will want to be introduced to the eventstats command.
index="foo" [|inputlookup input.csv |rename Group as Classification |fields Classification]
|dedup fooKey
| stats count by User
| eventstats sum(count) as total dc(User) as totuser
|eval groupavg=round(total/totuser,0)]
I've left off your eval as because it is before the stats command and isn't referenced in the stats command it will be dropped. I've also not included the where and fields command as you can always add those back in once you've seen the outcome. The big challenge (to me) is there isn't enough information in what you've provided for us to know if this query will return the results you are looking for. For example - comparing your hardcoded results to your query there isn't a field for #access
, total
, and group
let alone what those numbers mean. What is fooKey
and does it need to be de-duplicated? Perhaps I've not had enough caffeine 😃
OK, easy answer first, the dedup is a key value that the customer uses, not important for this issue 😉
Combining HiroshiSatoh and yours I have this (working off of internal for now):
index="_internal" [|inputlookup testfile.csv | rename Log AS log_level| fields log_level]
|stats count by log_level
|eventstats sum(count) as totlog dc(log_level) as totgrp
|eval grpavg=round(totlog/totgrp,0)
This returns the following:
log_level count grpavg totgrp totlog
ERROR 54 10758 3 32275
INFO 31841 10758 3 32275
WARN 380 10758 3 32275
Ok so this returns the information that we can work with, when adding in the where clause:
index="_internal" [|inputlookup testfile.csv | rename Log AS log_level| fields log_level] |stats count by log_level |eventstats sum(count) as totlog dc(log_level) as totgrp|eval grpavg=round(totlog/totgrp,0)|where count > (grpavg*1.5)
INFO is the only returned result - yeah!!! Now to see if this works with the customer data, I'll let you know and accept if so
Got word back from the customer, while it is returning results, it is combining all of the classifications into a group and doing the average on the total for every classification.
The goal is to read the file for classification "A" gather all information from that classification and if a user in that classification is above the average for the group then we want to know about that user in that group, then move to the next classification and do it all over again.
Here is the actual search we ran:
index="usrindex" [|inputlookup classfile.csv | rename class AS Classification| fields Classification]
|dedup UserKey
| eval Date=strftime(_time, "%D")
| stats count by UserID
| eventstats sum(count) as total dc(UserID) as totuser
|eval groupavg=round(total/totuser,0)
|where count > (groupavg*1.5)
So how would I ensure I am only working within each classification NOT all classifications for my averages
Any other suggestions on how to have it loop and print only each classification not all classifications?
Try adding by Classification
to your eventstats. Eventstats works the same as stats in terms of what functions you can do. The main difference is the only fields available after the stats commands are the ones created as part of the functions or included in the 'by'. For example your Date
field isn't available after the stats command.
oh but make sure Classification is part of your stats command as well.
query is a special field name.
|inputlookup testfile.csv | rename Log AS query | fields query OR ( WARNING) OR ( WARN ) OR ( ERROR ) OR ( error )
|inputlookup testfile.csv | rename Log AS Log_Level| fields Log_Level OR ( Log_Level=WARNING) OR ( Log_Level=WARN ) OR ( Log_Level=ERROR ) OR ( Log_Level=error )
Is this okay.
index="_internal" [|inputlookup testfile.csv | rename Log AS query | fields query] Log_Level=query |stats count by query
⇒ index="_internal" [|inputlookup testfile.csv | rename Log AS Log_Level| fields Log_Level] |stats count by Log_Level