Splunk Search

How can limit the number of events shown on a table after using stats list

0p3r4t0r8089
Explorer

I have a report that lists malware received by email that is part of a dashboard. Some months the list for each person can have dozens of events listed. Management would like to only show the latest 5 events for each person. I'm having difficulty finding a good way to accomplish this.

Search:

index="my_index"
[| inputlookup InfoSec-avLookup.csv
| rename emailaddress AS msg.parsedAddresses.to{}] final_module="av" final_action="discard"
| rename msg.parsedAddresses.to{} AS To, envelope.from AS From, msg.header.subject AS Subject, filter.modules.av.virusNames{} AS Virus_Type
| eval Time=strftime(_time,"%H:%M:%S %m/%d/%y")
| stats count, list(From) as From, list(Subject) as Subject, list(Time) as Time, list(Virus_Type) as Virus_Type by To
| search
[| inputlookup InfoSec-avLookup.csv
| rename emailaddress AS To]
| sort -count
| table Time,To,From,Subject,Virus_Type
| head 5

Current Output:

time - user1 - sender1@xyz.com - Subject1 - Virus_A
time -              - sender2@xyz.com - Subject1 - Virus_B
time -              - sender2@xyz.com - Subject1 - Virus_C
time -              - sender2@xyz.com - Subject1 - Virus_B
time -              - sender2@xyz.com - Subject1 - Virus_B
time -              - sender2@xyz.com - Subject1 - Virus_B
time -              - sender2@xyz.com - Subject1 - Virus_B

time - user2 - sender1@xyz.com - Subject1 - Virus_A
time -              - sender2@xyz.com - Subject1 - Virus_B
time -              - sender2@xyz.com - Subject1 - Virus_C
time -              - sender2@xyz.com - Subject1 - Virus_B
time -              - sender2@xyz.com - Subject1 - Virus_B
time -              - sender2@xyz.com - Subject1 - Virus_B
time -              - sender2@xyz.com - Subject1 - Virus_B

time - user3 - sender1@xyz.com - Subject1 - Virus_A
time -              - sender2@xyz.com - Subject1 - Virus_B
time -              - sender2@xyz.com - Subject1 - Virus_C
time -              - sender2@xyz.com - Subject1 - Virus_B
time -              - sender2@xyz.com - Subject1 - Virus_B
time -              - sender2@xyz.com - Subject1 - Virus_B
time -              - sender2@xyz.com - Subject1 - Virus_B

I'd like to limit it to the latest 5 events by user
time - user1 - sender1@xyz.com - Subject1 - Virus_A
time -              - sender2@xyz.com - Subject1 - Virus_B
time -              - sender2@xyz.com - Subject1 - Virus_C
time -              - sender2@xyz.com - Subject1 - Virus_B
time -              - sender2@xyz.com - Subject1 - Virus_B

time - user2 - sender1@xyz.com - Subject1 - Virus_A
time -              - sender2@xyz.com - Subject1 - Virus_B
time -              - sender2@xyz.com - Subject1 - Virus_C
time -              - sender2@xyz.com - Subject1 - Virus_B
time -              - sender2@xyz.com - Subject1 - Virus_B

time - user3 - sender1@xyz.com - Subject1 - Virus_A
time -              - sender2@xyz.com - Subject1 - Virus_B
time -              - sender2@xyz.com - Subject1 - Virus_C
time -              - sender2@xyz.com - Subject1 - Virus_B
time -              - sender2@xyz.com - Subject1 - Virus_B

Any help greatly appreciated! Thank you!

 

Labels (2)
0 Karma
1 Solution

0p3r4t0r8089
Explorer

Thanks @yuanliu ,

Your search produced the output in the format they are looking for perfectly. As such, I will credit you with the correct answer (for my use). However, when using this same search in a dashboard, it only produces the latest event in the table output. The same query from the search app works just fine, I have no idea why... Any thoughts?

View solution in original post

0 Karma

PickleRick
SplunkTrust
SplunkTrust

There are probably several different possible approaches.

index="my_index"
[| inputlookup InfoSec-avLookup.csv
| rename emailaddress AS msg.parsedAddresses.to{}] final_module="av" final_action="discard"
| rename msg.parsedAddresses.to{} AS To, envelope.from AS From, msg.header.subject AS Subject, filter.modules.av.virusNames{} AS Virus_Type

This part is OK (unless you have too many results from the subsearch; you are aware of the subsearch limitations?) - it will give you a list of matching events.

Now you're doing

| eval Time=strftime(_time,"%H:%M:%S %m/%d/%y")

While in your particular case it might not be that bad, I always advise to (unless you have a very specific use case like filtering by month so you render your timestamp to just month to have something to filter by) leave the _time as it is since it's easier manipulated this way. Just use eval (or even better - fieldformat) at the end of your pipeline for presentations.

| stats count, list(From) as From, list(Subject) as Subject, list(Time) as Time, list(Virus_Type) as Virus_Type by To

Now that's a tricky part - you're doing stats list() over several separate fields. Are you aware that you are creating completely disconnected multivalued fields? If - for any reason - you had an empty Subject in one of your emails, you wouldn't know which email it was from because the values in the multivalued field are "squished" together. I know it's tempting to use multivalued fields to simulate "cell merging" functionality you know from spreadsheets but it's good to know that mechanism has its limitations.

| search
[| inputlookup InfoSec-avLookup.csv
| rename emailaddress AS To]

This part is pointless. You already searched for those addresses (and you're creating a subsearch again).

I'd do it differently. After your initial search I'd do

| eventstats count by To
| sort - count + To - _time
| streamstats count as eventorder by To
| where eventorder<=5
| table _time To From Subject Virus_Type

The eventstats part is needed only if you want to have the users with most matches first. Otherwise just drop the eventstats and remove the first field from the sort command - you'll just have your results sorted alphabetically then.

Now if you want to have your time field called Time, not _time, add

| rename _time as Time
| fieldformat Time=strftime(Time,"%H:%M:%S %m/%d/%y")

And if you don't want to repeat the To values (which I don't recommend because this breaks the logical structure of your data), you can use autoregress or streamstats to copy over the To value from the previous event and in case it's the same as the current one, just blank the existing field. But again - I don't recommend it - it does make the output look "prettier" but it makes it "logically incomplete".

0 Karma

0p3r4t0r8089
Explorer

Thanks, PickleRick. Your advice is accurate but The output is sorting alphabetically and still includes more than 5 events for whatever reason. I've removed the second subsearch as mentioned, I should have caught that (I inherited this report). I'll keep testing your suggestions and see if I can make it do what they want.

0 Karma

0p3r4t0r8089
Explorer

Thanks @yuanliu ,

Your search produced the output in the format they are looking for perfectly. As such, I will credit you with the correct answer (for my use). However, when using this same search in a dashboard, it only produces the latest event in the table output. The same query from the search app works just fine, I have no idea why... Any thoughts?

0 Karma

yuanliu
SplunkTrust
SplunkTrust

@0p3r4t0r8089 As a refresher, if you want to give credit as the correct answer, select the most helpful comment as solution, not the comment expressing credit:-)  Karma is also appreciated.

0p3r4t0r8089
Explorer

Yep, sorry about that. It wouldn't let me leave karma on the post I marked correct. Karma given for this one.

0 Karma

yuanliu
SplunkTrust
SplunkTrust

when using this same search in a dashboard, it only produces the latest event in the table output. The same query from the search app works just fine, I have no idea why... Any thoughts?

Are you using Dashboard Studio's chained search? (Or even manually crafting a chained search in Simple XML?)  I recently helped uncover a bug of sort where a subtle, weakly documented optimization feature could make search app search and dashboard panel differ. (Do you lose any information between Chain Searches in Dashboards?)  If this is the case, you will need to make sure any variable used in chained search is preserved before main search ends. (Post a new question if you need help on that so future users can easily find what they need.)

If you are not using chained search, click the magnifying glass ("Open in Search") under the dashboard panel to compare your original search and the one directly from the panel.  There has to be some subtle difference.

0 Karma

yuanliu
SplunkTrust
SplunkTrust

Something like

index="my_index"
[| inputlookup InfoSec-avLookup.csv
| rename emailaddress AS msg.parsedAddresses.to{}] final_module="av" final_action="discard"
| rename msg.parsedAddresses.to{} AS To, envelope.from AS From, msg.header.subject AS Subject, filter.modules.av.virusNames{} AS Virus_Type
| eval Time=strftime(_time,"%H:%M:%S %m/%d/%y")
| stats count, list(From) as From, list(Subject) as Subject, list(Time) as Time, list(Virus_Type) as Virus_Type by To
| search
[| inputlookup InfoSec-avLookup.csv
| rename emailaddress AS To]
| sort -count
| table Time,To,From,Subject,Virus_Type
| foreach Time From Subject Virus_Type
    [eval <<FIELD>> = mvindex(<<FIELD>>, 0, 4)]
Tags (1)
Get Updates on the Splunk Community!

Introducing the 2024 SplunkTrust!

Hello, Splunk Community! We are beyond thrilled to announce our newest group of SplunkTrust members!  The ...

Introducing the 2024 Splunk MVPs!

We are excited to announce the 2024 cohort of the Splunk MVP program. Splunk MVPs are passionate members of ...

Splunk Custom Visualizations App End of Life

The Splunk Custom Visualizations apps End of Life for SimpleXML will reach end of support on Dec 21, 2024, ...