Splunk Search

Group and count totals on 3 or more levels

dustinhartje
Explorer

Hello fellow Splunkers!

I'm trying to recreate an existing report for my firewall guy within Splunk with hopes of replacing the system he's currently using. I'm a bit stuck though as he wants to break down traffic and get counts on 3 levels based on top talkers so we have a total per SourceIP, a total per DestinationIP hitting each SourceIP, and finally a total per Status hitting each SourceIP/DestinationIP combo. Something like so:

SourceIPDestinationIPHTTPStatusNumRequests
10.11.12.136000
 
2.3.4.54000
 200800
 404200
 
3.4.5.62000
 2001500
 404500
11.12.13.145000
 
4.5.6.7300
 200200
 40430
 40320
and so on...



The log data is pretty straightforward. Each event has data in the 3 fields SourceIP, DestinationIP, and HTTPStatus, and NumRequests is just counting them. Something like so:

[2014-08-08 04:17:22.000 -0700] Type="WebRequest" SourceIP="1.2.3.4" DestinationIP="2.3.4.5" HTTPStatus="200" Bytes="12341" SourcePort="65432" DestinationPort="80"

I can get the raw data so each SourceIP/DestinationIP/HTTPStatus is totalled up fairly easily with

| stats count by SourceIP, DestinationIP, HTTPStatus

but getting beyond that to human readable with totals at each level is eluding me.

I have been able to group on two levels (one parent, and then count by each child "category") using stats and appendpipe but can't seem to find or figure out a method for doing this on multiple levels. I also tried to get there in Pivot with no further luck, but I have almost no experience with the Pivot interface so I may just be missing something. The data doesn't have to be laid out exactly like above, just so long as it totals up each level and provides a reasonably human readable format.

Anyone out there have any idea how this might be accomplished?

Thanks!

Tags (1)
1 Solution

gkanapathy
Splunk Employee
Splunk Employee

Ah, this is the perfect use case for this (still) undocumented command, multireport:

sourcetype=myhttpdata
| multireport
  [ stats count by SourceIP ]
  [ stats count by SourceIP DestinationIP ]
  [ stats count by SourceIP DestinationIP HTTPStatus ]

This gives you the data you need.

Now, you need to sort the rows into the order you'd like better. This kind of works:

sourcetype=myhttpdata
| multireport
  [ stats count by SourceIP ]
  [ stats count by SourceIP DestinationIP ]
  [ stats count by SourceIP DestinationIP HTTPStatus ]
| sort SourceIP DestinationIP HTTPStatus

But that puts the summary rows for SourceIP and Destination IP at the end of their categories. To move them to the beginning, you need to do something hacky:

sourcetype=myhttpdata
| multireport
  [ stats count by SourceIP ]
  [ stats count by SourceIP DestinationIP ]
  [ stats count by SourceIP DestinationIP HTTPStatus ]
| fillnull value="_ANY_" SourceIP DestinationIP HTTPStatus
| sort SourceIP DestinationIP HTTPStatus

And then if you want you can do an "eval" to convert the "_ANY" back to an empty string or null. But that gives you things in lexicographic order. What if you want them in order of most common SourceIP, then most common DestinationIP? Then you'll have to do:

sourcetype=myhttpdata
| multireport
  [ stats count as by SourceIP | eval count_sip=count ]
  [ stats count as by SourceIP DestinationIP | eval count_sip_dip=count ]
  [ stats count as by SourceIP DestinationIP HTTPStatus ]
| fillnull value="_ANY_" SourceIP DestinationIP HTTPStatus
| eventstats first(eval(coalesce(count_sip,null()))) as count_sip by SourceIP
| eventstats first(eval(coalesce(count_sip_dip,null()))) as count_sip_dip by SourceIP DestinationIP
| sort -count_sip +SourceIP -count_sip_dip +DestinationIP -count
| fields - count_sip count_sip_dip

Now, this gives similar results (but not the same, it lacks summary rows and instead uses different columns for the summary values):

sourcetype=myhttpdata
| eventstats count as count_sip by SourceIP
| eventstats count as count_sip_dip by SourceIP DestinationIP
| stats count first(count_sip) as count_sip
              first(count_sip_dip) as count_sip_dip 
  by SourceIP DestinationIP HTTPStatus
| sort -count_sip +SourceIP -count_sip_dip +DestinationIP -count

but likely runs a lot slower. You could also use:

sourcetype=myhttpdata
| stats count by SourceIP DestinationIP HTTPStatus
| eventstats sum(count) as count_sip_dip by SourceIP DestinationIP
| eventstats sum(count_sip_dip) as count_sip by SourceIP
| sort -count_sip +SourceIP -count_sip_dip +DestinationIP -count

which will be a lot better, but for large data sets will still be slower than the version using multireport.

View solution in original post

gkanapathy
Splunk Employee
Splunk Employee

Ah, this is the perfect use case for this (still) undocumented command, multireport:

sourcetype=myhttpdata
| multireport
  [ stats count by SourceIP ]
  [ stats count by SourceIP DestinationIP ]
  [ stats count by SourceIP DestinationIP HTTPStatus ]

This gives you the data you need.

Now, you need to sort the rows into the order you'd like better. This kind of works:

sourcetype=myhttpdata
| multireport
  [ stats count by SourceIP ]
  [ stats count by SourceIP DestinationIP ]
  [ stats count by SourceIP DestinationIP HTTPStatus ]
| sort SourceIP DestinationIP HTTPStatus

But that puts the summary rows for SourceIP and Destination IP at the end of their categories. To move them to the beginning, you need to do something hacky:

sourcetype=myhttpdata
| multireport
  [ stats count by SourceIP ]
  [ stats count by SourceIP DestinationIP ]
  [ stats count by SourceIP DestinationIP HTTPStatus ]
| fillnull value="_ANY_" SourceIP DestinationIP HTTPStatus
| sort SourceIP DestinationIP HTTPStatus

And then if you want you can do an "eval" to convert the "_ANY" back to an empty string or null. But that gives you things in lexicographic order. What if you want them in order of most common SourceIP, then most common DestinationIP? Then you'll have to do:

sourcetype=myhttpdata
| multireport
  [ stats count as by SourceIP | eval count_sip=count ]
  [ stats count as by SourceIP DestinationIP | eval count_sip_dip=count ]
  [ stats count as by SourceIP DestinationIP HTTPStatus ]
| fillnull value="_ANY_" SourceIP DestinationIP HTTPStatus
| eventstats first(eval(coalesce(count_sip,null()))) as count_sip by SourceIP
| eventstats first(eval(coalesce(count_sip_dip,null()))) as count_sip_dip by SourceIP DestinationIP
| sort -count_sip +SourceIP -count_sip_dip +DestinationIP -count
| fields - count_sip count_sip_dip

Now, this gives similar results (but not the same, it lacks summary rows and instead uses different columns for the summary values):

sourcetype=myhttpdata
| eventstats count as count_sip by SourceIP
| eventstats count as count_sip_dip by SourceIP DestinationIP
| stats count first(count_sip) as count_sip
              first(count_sip_dip) as count_sip_dip 
  by SourceIP DestinationIP HTTPStatus
| sort -count_sip +SourceIP -count_sip_dip +DestinationIP -count

but likely runs a lot slower. You could also use:

sourcetype=myhttpdata
| stats count by SourceIP DestinationIP HTTPStatus
| eventstats sum(count) as count_sip_dip by SourceIP DestinationIP
| eventstats sum(count_sip_dip) as count_sip by SourceIP
| sort -count_sip +SourceIP -count_sip_dip +DestinationIP -count

which will be a lot better, but for large data sets will still be slower than the version using multireport.

dustinhartje
Explorer

several usable methods there, thanks gkanapathy! The last one seems to come out the cleanest, I think we'll stick with that. Any long searches can be scheduled overnight so not a big deal on the speed front.

steveyz
Splunk Employee
Splunk Employee

actually the last method is likely to be faster than using multireport. Multireport doesn't attempt to ask the indexers to pre-compute intermediate aggregates for the reports, which will make a big difference for data that is distributed across many indexers.

another method for doing this is by appending an "ALL" value to each DestinationIP and HTTPStatus. E.g.

sourcetype=myhttpdata | eval DestinationIP = mvappend(DestinationIP,"ALL") | eval HTTPStatus=mvappend(HTTPStatus,"ALL") | stats count by SourceIP DestinationIP HTTPStatus

This has the additional benefit that if for some reason there are events that have a SourceIP but no DestinationIP or HTTPStatus, it will still get counted in the total for that SourceIP

0 Karma

rsennett_splunk
Splunk Employee
Splunk Employee

Not the most elegant. But it does the trick.
I know you'll be able to tweak it to bend it to your will.

Or perhaps this will now inspire one of the smarter Splunkers to show me up! 😉


|stats count by SourceIP DestinationIP HTTPStatus
|stats sum(count) as DestCount list(HTTPStatus) as HTTPStatus list(count) as StatusCount by SourceIP DestinationIP|sort -SourceIP
|appendpipe [stats sum(DestCount) as Total_\For_\SourceIP by SourceIP]
|addtotals row=t col=t labelfield=" " fieldname=DestIPCount DestCount StatusCount label=NumRequests
|fields - DestCount

sort subtotal grandtotal

With Splunk... the answer is always "YES!". It just might require more regex than you're prepared for!
0 Karma

strive
Influencer

I tried for sometime. I could get the output in the format you need but only for a single SourceIP. The moment i included one more SourceIP to the dataset it failed.

By some other approach, the closest that i could get was this
Some Search terms ...| stats count by SourceIP DestinationIP HTTPStatus | stats list(HTTPStatus) as HTTPStatus list(count) as count by SourceIP DestinationIP | sort SourceIP

0 Karma
Get Updates on the Splunk Community!

What’s new on Splunk Lantern in August

This month’s Splunk Lantern update gives you the low-down on all of the articles we’ve published over the past ...

Welcome to the Future of Data Search & Exploration

You have more data coming at you than ever before. Over the next five years, the total amount of digital data ...

This Week's Community Digest - Splunk Community Happenings [8.3.22]

Get the latest news and updates from the Splunk Community here! News From Splunk Answers ✍️ Splunk Answers is ...