Reporting

Pivot with top distinct values? Maybe a nested pivot?

Path Finder

I have a pivot set up as follows:

  • Data source myfirewall

  • Rows split on source_ip

  • Columns include bytes_downloaded and outside_hosts.

I'd like to only return the top say 5 or 10 outside_hosts along with their associated bytes_downloaded. How can I do so? I'm guessing it is somewhere in the manual but I'm getting lost in the terminology.

EDIT to answer @mattness:

Yes, I'm hoping to do this through the GUI. I've written nested searches and the like before and I can go back to doing so here if needed. Hoping not though. I have dozens of varying searches similar to this and it would be extremely useful to allow some of my other coworkers to make their own.

  • outside_hosts is a string field containing a domain name or an IP address.
  • source_ip is a string field containing a source IP address (inside the firewall)
  • bytes_downloaded is a numeric field representing the number of bytes downloaded in a given session

Each source_ip can map to one or more outside_hosts.

What I need to do is make a chart of the top 10 source_ip entries based on their sum(bytes_downloaded). Once that is completed to make the results readable I need a way to only show the top 20 outside_hosts associated with that given source_ip.

Using pseudocode for the last bit of logic:

top(sum(`bytes_downloaded` split by `outside_hosts`) ) limit 20. 
Tags (2)
0 Karma
1 Solution

Splunk Employee
Splunk Employee

Ok, I'm a bit unclear about a few things:

  • How have gotten your column fields into the pivot table--are they Split Row elements or Column Value elements?
  • What is the nature of the column_hosts field? Is it a string type field? Or is it a numeric field--a count of hosts, perhaps?
  • Does each source_ip value correspond to a distinct/different outside_hosts value? If the answer to that last question is 'no' you may want to consider making outside_hosts the first Split Row element, and move source_ip to second place.

In any case, here's a guess at a solution. Note that this assumes you're using the Pivot Builder interface, not the pivot command.

  1. Add outside_hosts as a second Split Row element.
  2. Set up bytes_downloaded as a Column Value element. When you do this, make sure Value is set to Sum. The resulting column will be labeled Sum of Bytes_Downloaded.
  3. Now create a second Filter element. Select outside_hosts as the attribute. Filter Type: Limit. Limit By: bytes_downloaded. Limit: Highest 10 sums.

This should give you a table that shows the highest 10 outside hosts by sum of bytes downloaded. You can click the top of the Sum of bytes_downloaded column to sort it from highest to lowest.

If this doesn't work for you, I probably need answers to the questions I asked at the top of my reply.

View solution in original post

Splunk Employee
Splunk Employee

Ok, I'm a bit unclear about a few things:

  • How have gotten your column fields into the pivot table--are they Split Row elements or Column Value elements?
  • What is the nature of the column_hosts field? Is it a string type field? Or is it a numeric field--a count of hosts, perhaps?
  • Does each source_ip value correspond to a distinct/different outside_hosts value? If the answer to that last question is 'no' you may want to consider making outside_hosts the first Split Row element, and move source_ip to second place.

In any case, here's a guess at a solution. Note that this assumes you're using the Pivot Builder interface, not the pivot command.

  1. Add outside_hosts as a second Split Row element.
  2. Set up bytes_downloaded as a Column Value element. When you do this, make sure Value is set to Sum. The resulting column will be labeled Sum of Bytes_Downloaded.
  3. Now create a second Filter element. Select outside_hosts as the attribute. Filter Type: Limit. Limit By: bytes_downloaded. Limit: Highest 10 sums.

This should give you a table that shows the highest 10 outside hosts by sum of bytes downloaded. You can click the top of the Sum of bytes_downloaded column to sort it from highest to lowest.

If this doesn't work for you, I probably need answers to the questions I asked at the top of my reply.

View solution in original post

Path Finder

Thanks @mattness. I figured that the nested searching might be beyond the pivot GUI at current. If that isn't already on the Splunk roadmap and hopefully close it really aught to be. Half of my searches are set up this way and a way to allow nesting from the pivot interface (as well as cross reference data sources easily) would both be absolutely killer features.

0 Karma

Splunk Employee
Splunk Employee

You could set up a third filter (after the time range, and the "Highest 10 Sums" filter) that matches a selected source_ip value. But it will only return values that are already in the first top 10 list--you won't get 20 values, in other words.

0 Karma

Splunk Employee
Splunk Employee

Ok, the second search wasn't part of the original scenario, so that complicates things, as does the fact that each source_ip can have multiple outside_hosts. The procedure I outlined should work for the first search, but I don't know at the moment if there's an easy way to fulfill the second "to make results readable" search. You say that you need a way to "only show the top 20 outside_hosts associated with that given source_ip." Did you mean to say "a given source_ip? " If not, how do you arrive at "that" source_ip value?

0 Karma

Path Finder

Thanks. I've updated my question with (I think) all the relevant answers to your questions. I'm thinking that using a second filter here would apply at the top level, not nested within a first set of criteria.

0 Karma