Splunk Search

What are my searches finding?

Freddler
Explorer

Hi! This is my first post here so I apologise if the format is flawed. I am a Splunk novice and have been tinkering with different searches to try and achieve my goal. I have two searches which both yield something. The results are similar but not identical and I'd like to understand what the difference is.

I'll begin describing the logfiles:

Among loads of other data, there is information about different systems and different programs being run. It is structured somthing like this:

 

 

Lorem ipsum ..
System: ABC (occurs only once)
Lorem ipsum ..
Starting program: Name (can occur multiple times)
Lorem ipsum ..

 

 

The objective is to find and count all programs that have been run in the same session as a certain system is present. For example, if the system can be either ABC, DEF or GHI, I want all the program names from the logs where the system was GHI but NOT if it was DEF or ABC. There are field extractions for both the system name and the program name.

My searches:

 

 

sourcetype=Log "Starting program:" OR "System:"
| stats values(Program) as Program values(System) as System by source
| where System="GHI"
| mvexpand Program
| stats count by Program
| sort count by Program desc
| head 50
sourcetype=Log "Starting program" OR "System:"
| transaction source startswith="System: GHI" endswith="Starting program:"
| stats values(Program) as Program values(System) as System by source
| mvexpand Program
| stats count by Program
| sort count by Program desc
| head 50

 

 

 As I've stated, these searches seem to yield same-ish results. What is actually happening under the hood here? Also, the number of occurrences of "Starting program" is vastly superior to the "System", so I imagine that the speed could be increased by first finding the logs where the system is "GHI" and then only in those logs perform the program search. Would that be possible?

Thanks in advance!

Labels (2)
1 Solution

ITWhisperer
SplunkTrust
SplunkTrust

You have to remember what the pipe "|" is doing. In the case of

| stats values(Program) as Program values(System) as System by source
| where System="GHI"

stats collects values of Program into a multi-value field also called Program and values of System into a multi-value field called System where each set comes from the same source i.e. your log. At this point, each event has a mv for Program, a mv for System and a source field.

These events are piped into the where statement, which only passes on events with System equal to GHI or more strictly speaking where one of the values in System is GHI. But we know from your data that System only appear once in each source, so we effectively filtering out all the logs which aren't produced by GHI

If you switch these around, the events are filtered by System equal to GHI which means none of the Program starting events get through to the stats, which is why you get no results

When you change the command to "stats count by Program, source", the counts are for unique combinations of Program and source, so if GHI only runs a program once, all the counts will be 1. Otherwise, the count will be the number of times GHI ran that program in that source (log) 

View solution in original post

ITWhisperer
SplunkTrust
SplunkTrust
sourcetype=Log "Starting program:" OR "System:"
| stats values(Program) as Program values(System) as System by source

Will give you a list of all the Programs and all the Systems that ran on each source

SourceSystemProgram
srcAsysAprogA
  progB
  progC
 sysBprogD
  progE
srcBsysAprogA
  progB
  progC

When you filter by system, you will still get all the programs and systems that ran on the source where that system executed. For example, where System=sysB

SourceSystemProgram
srcAsysAprogA
  progB
  progC
 sysBprogD
  progE

From your description, this doesn't sound like what you are after?

Freddler
Explorer

You're right, it doesn't sound like what I'm after. What I want is the count of all programs (progA .. progN) that ran on, for instance, sysA across all sources (srcA .. srcN). In any given source sys can be either A or B (or C or D..)  but several cannot occur. The total count of the red progs in your table is what I'm after.  How would I achieve that?

 

SourceSystemProgram
srcAsysAprogA
  progB
  progC
 sysB cannot occur in same log as sysAprogD
  progE
srcBsysAprogA
  progB
  progC

 

0 Karma

ITWhisperer
SplunkTrust
SplunkTrust

Would this scenario be valid

sourceSystemStarting program
srcAsysAprogA
  progB
  progC
 sysAprogD
  progE
srcBsysAprogA
  progA
  progD
srcCsysBprogA
  progB
  progE

That is, for any given source, only one system runs on it / produces logs?
Or is it more like each source is the log of an execution run on a system, which starts a number of programs?

Can a system start a program more than once in a run, and if so, does this only count as 1 program or many?

Freddler
Explorer

Yes, except there is no need to declare sysA twice in srcA - the first one remains valid throughout the entire log. Other than that it looks accurate. Yes, a system can start the same prog multiple times in a run. I think both cases would be interesting: counting only unique runs or counting every run.

 

sourceSystemStarting program
srcAsysAprogA
  progB
  progC
  progD
  progE
srcBsysAprogA
  progA
  progD
srcCsysBprogA
  progB
  progE


Example results:
Total count:
progA: 3
progB: 1
progC: 1
progD: 2
progE: 1

Unique count:
progA: 2
progB: 1
progC: 1
progD: 2
progE: 1

Thank you for your patience 🙂

ITWhisperer
SplunkTrust
SplunkTrust
sourcetype=Log "Starting program:" OR "System:"
| stats values(Program) as Program values(System) as System by source
| where System="GHI"
| mvexpand Program
| stats count by Program
| sort count by Program desc
| head 50

Doesn't this give Program counts of 1 since values(Program) dedup's

sourcetype=Log "Starting program" OR "System:"
| transaction source startswith="System: GHI" endswith="Starting program:"
| stats values(Program) as Program values(System) as System by source
| mvexpand Program
| stats count by Program
| sort count by Program desc
| head 50

Same thing about values dedup'ing. However, transaction may also be limiting the number of events it processes, so if your logs are over say 5000 lines, the counts may be lower.

Try this

sourcetype=Log "Starting program:" OR "System:"
| stats count values(System) as System by source, Program
| where System="GHI"
| stats sum(count) as total by Program

Freddler
Explorer

Both my searches return not-unreasonable values larger than 1, but the search using transaction does indeed give lower values for some of the programs, so that may be as you describe it.

Your search finds many events but does not yield any stats.

The way I understand my search is this:

 

sourcetype=Log "Starting program:" OR "System:"

 

.. yields two separate sets of events. One set where the events contain "Starting program" and another set where the events contain "System". 

| stats values(Program) as Program values(System) as System by source
| where System="GHI"

.. somehow, the union of the two sets are calculated. If these two lines change place with eachother, no stats are found. If "by source" is removed and nothing else changed, I do find a bunch of irrelevant programs all with the count of 1. My intuition is that "by source" connects the programs to the systems, and thus when we filter the systems using "where", the programs also get filtered along with them. I have no idea if this is actually the case though.

| stats count by Program
| sort count by Program desc

.. the result here looks like:
progA 33
progB 27
progC 21
which is the desired format but I don't know if the numbers are correct. If I change the line to "stats count by Program, source" the results will instead look like this:
progA srcA 1
progA srcB 1
progA srcC 1
progB srcD 1 

0 Karma

ITWhisperer
SplunkTrust
SplunkTrust

You have to remember what the pipe "|" is doing. In the case of

| stats values(Program) as Program values(System) as System by source
| where System="GHI"

stats collects values of Program into a multi-value field also called Program and values of System into a multi-value field called System where each set comes from the same source i.e. your log. At this point, each event has a mv for Program, a mv for System and a source field.

These events are piped into the where statement, which only passes on events with System equal to GHI or more strictly speaking where one of the values in System is GHI. But we know from your data that System only appear once in each source, so we effectively filtering out all the logs which aren't produced by GHI

If you switch these around, the events are filtered by System equal to GHI which means none of the Program starting events get through to the stats, which is why you get no results

When you change the command to "stats count by Program, source", the counts are for unique combinations of Program and source, so if GHI only runs a program once, all the counts will be 1. Otherwise, the count will be the number of times GHI ran that program in that source (log) 

Freddler
Explorer

Aah, that helped alot! Just by removing the "stats count" at the end I can actually see what you are describing. Each resulting stat has a source and the System GHI, and anything from 0 to multiple programs, like this:

srcAGHI 
srcBGHIprogA
srcCGHI

progA
progB
progC

 

That does indeed look like what I am after, assuming 'stats count by program' will return
progA=2
progB=1
progC=1

Does this seem right to you? If so case closed, I guess!

0 Karma

ITWhisperer
SplunkTrust
SplunkTrust

Yes, the stats count by Program will give you the counts you want

Freddler
Explorer

Thank you very much for your help! I feel like my understanding has deepened a bit.

0 Karma
Get Updates on the Splunk Community!

Meet Duke Cyberwalker | A hero’s journey with Splunk

We like to say, the lightsaber is to Luke as Splunk is to Duke. Curious yet? Then read Eric Fusilero’s latest ...

The Future of Splunk Search is Here - See What’s New!

We’re excited to introduce two powerful new search features, now generally available for Splunk Cloud Platform ...

Splunk is Nurturing Tomorrow’s Cybersecurity Leaders Today

Meet Carol Wright. She leads the Splunk Academic Alliance program at Splunk. The Splunk Academic Alliance ...