Hi! This is my first post here so I apologise if the format is flawed. I am a Splunk novice and have been tinkering with different searches to try and achieve my goal. I have two searches which both yield something. The results are similar but not identical and I'd like to understand what the difference is.
I'll begin describing the logfiles:
Among loads of other data, there is information about different systems and different programs being run. It is structured somthing like this:
Lorem ipsum ..
System: ABC (occurs only once)
Lorem ipsum ..
Starting program: Name (can occur multiple times)
Lorem ipsum ..
The objective is to find and count all programs that have been run in the same session as a certain system is present. For example, if the system can be either ABC, DEF or GHI, I want all the program names from the logs where the system was GHI but NOT if it was DEF or ABC. There are field extractions for both the system name and the program name.
My searches:
sourcetype=Log "Starting program:" OR "System:"
| stats values(Program) as Program values(System) as System by source
| where System="GHI"
| mvexpand Program
| stats count by Program
| sort count by Program desc
| head 50
sourcetype=Log "Starting program" OR "System:"
| transaction source startswith="System: GHI" endswith="Starting program:"
| stats values(Program) as Program values(System) as System by source
| mvexpand Program
| stats count by Program
| sort count by Program desc
| head 50
As I've stated, these searches seem to yield same-ish results. What is actually happening under the hood here? Also, the number of occurrences of "Starting program" is vastly superior to the "System", so I imagine that the speed could be increased by first finding the logs where the system is "GHI" and then only in those logs perform the program search. Would that be possible?
Thanks in advance!
You have to remember what the pipe "|" is doing. In the case of
| stats values(Program) as Program values(System) as System by source
| where System="GHI"
stats collects values of Program into a multi-value field also called Program and values of System into a multi-value field called System where each set comes from the same source i.e. your log. At this point, each event has a mv for Program, a mv for System and a source field.
These events are piped into the where statement, which only passes on events with System equal to GHI or more strictly speaking where one of the values in System is GHI. But we know from your data that System only appear once in each source, so we effectively filtering out all the logs which aren't produced by GHI
If you switch these around, the events are filtered by System equal to GHI which means none of the Program starting events get through to the stats, which is why you get no results
When you change the command to "stats count by Program, source", the counts are for unique combinations of Program and source, so if GHI only runs a program once, all the counts will be 1. Otherwise, the count will be the number of times GHI ran that program in that source (log)
sourcetype=Log "Starting program:" OR "System:"
| stats values(Program) as Program values(System) as System by source
Will give you a list of all the Programs and all the Systems that ran on each source
Source | System | Program |
srcA | sysA | progA |
progB | ||
progC | ||
sysB | progD | |
progE | ||
srcB | sysA | progA |
progB | ||
progC |
When you filter by system, you will still get all the programs and systems that ran on the source where that system executed. For example, where System=sysB
Source | System | Program |
srcA | sysA | progA |
progB | ||
progC | ||
sysB | progD | |
progE |
From your description, this doesn't sound like what you are after?
You're right, it doesn't sound like what I'm after. What I want is the count of all programs (progA .. progN) that ran on, for instance, sysA across all sources (srcA .. srcN). In any given source sys can be either A or B (or C or D..) but several cannot occur. The total count of the red progs in your table is what I'm after. How would I achieve that?
Source | System | Program |
srcA | sysA | progA |
progB | ||
progC | ||
progD | ||
progE | ||
srcB | sysA | progA |
progB | ||
progC |
Would this scenario be valid
source | System | Starting program |
srcA | sysA | progA |
progB | ||
progC | ||
sysA | progD | |
progE | ||
srcB | sysA | progA |
progA | ||
progD | ||
srcC | sysB | progA |
progB | ||
progE |
That is, for any given source, only one system runs on it / produces logs?
Or is it more like each source is the log of an execution run on a system, which starts a number of programs?
Can a system start a program more than once in a run, and if so, does this only count as 1 program or many?
Yes, except there is no need to declare sysA twice in srcA - the first one remains valid throughout the entire log. Other than that it looks accurate. Yes, a system can start the same prog multiple times in a run. I think both cases would be interesting: counting only unique runs or counting every run.
source | System | Starting program |
srcA | sysA | progA |
progB | ||
progC | ||
progD | ||
progE | ||
srcB | sysA | progA |
progA | ||
progD | ||
srcC | sysB | progA |
progB | ||
progE |
Example results:
Total count:
progA: 3
progB: 1
progC: 1
progD: 2
progE: 1
Unique count:
progA: 2
progB: 1
progC: 1
progD: 2
progE: 1
Thank you for your patience 🙂
sourcetype=Log "Starting program:" OR "System:"
| stats values(Program) as Program values(System) as System by source
| where System="GHI"
| mvexpand Program
| stats count by Program
| sort count by Program desc
| head 50
Doesn't this give Program counts of 1 since values(Program) dedup's
sourcetype=Log "Starting program" OR "System:"
| transaction source startswith="System: GHI" endswith="Starting program:"
| stats values(Program) as Program values(System) as System by source
| mvexpand Program
| stats count by Program
| sort count by Program desc
| head 50
Same thing about values dedup'ing. However, transaction may also be limiting the number of events it processes, so if your logs are over say 5000 lines, the counts may be lower.
Try this
sourcetype=Log "Starting program:" OR "System:"
| stats count values(System) as System by source, Program
| where System="GHI"
| stats sum(count) as total by Program
Both my searches return not-unreasonable values larger than 1, but the search using transaction does indeed give lower values for some of the programs, so that may be as you describe it.
Your search finds many events but does not yield any stats.
The way I understand my search is this:
sourcetype=Log "Starting program:" OR "System:"
.. yields two separate sets of events. One set where the events contain "Starting program" and another set where the events contain "System".
| stats values(Program) as Program values(System) as System by source | where System="GHI"
.. somehow, the union of the two sets are calculated. If these two lines change place with eachother, no stats are found. If "by source" is removed and nothing else changed, I do find a bunch of irrelevant programs all with the count of 1. My intuition is that "by source" connects the programs to the systems, and thus when we filter the systems using "where", the programs also get filtered along with them. I have no idea if this is actually the case though.
| stats count by Program | sort count by Program desc
.. the result here looks like:
progA 33
progB 27
progC 21
which is the desired format but I don't know if the numbers are correct. If I change the line to "stats count by Program, source" the results will instead look like this:
progA srcA 1
progA srcB 1
progA srcC 1
progB srcD 1
You have to remember what the pipe "|" is doing. In the case of
| stats values(Program) as Program values(System) as System by source
| where System="GHI"
stats collects values of Program into a multi-value field also called Program and values of System into a multi-value field called System where each set comes from the same source i.e. your log. At this point, each event has a mv for Program, a mv for System and a source field.
These events are piped into the where statement, which only passes on events with System equal to GHI or more strictly speaking where one of the values in System is GHI. But we know from your data that System only appear once in each source, so we effectively filtering out all the logs which aren't produced by GHI
If you switch these around, the events are filtered by System equal to GHI which means none of the Program starting events get through to the stats, which is why you get no results
When you change the command to "stats count by Program, source", the counts are for unique combinations of Program and source, so if GHI only runs a program once, all the counts will be 1. Otherwise, the count will be the number of times GHI ran that program in that source (log)
Aah, that helped alot! Just by removing the "stats count" at the end I can actually see what you are describing. Each resulting stat has a source and the System GHI, and anything from 0 to multiple programs, like this:
srcA | GHI | |
srcB | GHI | progA |
srcC | GHI | progA |
That does indeed look like what I am after, assuming 'stats count by program' will return
progA=2
progB=1
progC=1
Does this seem right to you? If so case closed, I guess!
Yes, the stats count by Program will give you the counts you want
Thank you very much for your help! I feel like my understanding has deepened a bit.