How do I properly count fields which repeat in each event?
Here is a sample event:
2885136 2010-10-04 03:55:35.997 1 Run SqlCommand [GTRawRhoSelect] of Command Type [StoredProcedure]. Command parameters: @RICPairs : type(SqlString) : [ <RICPairs xmlns=""><RICPair RIC1="ABC" RIC2="DEF" /><RICPair RIC1="ABC" RIC2="GHI" /><RICPair RIC1="GHI" RIC2="JK.LM" /><RICPair RIC1="JK.LM" RIC2="OPQ" /></RICPairs> ]; @ValueTypeParam : type(SqlDouble) : [ Null ];
I want to identify all the distinct RICPair 's in all the logs, so above, this would be:
ABC-DEF ABC-GHI GHI-JK.LM JK.LM-OPQ
and I want to show a count that each distinct pair appears. Right now, if I use the default fields that are created (eg. RIC1), it only shows the FIRST occurence in each event (so here it detects ABC as a RIC1, but not GHI).
How do I get it to count properly?
Splunk can handled multi-value fields, for example
RIC1=[ABC, ABC, GHI, JK.LM] and
RIC2=[DEF, GHI, JK.LM, OPQ]. But the problem here is that you want to handle two sets of multivalue fields as pairs of values, which splunk doesn't really handle out of the box. So we have to look for an option other than standard mulit-value field extraction.
So one really easy and somewhat lazy approach would be to just capture a new field that contains both values, and count against that field.
First, this search should extract a multi-value field called "pair": (You should see multiple entires in the "pair" column)
... | rex "<RICPair\s+(?<pair>.+?)\s*/>" max_match=100 | table _time, pair
Now that we have a list (multi-value field) containing the pair string, it should be possible to get your distinct count:
... | rex "<RICPair\s+(?<pair>.+?)\s*/>" max_match=100 | stats dc by pair
After we have the stats like this, we can break the "pair" field apart into your two separate fields, like this:
... | rex "<RICPair\s+(?<pair>.+?)\s*/>" max_match=100 | stats dc by pair | rex field=pair "RIC1=\"(?<RIC1>.*?)\" RIC2=\"(?<RIC2>.*?)\""
Does that help at all?
Nope, I have actually tried something similar before already, but the problem with this method is it only returns the first pair (so in the above example it would only detect the ABC-DEF pair.
I have in another instance been able to detect 2 pairs in a single event, but that event had something like 30 pairs. In most other instances I am only able to detect the first pair.
I think the problem is that there could be infinite numbers of RICPairs in each event - is there another way to count it?
That shouldn't be a problem in the example given. The
max_match=100 should allow the regex to match multiple times and return a multi-value field called "pair" will all of the matching fields. I suppose you could try adding a
mvexpand before the
stats, that shouldn't be necessary, but you could try it.
Ok I found out more....I think I might have a memory or other environment issue.
When I retried this morning, I was actually able to see a table of multiple pair entries for each event (using the | table _time, pair command). However, on a second try, I could no longer see any pairs. And when I did a count by pair, it was only detecting the first pair again.
I restarted splunk and cleaned the index, then tried again, and this time splunkd crashed while running the first command above - but I can clearly see that multiple pairs came up per event in the results.
Before I restarted I also sa
I've been trying again, this time running fewer programs on my CPU, and it runs for a long time, with some results, but it does not finish.
I get the error: The running job "1286887054.102" was canceled remotely or expired.
I have set max_match=20000, because that is what my data logs are like: there may be up to 20000 pairs in one event.