My problem is: I have an id that should be logged from two systems if integration between these two have worked as supposed to. And this id will be logged with different names from these two systems, but should be the same(and unique of course) across these systems.
event1: ....systemA_dokumentID=22
event2: ....systemA_dokumentID=23
event3: ....systemB_dokumentID=22
I would then like to do a search that returns a list of ids from systemA that are not found in systemB. The example above should then return 23 in its list.
There is a large amount of data involved here - up to several millions.
My 2 questions are then:
Is this possible?
If it is possible, is it a good idea to do this in splunk (it's not critical for the search to be very fast)?
If you know beforehand what the fieldnames are, you could just rename them to use some common field name and then run transaction
on it, and check which transactions have less than 2 events in them in order to find failures.
... | rename systemA_dokumentID as dokumentID | rename systemB_dokumentID as dokumentID | transaction dokumentID | search eventcount<2
if you have a generic field that matches systemA_dokumentID or systemB_dokumentID as "dokumentID", you could also use a subsearch. You can then use an event type to label
(eventtype="systemA" NOT [ search eventtype="systemB" | dedup dokumentID | fields + dokumentID ]) OR (eventtype="systemB" NOT [ search eventtype="systemA" | dedup dokumentID | fields + dokumentID ]) | eval category=if(eventtype="systemA","In A, not in B", "In B, not in A") | table category dokumentID
though i think splunk will have problems if the subsearch returns a result that is too large, and splunk's performance is not great with sub searches.
expanding on Ayn's response though (which is a better approach than sub search IMO), you could use mv* functions in eval to determine which system it is in (i am using 2 evals to keep individual system a / b dokumentID's while copying the value to a generic dokumentID, instead of renaming them. this provides a common field and a differentiating field)
... | eval dokumentID=systemA_dokumentID | eval dokumentID=systemB_dokumentID | transaction dokumentID | makemv dokumentID | eval docNotInA=if(isnull(mvcount(systemA_dokumentID)),dokumentID,NULL) | eval docNotInB=if(isnull(mvcount(systemB_dokumentID)),dokumentID,NULL) | search docNotInA=* OR docNotInB=*
you may have to use mvindex(dokumentID,0)
instead of dokumentID
in the 3rd and 4th eval statements to get a single value list of ID's. this method will also allow multiple events in systemA to be correctly categorized. you can also remove the need for the first 2 evals by defining a field extractor for systemXdokumentID (that matches any system) and 2 field extractors for systemAdokumentID and systemBdokumentID
If you know beforehand what the fieldnames are, you could just rename them to use some common field name and then run transaction
on it, and check which transactions have less than 2 events in them in order to find failures.
... | rename systemA_dokumentID as dokumentID | rename systemB_dokumentID as dokumentID | transaction dokumentID | search eventcount<2
Well you might have some other way of identifying which system an event came from perhaps - like the source field or something else. You could just check which values you got for source in the transaction and deduce from that which system you're missing things from.
Thank you for your answer. One thing remains though, I would like to know, by the result, which id is missing in what system - not only that it's missing in one of them.