Hi All,
I am trying to build a use case with the below scenarios:
1) Person A can do tasks X and Y but not task Z or,
2) Person A can do tasks Y and Z but not task X or,
3) Person A can either do task X or task Y or task Z
At no given point is Person A allowed to conduct all three tasks and at no given point should the IP addresses of Person A and Person B be the same.
Information is being picked from two separate source types from the same index.
The challenge is picking the IP address when Person A does tasks X and Y and Person B does task Z, how can I get the IP addresses of both Person A and Person B, so as to compare and make sure that they are two different IP addresses, keeping in mind that the IP address is only available in one source type and not the other?
Any assistance on this would be appreciated ...
Again, you mentioned "at the same time" without defining a method to determine this "time" in which to perform calculations. This determination is key to make the problem well defined.
Can you post output of at least two transactions from the following?
index=myindex (sourcetype="mysourcetypeA" OR sourcetype=mysouretypeB)
| table sourcetype _time transactionID USERID_X USERID_Y USERID_Z
You can anonymize as much as needed, but time sequence must be as realistically as is allowed. In particular, I want to know whether transactions overlap in time.
I have run what you had requested for and the below are the sample results:
SOURCETYPE | _TIME | TRANSACTION_ID | USERID_X | USERID_Y | USERID_Z |
sourcetypeB | 2022-08-20 12:08:12 | 012434 | PAMELA | AMY | AMY |
sourcetypeB | 2022-08-20 12:08:12 | 012434 | PAMELA | AMY | AMY |
sourcetypeB | 2022-08-20 13:16:10 | 013435 | DIANA | ARTHUR | ARTHUR |
sourcetypeB | 2022-08-20 13:16:10 | 013435 | DIANA | ARTHUR | ARTHUR |
In this set of output, there is no sourcetypeA. Because you indicated that TRANSACTION_ID was only in one sourcetype and is not in the same sourcetype that contained IP, this means that your search returned no IP address. I think it is safe to say that there is no way to display IP in any format. Is this correct?
You need to extend your search to include at least two complete transactions including sourcetypeA that you believe belongs to these two transactions. (That will give clues to what criteria are potentially used to determine which data belongs to which transaction. It is not a perfect way to get those criteria but if you cannot describe it directly, it is worth a try.)
I have re-run the query, see below for sample results:
SOURCETYPE | _TIME | USER_ID | DEVICE_ID | TRANSACTION_ID | USERID_X | USERID_Y | USERID_Z |
sourcetypeA | 2022-08-24 08:14:56 | DANIEL | 10.10.10.1 | ||||
sourcetypeA | 2022-08-24 08:12:27 | DAVID | 10.10.50.2 | ||||
sourcetypeB | 2022-08-23 16:59:02 | 017235 | PAMELA | PAMELA | AMY | ||
sourcetypeB | 2022-08-23 17:28:07 | 017550 | JOHN | STANLEY | STANLEY |
I have re-run the query, see below for sample results:
SOURCETYPE | _TIME | USER_ID | DEVICE_ID | TRANSACTION_ID | USERID_X | USERID_Y | USERID_Z |
sourcetypeA | 2022-08-24 08:14:56 | DANIEL | 10.10.10.1 | ||||
sourcetypeA | 2022-08-24 08:12:27 | DAVID | 10.10.50.2 | ||||
sourcetypeB | 2022-08-23 16:59:02 | 017235 | PAMELA | PAMELA | AMY | ||
sourcetypeB | 2022-08-23 17:28:07 | 017550 | JOHN | STANLEY | STANLEY |
So, each transaction is reported in a single event in sourcetypeB. Now, can you run the same query, maybe using some larger time range, as to reveal IP addresses of JOHN, STANLEY, PAMELA, and AMY? What is really important is to understand WHEN these IP addresses is revealed because you indicate that there is no other relationship between sourcetypeA and sourcetypeB. Because DANIEL and DAVID are not involved in any of the illustrated transactions, those values are of no use to your task.
Another possibility is that in the entire sourcetypeA, each of these users can only appear once. Can this be true? That seems contradict to possibilities you laid out in the question itself.
Additionally, can you explain the listing header USERID (not USERID_X or _Y or _Z) corresponding to sourcetypeA? Using "| table sourcetype _time transactionID USERID_X USERID_Y USERID_Z", it is not possible to have a field named USERID.
USERID and IP Adderess belongs to sourcetype A
USERID_X, USERID_Y, USERID_Z and TransactionID belong to sourcetype B.
USERID and USERID_X, Y and Z is what is common between both sourcetypes.
USERID_X, Y and Z depict Makers, Checkers and Verifiers where for example Pamela can be a maker and a checker but not verifier i.e UserID_X and Y or she can be a checker and a verifier but not a maker i.e USERID_Y and Z or she can be a maker and a verifier but not a checker i.e USERID_X and Z.
What is happening with my query currently, is that it is picking the users correctly, it is also picking IP addresses but it picks IP address for users that are the same.
So if Pamela is a maker and a checker, it will pick her IP and Arthur is the verifier but it instead of picking Arthur's IP it will pick Pamela's IP address because we have used FOREACH and grouped by USER in the query.
Is there a way in which I can give the conditions and say that where Splunk sees USERID_X and USERID_Y is the same, it should pick it as one user and give me the IP address and and it should also display USERID_Z and give me the IP address?
I still cannot wrap my mind around the requirement because I cannot see where and (most importantly) when the IP addresses of JOHN, STANLEY, PAMELA, and AMY are captured. Is there some way to show that? What is the nature of mysourcetypeA? Is it a real time series or is it just a lookup of permanent or current addresses? (If mysroucetypeA is permanent, your problem can be solved with no need for mysourcetypeB; if it only contains current addresses, your problem has no dynamic solution, and again, information is mysourcetypeB is irrelevant.)
USERID and IP Adderess belongs to sourcetype A
This is another long-lost piece of the puzzle in previous back-and-forths. You did mention a USERID_A in one earlier reply but all later discussion implied that USERID_X, _Y, _Z were all there were. (I understand your need to obfuscate data. But whenever you use anonymized field names, consistence is super important. Field mapping must be one-to-one and cannot change.)
Yet another missing piece of the puzzle was also revealed very late, that USERID_X, _Y, _Z all appeared in each and same event of sourcetypeB. (This is a really strange way to record a multi-party transaction.) The foreach command is devised based on the following assumption that I explicitly stated: "This assumes that different field names do not appear in the same event."
As mentioned above, there is still one missing piece of information: how is "given point" or "at the same time" can be calculated in mysourcetypeA and mysourcetypeB, respectively? Without this information, the problem remains ill defined.
Without this most important missing piece, you can try the following blindly:
index=myindex (sourcetype="mysourcetypeA" OR sourcetype=mysouretypeB)
| eval USER = if(sourcetype == "mysourcetypeA", USERID, mvappend(USERID_X, USERID_Y, USERID_Z))
| stats values(IP) as IP values(transactionID) as transactionID by USER
| stats values(USER) as USER by IP transactionID
``` | where mvcount(USER) > 1 ``` ``` this would be the alert ```
| eval users_per_IP = mvcount(USER)
| sort - user_per_IP
The above incorporates the two missing pieces that I just pieced together. But it will not give any better result than the following search using mysourcetypeA only:
index=myindex (sourcetype="mysourcetypeA")
| stats values(USERID) as USER dc(USERID) as user_per_IP by IP
| sort - user_per_IP
index=myindex (sourcetype="mysourcetypeA" OR sourcetype=mysouretypeB) | foreach USERID_X USERID_Y USERID_Z [eval USER = if(isnull(<<FIELD>>), USER, <<FIELD>>) | eval "IP OF <<FIELD>>" = IP] | stats values(USER) as USER values(USERID_X) as USERID_X values(eval('IP OF USERID_X')) as "IP OF USERID_X" values(USERID_Y) as USERID_Y values(eval('IP OF USERID_Y')) as "IP OF USERID_Y" values(USERID_Z) as USERID_Z values(eval('IP OF USERID_Z')) as "IP OF USERID_Z" by transactionID
This is the only query which is close to what I require.
I only need it to pick the IP address of the user where the USERID_X is not the same as USERID_Y and USERID_Z
As I preambled when introducing that foreach iteration, it is based on the assumption that USERID_X, _Y, _Z, etc., do not appear in the same event. Otherwise USER will only take on the value of USERID_Z. As enticing as it is, this is not what you wanted.
In addition, the "IP of <<FIELD>>" assignment is also meaningless in your data set because IP only exists where none of USERID_X, _Y, or _Z exists. that entire line can only return the last (earliest) IP address in the entire mysourcetypeA. That is also not what you wanted.
From my observation, the purpose of showing "IP of xxx" is just to visually validate alert output. But there are a million ways to validate output without involving such laborious tactic. This said, you can try
index=myindex (sourcetype="mysourcetypeA" OR sourcetype=mysouretypeB)
| eval USER = if(sourcetype == "mysourcetypeA", USERID, mvappend(USERID_X, USERID_Y, USERID_Z))
| eventstats values(IP) as IP values(transactionID) as transactionID by USER
| mvexpand USER
| foreach USERID_X USERID_Y USERID_Z
[ eval "IP of <<FIELD>>" = if(USER == <<FIELD>>, IP, 'IP of <<FIELD>>') ]
| stats values(USER) as USER values(USERID_X) as USERID_X values(eval('IP OF USERID_X')) as "IP OF USERID_X" values(USERID_Y) as USERID_Y values(eval('IP OF USERID_Y')) as "IP OF USERID_Y" values(USERID_Z) as USERID_Z values(eval('IP OF USERID_Z')) as "IP OF USERID_Z" by transactionID
@yuanliu, or anybody else, please assist.
How can I display IP addresses for the users?
I need results to be close to this format:
index=myindex (sourcetype="mysourcetypeA" OR sourcetype=mysouretypeB) | foreach USERID_X USERID_Y USERID_Z [eval USER = if(isnull(<<FIELD>>), USER, <<FIELD>>) | eval "IP OF <<FIELD>>" = IP] | stats values(USER) as USER values(USERID_X) as USERID_X values(eval('IP OF USERID_X')) as "IP OF USERID_X" values(USERID_Y) as USERID_Y values(eval('IP OF USERID_Y')) as "IP OF USERID_Y" values(USERID_Z) as USERID_Z values(eval('IP OF USERID_Z')) as "IP OF USERID_Z" by transactionID
But I need information be picked correctly for each user.
Correct me if I am wrong. When we use the command | foreach USER_ID, USERID_X, USERID_Y, USERID_Z are we trying to combine the information for all these into 1 USER?
Something I've realized is that the information can only be the same for:
USER, USERID_X and USERID_Y or
USER, USERID_Y and USERID_Z or
USER, USERID_X and USERID_Z
At no given point are USER, X, Y, Z the same at the same time.
Maybe that's why the IP address is only picked for the user where the information is the same and leaves the IP address of the one where the user is different?
gcusello's code should result in a list like the following
person | ip_count | ip_address | task |
Person A | 2 | 10.101.8.1 10.101.9.3 | X Y |
Person B | 3 | 10.101.7.105 10.101.8.1 10.101.8.12 | Z |
If in your data you only see one line, check your search time period to make sure both Person A and Person B have activities.
The code does not directly address the part of requirement about At no given point. You may need a time bucket variable to flesh that out. In physics, there is no such a thing as a point in time, so I assume that you meant "during no given time interval" (of a predefined length). In general you don't want to deal with a rolling time interval in Splunk, so choose a meaningful span, say, 1h, and use gcusello's formula with this modifier
index=your_index (sourcetype=sourcetypeA OR sourcetype=sourcetypeB)
| bin span=1h _time
| stats dc(ip_address) AS ip_count values(ip_address) AS ip_address values(task) AS task BY _time person
The span can be any value that is a common multiple of your data collection intervals.
Hi @Splunk_Master01,
maybe there's an error in the question: you sayed:
3) Person A can either do task X or task Y or task Z
At no given point is Person A allowed to conduct all three tasks
then you don't give any condition for PersonB.
Then some additional information:
Ciao.
Giuseppe